Keyword Localisation

Visually prompted keyword localization (VPKL) aims to find where a keyword occurs within a speech utterance when the query is an image depicting that keyword rather than text, bypassing the need for transcriptions. Current research focuses on visually grounded speech models, which often use attention mechanisms or other localization strategies to perform this task, particularly for low-resource languages where transcribed data is scarce. The work matters for speech processing in languages that lack extensive linguistic resources, with potential applications in language documentation and cross-lingual information retrieval.
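
As a rough illustration of the attention-style localization idea described above, the following minimal sketch scores each speech frame against an image query and reports the best-matching frame. It is not the method of any particular paper: the function name `localize_keyword`, the 20 ms frame shift, the detection threshold of 0.5, and the use of cosine similarity are all assumptions made for this example, and the embeddings are assumed to come from already-trained image and speech encoders.

```python
import math


def localize_keyword(image_emb, frame_embs, frame_shift_s=0.02, threshold=0.5):
    """Toy image-query keyword localization (hypothetical helper).

    image_emb:  embedding of the query image (list of floats).
    frame_embs: one embedding per speech frame (list of lists of floats).
    Returns (detected, time_in_seconds_or_None, attention_weights).
    """
    if not frame_embs:
        raise ValueError("frame_embs must contain at least one frame")

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Similarity between the image query and every speech frame.
    scores = [cosine(image_emb, frame) for frame in frame_embs]

    # Softmax over frames gives attention weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Detection: is the best frame similar enough? (assumed threshold)
    best = max(range(len(scores)), key=scores.__getitem__)
    detected = scores[best] >= threshold

    # Localization: convert the best frame index to a time offset.
    time_s = best * frame_shift_s if detected else None
    return detected, time_s, weights


# Toy example: the third frame points in nearly the same direction as the query.
query = [1.0, 0.0]
frames = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.1], [0.2, 0.8]]
detected, time_s, weights = localize_keyword(query, frames)
print(detected, time_s)
```

In a real system the threshold and frame shift would depend on the trained model and its feature extraction, and localization is often evaluated against word boundaries from forced alignments rather than a single frame index.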

Papers