Keyword Localisation
Visually prompted keyword localization (VPKL) is the task of finding where a keyword occurs in a speech utterance when the only query is an image depicting that keyword, so no transcriptions are needed. Current research centers on visually grounded speech models, which often use attention mechanisms or other localization strategies to perform the task, particularly for low-resource languages where transcribed data is scarce. The work matters because it enables speech processing in languages that lack extensive linguistic resources, with potential applications in language documentation and cross-lingual information retrieval.
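To make the localization step more concrete, here is a minimal sketch of one way an image query could be matched against speech frames. It is not the method of any particular paper: it assumes that some visually grounded speech model (not shown) has already mapped the utterance to per-frame embeddings and the image to a single embedding in the same space. The function name `localize_keyword`, the 20 ms frame shift, and the 0.5 detection threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def localize_keyword(frame_embeddings, image_embedding,
                     frame_shift_s=0.02, threshold=0.5):
    """Score each speech frame against an image query and pick the best one.

    frame_embeddings: array of shape (num_frames, dim), one row per speech frame.
    image_embedding:  array of shape (dim,), embedding of the query image.
    Both are assumed to be non-zero and to live in a shared embedding space.
    Returns (detected, best_frame_index, best_time_in_seconds, scores).
    """
    frames = np.asarray(frame_embeddings, dtype=float)
    query = np.asarray(image_embedding, dtype=float)

    # L2-normalize so that the dot product below is cosine similarity.
    frames = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)

    # One similarity score per frame; this plays the role of attention weights.
    scores = frames @ query

    # The predicted keyword location is the highest-scoring frame.
    best = int(np.argmax(scores))

    # The keyword counts as detected only if the best score clears the
    # (assumed) threshold.
    detected = bool(scores[best] >= threshold)
    return detected, best, best * frame_shift_s, scores


if __name__ == "__main__":
    # Synthetic stand-ins for model outputs: random frames, with the query
    # direction planted at frame 30.
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(50, 16))
    query = rng.normal(size=16)
    frames[30] = 2.0 * query
    detected, index, time_s, _ = localize_keyword(frames, query)
    print(detected, index, time_s)
```

In a real system the embeddings would come from trained speech and image encoders, and the threshold would be tuned on held-out data. The sketch only shows how a per-frame score can serve both detection (is the keyword present?) and localization (at which frame?).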