Open-Vocabulary Keyword Spotting

Open-vocabulary keyword spotting (KWS) detects user-defined keywords in speech, removing the restriction of traditional KWS systems to a pre-defined vocabulary. Current research focuses on efficient and robust models built on connectionist temporal classification (CTC), attention mechanisms, and multi-pass approaches. Many of these systems enroll keywords from text rather than audio and use few-shot learning to cope with limited training data. These advances matter for personalized smart-device interaction and for automatic speech recognition, particularly for recognizing rare entities and entities in low-resource languages.
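
As an illustration of how CTC and text-based enrollment can fit together, the sketch below scores a text-enrolled keyword against frame-level CTC log-posteriors using the standard CTC forward recursion. It is a minimal example under stated assumptions, not the method of any particular paper: the function name `ctc_keyword_log_prob`, the token ids, and the toy posteriors are hypothetical, and the audio segment is assumed to contain only the keyword (a practical system would add sliding windows or filler modeling to spot the keyword inside longer speech).

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_keyword_log_prob(log_probs, keyword_ids, blank=0):
    """Return log P(keyword | frames), summed over all CTC alignments.

    log_probs:   list of T frames, each a list of V log-probabilities
                 (hypothetical output of an acoustic model).
    keyword_ids: non-empty list of token ids for the enrolled keyword,
                 e.g. obtained by tokenizing the text the user typed.
    """
    # Extended label sequence: blank, k1, blank, k2, ..., blank
    ext = [blank]
    for tok in keyword_ids:
        ext += [tok, blank]
    S = len(ext)

    # Initialization: an alignment may start in the first blank or first token.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                  # stay in the same state
            if s >= 1:
                cands.append(alpha[s - 1])      # advance by one state
            # Skip over a blank only between two different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end in the last token or the trailing blank.
    return logsumexp(alpha[S - 1], alpha[S - 2])


# Toy usage: two frames, vocabulary {0: blank, 1: "a"}, uniform posteriors.
frames = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
score = ctc_keyword_log_prob(frames, [1])
# Three alignments collapse to "a" ("a a", "a -", "- a"), each with
# probability 0.25, so math.exp(score) is 0.75 up to rounding.
```

Because the keyword enters only as a list of token ids, enrolling a new keyword requires no audio examples and no retraining in this sketch; a detection decision could then be made by comparing the (length-normalized) score against a threshold, which is itself a design choice not specified here.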

Papers