Spoken Term

Spoken term detection (STD) is the task of locating occurrences of specific words or phrases in audio recordings, and it is a core component of many speech-technology systems. Current research aims to improve STD accuracy and efficiency with deep learning models, particularly transformer-based architectures and recurrent neural networks such as LSTMs. These models are often trained with techniques such as contrastive learning and multi-task training, which make use of unlabeled data and improve robustness. The goal is to reduce reliance on large, manually labeled datasets and to perform well across diverse languages and acoustic conditions, with applications in keyword spotting, information retrieval from audio archives, and human-computer interaction.
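To make the detection step concrete, here is a minimal sketch of one classic, non-neural way to locate a spoken query inside a longer recording: subsequence dynamic time warping (DTW) over frame-level feature vectors. This technique is not taken from the text above. In a modern system the frames might be embeddings from a neural encoder, but here they are assumed to be plain lists of floats. The function names `subsequence_dtw` and `detect_term` and the length-normalized `threshold` are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two frame vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat all-zero frames as maximally distant
    return 1.0 - dot / (na * nb)


def subsequence_dtw(query: List[Vector],
                    utterance: List[Vector]) -> Tuple[float, int, int]:
    """Align the whole query to the best-matching utterance segment.

    Returns (accumulated_cost, start_frame, end_frame), end inclusive.
    The query must be consumed in full, but the match may start and end
    at any utterance frame (free boundaries along the utterance axis).
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    # First query frame: the match may begin at any utterance frame j.
    for j in range(m):
        cost[0][j] = cosine_distance(query[0], utterance[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            d = cosine_distance(query[i], utterance[j])
            # Predecessors: vertical, and (if j > 0) horizontal and diagonal.
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = d + best_cost
            start[i][j] = best_start

    # The match may end at any utterance frame: take the cheapest one.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


def detect_term(query: List[Vector], utterance: List[Vector],
                threshold: float) -> Optional[Tuple[int, int]]:
    """Report (start, end) if the query-length-normalized cost is low enough."""
    total, s, e = subsequence_dtw(query, utterance)
    return (s, e) if total / len(query) <= threshold else None
```

For example, with one-hot frames, the query `[[1, 0, 0], [0, 1, 0]]` is found at frames 2 to 3 of `[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]` with zero cost. Normalizing by query length makes a single threshold usable across utterances of different lengths. A real system would tune this threshold on held-out data rather than fix it by hand.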

Papers