Speech Recognition System
Speech recognition systems aim to transcribe spoken language into text accurately, a task with broad practical applications. Current research focuses on improving robustness and accuracy in challenging conditions such as noisy environments, multiple speakers, and disfluent speech. This work typically relies on deep learning models such as transformers and recurrent neural networks, combined with techniques like multi-task learning and data augmentation. These advances are vital for enhancing accessibility for individuals with speech impairments, improving human-computer interaction across domains, and enabling more sophisticated natural language processing applications. Ongoing efforts also address biases in existing systems and explore multimodal approaches that integrate visual information to improve performance.
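As a rough illustration of the transformer-based models and data-augmentation techniques mentioned above, the Python sketch below transcribes a recording with a pretrained model through the Hugging Face transformers ASR pipeline and simulates a noisy-environment condition with additive Gaussian noise. The model name, file path, and SNR value are illustrative assumptions, not details drawn from the papers listed here.

```python
# Minimal sketch: transformer-based speech recognition plus a simple
# noise-augmentation step to mimic challenging acoustic conditions.
# "openai/whisper-small" and "recording.wav" are placeholder choices.
import numpy as np
import soundfile as sf
from transformers import pipeline

# Load a pretrained transformer ASR model via the Hugging Face pipeline.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe the clean recording (a mono WAV file).
print("Clean transcript:", asr("recording.wav")["text"])

def add_noise(waveform: np.ndarray, snr_db: float) -> np.ndarray:
    """Add white Gaussian noise at the requested signal-to-noise ratio."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

# Re-transcribe after corrupting the audio, the kind of degraded input
# that robustness-oriented ASR research targets.
waveform, sampling_rate = sf.read("recording.wav")
noisy = add_noise(waveform, snr_db=5.0)
print("Noisy transcript:", asr({"raw": noisy, "sampling_rate": sampling_rate})["text"])
```

A robustness study would typically train or fine-tune on augmented data rather than only evaluating on it; this sketch shows only the inference-side mechanics.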
Papers
Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model
Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, Richard JB Dobson, Nicholas Cummins, RADAR-CNS consortium
Convoifilter: A case study of doing cocktail party speech recognition
Thai-Binh Nguyen, Alexander Waibel
Personalized Predictive ASR for Latency Reduction in Voice Assistants
Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow
Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
Jan Silovsky, Liuhui Deng, Arturo Argueta, Tresi Arvizo, Roger Hsiao, Sasha Kuznietsov, Yiu-Chang Lin, Xiaoqiang Xiao, Yuanyuan Zhang