Speech Recognition Systems
Speech recognition systems aim to transcribe spoken language into text accurately, a task underpinning applications from voice assistants to accessibility tools. Current research focuses on improving robustness and accuracy in challenging conditions such as noisy environments, multiple speakers, and disfluent speech, typically using deep learning models such as transformers and recurrent neural networks together with techniques like multi-task learning and data augmentation. These advances are vital for enhancing accessibility for individuals with speech impairments, improving human-computer interaction across domains, and enabling more sophisticated natural language processing applications. Ongoing efforts also address biases in existing systems and explore multimodal approaches that integrate visual information to improve performance.
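To make the data-augmentation idea mentioned above concrete, the sketch below shows one common form used for noise robustness: mixing background noise into a training utterance at a controlled signal-to-noise ratio. It is a minimal illustration, not code from any of the listed papers; the function name and the placeholder waveforms are assumptions made here for demonstration.

import numpy as np

def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into a speech waveform at a target SNR in dB."""
    # Tile or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    # Scale the noise so the speech-to-noise power ratio equals the target SNR.
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)

    return speech + noise

# Example: augment one utterance with noise at a random SNR between 5 and 20 dB.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000).astype(np.float32)   # placeholder: 1 s of 16 kHz audio
noise = rng.standard_normal(48000).astype(np.float32)    # placeholder: background noise
augmented = add_noise_at_snr(speech, noise, snr_db=rng.uniform(5, 20))

Sampling the SNR randomly per utterance is a typical design choice: it exposes the model to a range of noise levels during training rather than a single fixed condition.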
Papers
Personalized Predictive ASR for Latency Reduction in Voice Assistants
Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow
Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
Jan Silovsky, Liuhui Deng, Arturo Argueta, Tresi Arvizo, Roger Hsiao, Sasha Kuznietsov, Yiu-Chang Lin, Xiaoqiang Xiao, Yuanyuan Zhang