ASR Model
Automatic speech recognition (ASR) models transcribe spoken language into text. Current research emphasizes improving robustness across diverse accents, languages, and noisy environments, often leveraging transformer-based architectures such as Wav2Vec 2.0 and the Conformer, and incorporating visual information (audio-visual ASR) for improved accuracy. Significant efforts focus on mitigating bias in ASR models, improving efficiency through knowledge distillation and self-supervised learning, and developing methods for low-resource languages. These advances are driving progress in accessibility technologies, human-computer interaction, and language documentation.
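To make the transcription step concrete: CTC-trained acoustic models such as Wav2Vec 2.0 emit one label per audio frame, and the simplest way to turn those frame labels into text is greedy CTC decoding (collapse consecutive repeats, then drop the blank symbol). The sketch below is a minimal illustration of that idea; the toy vocabulary and frame sequence are invented for the example, not taken from any of the papers listed here.

```python
# Minimal sketch of greedy CTC decoding (assumption: index 0 is the
# CTC "blank" label; the vocabulary and frame labels are hypothetical).

BLANK = 0

def ctc_greedy_decode(frame_labels, vocab):
    """Collapse repeated frame labels, remove blanks, map to characters."""
    decoded = []
    prev = None
    for label in frame_labels:
        # Emit a character only when the label changes and is not blank.
        if label != prev and label != BLANK:
            decoded.append(vocab[label])
        prev = label
    return "".join(decoded)

# Toy per-frame argmax labels: "cc_aaa__t" collapses to "cat".
vocab = {1: "c", 2: "a", 3: "t"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, vocab))  # "cat"
```

Note that a blank between two identical labels keeps them distinct (e.g. `[1, 0, 1]` decodes to two characters, not one), which is how CTC represents repeated letters in the output.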
Papers
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Vakyansh: ASR Toolkit for Low Resource Indic languages
Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan
Federated Domain Adaptation for ASR with Full Self-Supervision
Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide