ASR Model
Automatic speech recognition (ASR) models transcribe spoken language into text, a task central to applications ranging from voice assistants to captioning. Current research emphasizes improving robustness across diverse accents, languages, and noisy environments, often leveraging transformer-based architectures such as Wav2Vec 2.0 and Conformer, and incorporating visual information (e.g., lip movements) for improved accuracy. Significant effort also goes into mitigating biases in ASR models, improving efficiency through knowledge distillation and self-supervised learning, and developing methods for low-resource languages. These advances are driving progress in accessibility technologies, human-computer interaction, and language documentation.
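To make the transcription step concrete: CTC-based models such as Wav2Vec 2.0 emit a label per audio frame, and a decoder collapses repeats and removes blanks to produce text. Below is a minimal sketch of greedy CTC decoding; the blank index, vocabulary, and frame sequence are illustrative assumptions, not taken from any specific model.

```python
# Greedy CTC decoding sketch: collapse repeated tokens, then drop blanks.
# BLANK index and VOCAB are toy assumptions for illustration.

BLANK = 0  # assumed CTC blank token index
VOCAB = {1: "h", 2: "e", 3: "l", 4: "o"}

def ctc_greedy_decode(frame_ids):
    """Apply the standard CTC collapse rule to per-frame argmax ids."""
    out = []
    prev = None
    for t in frame_ids:
        # Emit a character only when the label changes and is not blank.
        if t != prev and t != BLANK:
            out.append(VOCAB[t])
        prev = t
    return "".join(out)

# Per-frame argmax ids for a toy utterance spelling "hello".
frames = [1, 1, 0, 2, 0, 3, 3, 0, 3, 4, 4]
print(ctc_greedy_decode(frames))  # -> hello
```

Note how the blank between the two `l` frames is what allows the repeated letter to survive the collapse rule; without it, consecutive identical labels merge into one.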
Papers
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Vakyansh: ASR Toolkit for Low Resource Indic languages
Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan
Federated Domain Adaptation for ASR with Full Self-Supervision
Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide
Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR
Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang