Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques like consistency regularization in Connectionist Temporal Classification (CTC), leveraging pre-trained multilingual models for low-resource languages, and integrating Large Language Models (LLMs) for enhanced contextual understanding and improved handling of diverse accents and speech disorders. These advancements have significant implications for accessibility, enabling applications in diverse fields such as healthcare, education, and human-computer interaction.
Papers
AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Yichao Du, Zhirui Zhang, Linan Yue, Xu Huang, Yuqing Zhang, Tong Xu, Linli Xu, Enhong Chen
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe`er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka
Improving ASR Contextual Biasing with Guided Attention
Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe
Transcending Controlled Environments Assessing the Transferability of ASRRobust NLU Models to Real-World Applications
Hania Khan, Aleena Fatima Khalid, Zaryab Hassan
XLS-R Deep Learning Model for Multilingual ASR on Low- Resource Languages: Indonesian, Javanese, and Sundanese
Panji Arisaputra, Alif Tri Handoyo, Amalia Zahra
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
Fan Yu, Haoxu Wang, Xian Shi, Shiliang Zhang
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2
Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra