Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques like consistency regularization in Connectionist Temporal Classification (CTC), leveraging pre-trained multilingual models for low-resource languages, and integrating Large Language Models (LLMs) for enhanced contextual understanding and improved handling of diverse accents and speech disorders. These advancements have significant implications for accessibility, enabling applications in diverse fields such as healthcare, education, and human-computer interaction.
Papers
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu
Leveraging Large Text Corpora for End-to-End Speech Summarization
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura
Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition
Philipp Klumpp, Pooja Chitkara, Leda Sarı, Prashant Serai, Jilong Wu, Irina-Elena Veliche, Rongqing Huang, Qing He
Towards domain generalisation in ASR with elitist sampling and ensemble knowledge distillation
Rehan Ahmad, Md Asif Jalal, Muhammad Umar Farooq, Anna Ollerenshaw, Thomas Hain
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Mina Huh, Ruchira Ray, Corey Karnei
Multimodal Speech Recognition for Language-Guided Embodied Agents
Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason
Explanations for Automatic Speech Recognition
Xiaoliang Wu, Peter Bell, Ajitha Rajan