Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to transcribe spoken language into text accurately, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization for Connectionist Temporal Classification (CTC) training, transfer from pre-trained multilingual models to low-resource languages, and integration of Large Language Models (LLMs) for richer contextual understanding and better handling of diverse accents and speech disorders. These advances have significant implications for accessibility, enabling applications in fields such as healthcare, education, and human-computer interaction.
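As a rough illustration of the consistency-regularization idea mentioned above, the sketch below shows one common way it can be instantiated for CTC training: the same utterance is encoded under two random augmentations, each view receives a CTC loss, and a symmetric KL term encourages the frame-level output distributions to agree. The model, the augmentation function, and the weight lambda_consistency are illustrative assumptions, not the method of any particular paper listed here.

```python
# Minimal sketch (assumed setup, PyTorch): CTC loss plus a consistency term
# between two augmented views of the same input. Names like `augment` and
# `lambda_consistency` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def ctc_consistency_loss(model, features, input_lengths, targets, target_lengths,
                         augment, lambda_consistency=0.3):
    # Two stochastic views of the same utterances (e.g. different SpecAugment masks).
    view_a = augment(features)
    view_b = augment(features)

    # Log-probabilities shaped (time, batch, vocab), as torch's CTC loss expects.
    log_probs_a = model(view_a).log_softmax(-1).transpose(0, 1)
    log_probs_b = model(view_b).log_softmax(-1).transpose(0, 1)

    # Standard CTC loss applied to each view.
    ctc = F.ctc_loss(log_probs_a, targets, input_lengths, target_lengths, blank=0) \
        + F.ctc_loss(log_probs_b, targets, input_lengths, target_lengths, blank=0)

    # Symmetric KL divergence between the two views' frame-level distributions;
    # this is the consistency term that rewards agreement under augmentation.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")

    return ctc + lambda_consistency * (kl_ab + kl_ba)
```

Some variants stop gradients through one branch or average the two views' predictions; the sketch keeps both branches trainable for simplicity.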
Papers
Global Performance Disparities Between English-Language Accents in Automatic Speech Recognition
Alex DiChristofano, Henry Shuster, Shefali Chandra, Neal Patwari
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition
Z. Guo, C. Chen, E. S. Chng
Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer
Cong-Thanh Do, Mohan Li, Rama Doddipatla
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Peng Shen, Xugang Lu, Hisashi Kawai
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
Alef Iury Siqueira Ferreira, Gustavo dos Reis Oliveira
When Is TTS Augmentation Through a Pivot Language Useful?
Nathaniel Robinson, Perez Ogayo, Swetha Gangu, David R. Mortensen, Shinji Watanabe
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou, Xiangming Gu, Ye Wang
Improving Data Driven Inverse Text Normalization using Data Augmentation
Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure