Speech Recognition
Automatic speech recognition (ASR) aims to transcribe spoken language into text, and current research focuses heavily on improving accuracy and robustness across diverse conditions. This work explores a range of model architectures, including transformers, conformers, and large language models (LLMs), often combined with techniques such as connectionist temporal classification (CTC), attention mechanisms, and audio-visual multimodal integration. Significant effort also goes into low-resource languages, noisy environments, and recognition for speakers with speech impairments. Advances in ASR benefit many applications, from virtual assistants and transcription services to accessibility tools and cross-lingual communication.
Papers
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand, Praveen Srinivasa Varadhan, Mehak Singal, Mitesh M. Khapra
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen
Automatic Speech Recognition with BERT and CTC Transformers: A Review
Noussaiba Djeffal, Hamza Kheddar, Djamel Addou, Ahmed Cherif Mazari, Yassine Himeur
Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
Dancheng Liu, Jason Yang, Ishan Albrecht-Buehler, Helen Qin, Sophie Li, Yuting Hu, Amir Nassereldine, Jinjun Xiong
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey