Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to transcribe spoken language into text accurately, and research in the field targets models that are both robust and efficient. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization in Connectionist Temporal Classification (CTC), leveraging pre-trained multilingual models for low-resource languages, and integrating Large Language Models (LLMs) for better contextual understanding and improved handling of diverse accents and speech disorders. These advances have significant implications for accessibility and enable applications in fields such as healthcare, education, and human-computer interaction.
Papers
High-precision Voice Search Query Correction via Retrievable Speech-text Embeddings
Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar Aleksic
BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators
Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models
Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki
SpokesBiz -- an Open Corpus of Conversational Polish
Piotr Pęzik, Sylwia Karasińska, Anna Cichosz, Łukasz Jałowiecki, Konrad Kaczyński, Małgorzata Krawentek, Karolina Walkusz, Paweł Wilk, Mariusz Kleć, Krzysztof Szklanny, Szymon Marszałkowski
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
Gloria Araiza-Illan, Luke Meyer, Khiet P. Truong, Deniz Baskent
Noise robust distillation of self-supervised speech models via correlation metrics
Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen