Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to transcribe spoken language into text accurately, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization for Connectionist Temporal Classification (CTC), pre-trained multilingual models for low-resource languages, and integration of Large Language Models (LLMs) for stronger contextual understanding and better handling of diverse accents and disordered speech. These advances have significant implications for accessibility and enable applications in fields such as healthcare, education, and human-computer interaction.
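To make the CTC consistency-regularization idea concrete, below is a minimal PyTorch sketch, not taken from any of the papers listed here: the same batch is passed through the model twice under dropout, the standard CTC loss is applied to both passes, and a symmetric KL term encourages the two per-frame output distributions to agree. The TinyCTCModel, the dropout rate, and the alpha weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class TinyCTCModel(torch.nn.Module):
    """Toy acoustic model: BiLSTM encoder + linear projection to vocabulary + CTC blank."""
    def __init__(self, n_feats=80, n_hidden=256, n_tokens=32):
        super().__init__()
        self.encoder = torch.nn.LSTM(n_feats, n_hidden, batch_first=True, bidirectional=True)
        self.dropout = torch.nn.Dropout(0.1)                      # source of stochasticity between passes
        self.proj = torch.nn.Linear(2 * n_hidden, n_tokens + 1)   # +1 class for the CTC blank

    def forward(self, feats):                                     # feats: (batch, time, n_feats)
        enc, _ = self.encoder(feats)
        logits = self.proj(self.dropout(enc))                     # (batch, time, n_tokens + 1)
        return logits.log_softmax(dim=-1)

def consistency_ctc_loss(model, feats, targets, feat_lens, target_lens, blank=0, alpha=1.0):
    """CTC loss on two stochastic forward passes plus a symmetric KL consistency term."""
    ctc = torch.nn.CTCLoss(blank=blank, zero_infinity=True)
    lp1 = model(feats)                                            # the two passes differ via dropout
    lp2 = model(feats)
    # nn.CTCLoss expects (time, batch, classes)
    ctc_term = ctc(lp1.transpose(0, 1), targets, feat_lens, target_lens) \
             + ctc(lp2.transpose(0, 1), targets, feat_lens, target_lens)
    # Symmetric KL between the per-frame distributions of the two passes
    kl_term = F.kl_div(lp1, lp2, log_target=True, reduction='batchmean') \
            + F.kl_div(lp2, lp1, log_target=True, reduction='batchmean')
    return ctc_term + alpha * 0.5 * kl_term

# Dummy batch purely to show the expected shapes.
model = TinyCTCModel()
feats = torch.randn(4, 120, 80)                                   # (batch, frames, features)
targets = torch.randint(1, 33, (4, 20))                           # token ids; 0 is reserved for blank
feat_lens = torch.full((4,), 120, dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)
loss = consistency_ctc_loss(model, feats, targets, feat_lens, target_lens)
loss.backward()
```

In this formulation the CTC terms supervise each pass independently, while the KL term acts purely as a regularizer; alpha trades off transcription accuracy against agreement between the two stochastic views.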
Papers
Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass
Olabanji Shonibare, Xiaosu Tong, Venkatesh Ravichandran
A two-step approach to leverage contextual data: speech recognition in air-traffic communications
Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlicek
ASR-Aware End-to-end Neural Diarization
Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke
Error Correction in ASR using Sequence-to-Sequence Models
Samrat Dutta, Shreyansh Jain, Ayush Maheshwari, Souvik Pal, Ganesh Ramakrishnan, Preethi Jyothi
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT
Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik, A. Balog, T. E. Gráczi, A. Kohári, B. Tarján, K. Mády
Visualizing Automatic Speech Recognition -- Means for a Better Understanding?
Karla Markert, Romain Parracone, Mykhailo Kulakov, Philip Sperl, Ching-Yu Kao, Konstantin Böttinger
Language Dependencies in Adversarial Attacks on Speech Recognition Systems
Karla Markert, Donika Mirdita, Konstantin Böttinger
Star Temporal Classification: Sequence Classification with Partially Labeled Data
Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
Reducing language context confusion for end-to-end code-switching automatic speech recognition
Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR
Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe
Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models
Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang