Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques like consistency regularization in Connectionist Temporal Classification (CTC), leveraging pre-trained multilingual models for low-resource languages, and integrating Large Language Models (LLMs) for enhanced contextual understanding and improved handling of diverse accents and speech disorders. These advancements have significant implications for accessibility, enabling applications in diverse fields such as healthcare, education, and human-computer interaction.
Papers
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
Chengxi Lei, Satwinder Singh, Feng Hou, Xiaoyun Jia, Ruili Wang
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal
Extending Whisper with prompt tuning to target-speaker ASR
Hao Ma, Zhiyuan Peng, Mingjie Shao, Jing Li, Ju Liu
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition
Dongning Yang, Wei Wang, Yanmin Qian
End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data
Can Cui (MULTISPEECH), Imran Ahamad Sheikh, Mostafa Sadeghi (MULTISPEECH), Emmanuel Vincent (MULTISPEECH)