Automatic Speech Recognition Models
Automatic speech recognition (ASR) models aim to convert spoken language into text accurately, a task with broad practical applications. Current research emphasizes improving ASR performance in challenging scenarios, such as low-resource languages, accented speech, and noisy environments, often by leveraging large language models (LLMs) and techniques like parameter-efficient fine-tuning and self-supervised learning. These advances are driven by the need for ASR systems that are robust, accurate, and equitable across diverse languages and speaker demographics, with impact in fields ranging from healthcare to legal proceedings.
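As a concrete illustration of the parameter-efficient fine-tuning mentioned above, the sketch below adds LoRA adapters to a pretrained ASR model. It is a minimal, hypothetical example rather than code from any of the listed papers; it assumes the Hugging Face transformers and peft libraries and the public openai/whisper-small checkpoint.

```python
import torch
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load a pretrained ASR checkpoint (assumed: openai/whisper-small); its weights
# stay frozen except for the injected low-rank adapters.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the LoRA update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically ~1% of all parameters

# One illustrative training step on a dummy batch; in practice the features
# come from a feature extractor (log-mel spectrograms) and the labels from a
# tokenizer applied to reference transcripts.
input_features = torch.randn(1, 80, 3000)       # (batch, mel bins, frames)
labels = torch.tensor([[50258, 50359, 50363]])  # placeholder token ids
loss = model(input_features=input_features, labels=labels).loss
loss.backward()
```

Because only the adapter matrices receive gradients, this style of adaptation suits the low-resource and domain-specific settings highlighted above: a single frozen base model can be shared, and each new language or domain adds only a small set of trainable parameters.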
Papers
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba O. Alabi, Xuechen Liu, Dietrich Klakow, Junichi Yamagishi
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Bingshen Mu, Kun Wei, Qijie Shao, Yong Xu, Lei Xie
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li
Reading Miscue Detection in Primary School through Automatic Speech Recognition
Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini