Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques like consistency regularization in Connectionist Temporal Classification (CTC), leveraging pre-trained multilingual models for low-resource languages, and integrating Large Language Models (LLMs) for enhanced contextual understanding and improved handling of diverse accents and speech disorders. These advancements have significant implications for accessibility, enabling applications in diverse fields such as healthcare, education, and human-computer interaction.
Papers
DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan, Chi-Han Lin, Berlin Chen
Extracting Biomedical Entities from Noisy Audio Transcripts
Nima Ebadi, Kellen Morgan, Adrian Tan, Billy Linares, Sheri Osborn, Emma Majors, Jeremy Davis, Anthony Rios
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Wenjing Zhu, Sining Sun, Changhao Shan, Peng Fan, Qing Yang
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Jiayu Du, Jinpeng Li, Guoguo Chen, Wei-Qiang Zhang
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam
What has LeBenchmark Learnt about French Syntax?
Zdravko Dugonjić, Adrien Pupier, Benjamin Lecouteux, Maximin Coavoux
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma
Language and Speech Technology for Central Kurdish Varieties
Sina Ahmadi, Daban Q. Jaff, Md Mahfuz Ibn Alam, Antonios Anastasopoulos
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun, Hong Yang, Bo Qin