Automatic Speech Recognition
Automatic Speech Recognition (ASR) transcribes spoken language into text, and research in the field centers on making models more accurate, robust, and efficient. Current efforts include consistency regularization for Connectionist Temporal Classification (CTC), pre-trained multilingual models for low-resource languages, and the integration of Large Language Models (LLMs) to improve contextual understanding and the handling of diverse accents and speech disorders. These advances matter for accessibility and support applications in healthcare, education, and human-computer interaction.
Papers
Adapting an Unadaptable ASR System
Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill
SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
Mirazul Haque, Rutvij Shah, Simin Chen, Berrak Şişman, Cong Liu, Wei Yang
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur
Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
Emin Cagatay Nakilcioglu, Maximilian Reimann, Ole John
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison, Yannick Estève
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers
Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili
Christiaan Jacobs, Nathanaël Carraz Rakotonirina, Everlyn Asiko Chimoto, Bruce A. Bassett, Herman Kamper
AfriNames: Most ASR models "butcher" African Names
Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Chris Chinenye Emezue, Amina Mardiyyah Rufai, Sahib Singh
Edit Distance based RL for RNNT decoding
Dongseong Hwang, Changwan Ryu, Khe Chai Sim
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu
Zero-Shot Automatic Pronunciation Assessment
Hongfu Liu, Mingqian Shi, Ye Wang
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu
Towards Selection of Text-to-speech Data to Augment ASR Training
Shuo Liu, Leda Sarı, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli
Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders
Nina R Benway, Jonathan L Preston
STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions
Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun, Chao Zhang, Phil Woodland
Building Accurate Low Latency ASR for Streaming Voice Search
Abhinav Goyal, Nikesh Garera
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe
Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition
Xiaoliang Wu, Peter Bell, Ajitha Rajan