Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to transcribe spoken language into text accurately, driving research into robust and efficient models. Current work focuses on improving accuracy and robustness through consistency regularization for Connectionist Temporal Classification (CTC) training, pre-trained multilingual models for low-resource languages, and the integration of Large Language Models (LLMs) for richer contextual understanding and better handling of diverse accents and speech disorders. These advances have significant implications for accessibility, enabling applications in fields such as healthcare, education, and human-computer interaction.
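As background for the CTC-based work listed below, the core of CTC is a dynamic program that sums the probability of every blank-augmented alignment that collapses to the target label sequence. The sketch below is purely illustrative (plain probabilities rather than the log-space computation a real trainer would use, and no connection to any specific paper here); all names are our own.

```python
def ctc_forward_prob(probs, labels, blank=0):
    """Illustrative CTC forward pass: probability of `labels` under
    per-frame distributions `probs` (T x V), summed over all alignments.

    probs:  list of per-frame probability vectors over the vocabulary
    labels: target symbol ids, without blanks
    """
    # Extend the target with blanks around and between every label,
    # e.g. [a, b] -> [_, a, _, b, _]; CTC alignments walk this sequence.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of reaching position s of `ext`
    # after consuming the frames seen so far.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]           # start on the leading blank
    if S > 1:
        alpha[1] = probs[0][ext[1]]       # or directly on the first label

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                  # stay on the same symbol
            if s > 0:
                a += alpha[s - 1]         # advance by one position
            # Skip over a blank, allowed only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the final blank or the final label.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

With a vocabulary {blank, a} and two uniform frames, the alignments collapsing to "a" are (a,a), (a,_), (_,a), each with probability 0.25, so the function returns 0.75; training minimizes the negative log of this quantity.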
Papers
Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction
Zehai Tu, Ning Ma, Jon Barker
Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners
Zehai Tu, Ning Ma, Jon Barker
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu, Jack Deadman, Ning Ma, Jon Barker
MAESTRO: Matched Speech Text Representations through Modality Matching
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen
Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model
Nick J. C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang
Speech Pre-training with Acoustic Piece
Shuo Ren, Shujie Liu, Yu Wu, Long Zhou, Furu Wei
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Zhao You, Shulin Feng, Dan Su, Dong Yu
Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe
Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning
Nilaksh Das, Duen Horng Chau
Deliberation Model for On-Device Spoken Language Understanding
Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer
A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems
Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève
End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation
Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe
Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
Gerasimos Chatzoudis, Manos Plitsis, Spyridoula Stamouli, Athanasia-Lida Dimou, Athanasios Katsamanis, Vassilis Katsouros
PriMock57: A Dataset Of Primary Care Mock Consultations
Alex Papadopoulos Korfiatis, Francesco Moramarco, Radmila Sarac, Aleksandar Savkov
Text-To-Speech Data Augmentation for Low Resource Speech Recognition
Rodolfo Zevallos
Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR
Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida