Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization for Connectionist Temporal Classification (CTC), pre-trained multilingual models for low-resource languages, and the integration of Large Language Models (LLMs) for richer contextual understanding and better handling of diverse accents and atypical or disordered speech. These advances have significant implications for accessibility, enabling applications in fields such as healthcare, education, and human-computer interaction.
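To make the CTC objective mentioned above concrete, here is a minimal sketch of the CTC forward algorithm in plain Python. It computes the log-probability of a label sequence by summing over all blank-augmented alignment paths; this is an illustrative toy implementation (the function name and inputs are our own), not code from any of the papers listed below.

```python
import math

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given log-values."""
    xs = [x for x in xs if x != float("-inf")]
    if not xs:
        return float("-inf")
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_log_prob(log_probs, labels, blank=0):
    """CTC log-probability of `labels` given per-frame log_probs[t][k]
    over the vocabulary, via the standard forward (alpha) recursion."""
    # Interleave blanks: l' = [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), len(log_probs)
    NEG = float("-inf")
    alpha = [[NEG] * S for _ in range(T)]
    # Paths may start with a blank or with the first label.
    alpha[0][0] = log_probs[0][blank]
    if S > 1:
        alpha[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1][s]]          # stay on the same symbol
            if s > 0:
                terms.append(alpha[t - 1][s - 1])  # advance by one
            # Skip a blank only between two *different* labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1][s - 2])
            alpha[t][s] = logsumexp(*terms) + log_probs[t][ext[s]]
    # Valid paths end on the last label or the trailing blank.
    return logsumexp(alpha[T - 1][S - 1],
                     alpha[T - 1][S - 2] if S > 1 else NEG)
```

For example, with two frames, a vocabulary {0: blank, 1: "a"}, and a uniform 0.5 distribution per frame, the three alignments "aa", "-a", and "a-" each have probability 0.25, so `math.exp(ctc_log_prob(lp, [1]))` is 0.75. Consistency-regularization approaches add a penalty on the divergence between such CTC output distributions computed from differently augmented views of the same utterance.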
Papers
On Building Spoken Language Understanding Systems for Low Resourced Languages
Akshat Gupta
An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech
Wei Liu, Jingyu Li, Tan Lee
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
Content-Context Factorized Representations for Automated Speech Recognition
David M. Chan, Shalini Ghosh
Automatic Spoken Language Identification using a Time-Delay Neural Network
Benjamin Kepecs, Homayoon Beigi
Insights on Neural Representations for End-to-End Speech Recognition
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
Deploying self-supervised learning in the wild for hybrid automatic speech recognition
Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu
Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments
Joe Caroselli, Arun Narayanan, Yiteng Huang