Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization for Connectionist Temporal Classification (CTC) training, transfer from pre-trained multilingual models for low-resource languages, and integration of Large Language Models (LLMs) for richer contextual understanding and better handling of diverse accents and speech disorders. These advances have significant implications for accessibility, enabling applications in fields such as healthcare, education, and human-computer interaction.
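For readers unfamiliar with the CTC objective mentioned above: CTC scores a transcript by summing the probabilities of every frame-level alignment (with optional blanks and repeats) that collapses to it, computed efficiently with a forward dynamic program. Below is a minimal NumPy sketch of that forward algorithm; it is an illustrative implementation for intuition, not code from any of the listed papers (function and variable names are our own).

```python
import numpy as np

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC via the forward algorithm.

    log_probs: (T, V) array of per-frame log-softmax outputs.
    target:    list of label ids (no blanks).
    """
    T, V = log_probs.shape
    # Extended target: blanks interleaved between (and around) labels.
    ext = [blank]
    for lab in target:
        ext += [lab, blank]
    S = len(ext)

    # alpha[s] = log-prob of all alignment prefixes ending at ext[s].
    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, blank]          # start with a blank
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]     # or with the first label

    for t in range(1, T):
        new = np.full(S, -np.inf)
        for s in range(S):
            a = alpha[s]                    # stay on the same symbol
            if s > 0:
                a = np.logaddexp(a, alpha[s - 1])   # advance one step
            # Skip the blank between two *different* labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[s - 2])
            new[s] = a + log_probs[t, ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    total = np.logaddexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total
```

For example, with two frames, a vocabulary of {blank, "a"}, and uniform probabilities, the alignments "-a", "a-", and "aa" all collapse to "a", so the loss is -log(3 * 0.25). Consistency-regularization variants add a term penalizing divergence between CTC outputs computed on two augmented views of the same utterance.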
Papers
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie
Sequence-to-sequence models in peer-to-peer learning: A practical application
Robert Šajina, Ivo Ipšić
Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features
Francisco Teixeira, Karla Pizzi, Raphael Olivier, Alberto Abad, Bhiksha Raj, Isabel Trancoso
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Aditya Chakravarty
Efficient Compression of Multitask Multilingual Speech Models
Thomas Palmeira Ferraz
Automatic Speech Recognition System-Independent Word Error Rate Estimation
Chanho Park, Mingjie Chen, Thomas Hain
Developing Acoustic Models for Automatic Speech Recognition in Swedish
Giampiero Salvi
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang