Automatic Speech Recognition
Automatic Speech Recognition (ASR) aims to accurately transcribe spoken language into text, driving research into robust and efficient models. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization for Connectionist Temporal Classification (CTC) training, transfer from pre-trained multilingual models to low-resource languages, and integration of Large Language Models (LLMs) for richer contextual understanding and better handling of diverse accents and speech disorders. These advances have significant implications for accessibility, enabling applications in fields such as healthcare, education, and human-computer interaction.
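To make the idea of consistency regularization in CTC training concrete, the sketch below shows one generic formulation: the standard CTC loss is computed on two stochastically augmented views of the same utterance, plus a symmetric KL term that pushes their frame-level posteriors to agree. This is an illustrative PyTorch sketch, not the method of any specific paper listed here; the model, the toy augmentation, and the weighting factor alpha are all assumptions.

```python
# Minimal sketch of consistency-regularized CTC training (illustrative only).
# Assumes a generic acoustic model whose output has the same number of frames
# as its input and is shaped (T, N, C), as required by torch.nn.CTCLoss.
import torch
import torch.nn.functional as F

ctc_loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)


def simple_time_mask(feats, max_width=20):
    """Very small stand-in for SpecAugment: zero out one random time span.

    feats: (N, T, F) batch of feature sequences.
    """
    feats = feats.clone()
    T = feats.size(1)
    width = int(torch.randint(1, max_width + 1, (1,)))
    start = int(torch.randint(0, max(T - width, 1), (1,)))
    feats[:, start:start + width, :] = 0.0
    return feats


def consistency_ctc_step(model, feats, out_lens, targets, target_lens, alpha=1.0):
    """One training step: CTC loss on two augmented views plus a symmetric
    KL consistency term between their frame-level output distributions."""
    # Two stochastic augmentations of the same utterances.
    view_a = simple_time_mask(feats)
    view_b = simple_time_mask(feats)

    # Frame-level log-probabilities for each view, shape (T, N, C).
    log_probs_a = F.log_softmax(model(view_a), dim=-1)
    log_probs_b = F.log_softmax(model(view_b), dim=-1)

    # Standard CTC loss averaged over the two views.
    loss_ctc = 0.5 * (
        ctc_loss(log_probs_a, targets, out_lens, target_lens)
        + ctc_loss(log_probs_b, targets, out_lens, target_lens)
    )

    # Symmetric KL divergence encourages the two views to agree frame by frame.
    loss_cons = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
        + F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    )
    return loss_ctc + alpha * loss_cons
```

In practice the consistency weight alpha is a tuning knob: a small value mostly preserves plain CTC training, while a larger value trades some fit to the labels for predictions that are more stable under perturbation, which is the intuition behind using it for robustness.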
Papers
Analysis of EEG frequency bands for Envisioned Speech Recognition
Ayush Tripathi
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata
CMGAN: Conformer-based Metric GAN for Speech Enhancement
Ruizhe Cao, Sherif Abdulatif, Bin Yang
Finnish Parliament ASR corpus - Analysis, benchmarks and statistics
Anja Virkkunen, Aku Rouhe, Nhan Phan, Mikko Kurimo
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng
Training speaker recognition systems with limited data
Nik Vaessen, David A. van Leeuwen
Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech
Samik Sadhu, Hynek Hermansky
Computing Optimal Location of Microphone for Improved Speech Recognition
Karan Nathwani, Bhavya Dixit, Sunil Kumar Kopparapu
Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks
Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Tamás Grósz, Krister Lindén, Mikko Kurimo
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Amirhossein Abaskohi, Fatemeh Mortazavi, Hadi Moradi
Disentangling Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng