Automatic Speech Recognition Performance

Automatic speech recognition (ASR) aims to transcribe spoken language into text accurately, a task with broad practical applications. Current research focuses on improving ASR robustness across diverse speech characteristics (e.g., child speech, accented speech, whispered speech) and noisy environments, often building on deep learning architectures such as Conformers and Transformers and on large pretrained models such as Wav2Vec 2.0 and HuBERT (self-supervised) and Whisper (weakly supervised). These efforts include new training strategies, data augmentation techniques, and multimodal approaches that incorporate visual information to improve accuracy and mitigate biases in existing models. The resulting advances have significant implications for fields such as healthcare, education, and accessibility technologies.
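
ASR performance is most commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and a model's hypothesis, normalized by the reference length. The sketch below is a minimal, self-contained illustration of the metric; the function name and example transcripts are hypothetical and not drawn from any particular paper.

```python
# Minimal word error rate (WER) computation, the standard ASR performance metric.
# Illustrative sketch only; names and example sentences are hypothetical.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()

    # Levenshtein distance over words via dynamic programming:
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {wer(ref, hyp):.2%}")  # 2 substitutions / 9 words ≈ 22.22%
```

In practice, the same metric is applied to a model's decoded output (e.g., from a Conformer or Whisper-style system) against human reference transcripts, and robustness studies compare WER across conditions such as accented or noisy speech.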

Papers