ASR Model
Automatic speech recognition (ASR) models transcribe spoken language into text, a task central to applications such as voice assistants, captioning, and dictation. Current research emphasizes robustness across diverse accents, languages, and noisy environments, often building on transformer-based architectures such as Wav2Vec 2.0 and the Conformer, and sometimes incorporating visual information (e.g. lip movements) for audio-visual recognition. Significant effort also goes into mitigating demographic and accent bias, improving efficiency through knowledge distillation, exploiting self-supervised pre-training, and developing methods for low-resource languages. These advances are driving progress in accessibility technologies, human-computer interaction, and language documentation.
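Models like Wav2Vec 2.0 are typically fine-tuned for ASR with a CTC objective, where the final step maps per-frame token predictions to text. A minimal sketch of CTC greedy decoding is below; the toy vocabulary and logits are illustrative assumptions, not taken from any specific model.

```python
BLANK = 0  # CTC blank token index (assumed; position varies by model)
VOCAB = {0: "", 1: "c", 2: "a", 3: "t"}  # toy vocabulary for illustration

def ctc_greedy_decode(frame_ids):
    """Standard CTC collapse rule: merge repeated tokens, then drop blanks."""
    out, prev = [], None
    for idx in frame_ids:
        if idx != prev and idx != BLANK:
            out.append(VOCAB[idx])
        prev = idx
    return "".join(out)

# Each inner list is one frame's scores over the toy vocabulary.
logits = [
    [0.1, 2.0, 0.2, 0.1],  # argmax -> "c"
    [0.1, 2.1, 0.3, 0.2],  # argmax -> "c" (repeat, collapsed)
    [3.0, 0.1, 0.2, 0.1],  # argmax -> blank (separates tokens)
    [0.1, 0.2, 2.5, 0.3],  # argmax -> "a"
    [0.2, 0.1, 0.3, 2.8],  # argmax -> "t"
]
frame_ids = [max(range(len(f)), key=f.__getitem__) for f in logits]
print(ctc_greedy_decode(frame_ids))  # prints "cat"
```

In practice, greedy decoding is often replaced by beam search with a language model, but the collapse-and-drop-blanks rule above is what lets a frame-level CTC model emit variable-length transcripts.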
Papers
Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan, Ibon Saratxaga, John Sloan, Oscar Maharog, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide
Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages
Or Haim Anidjar, Revital Marbel, Roi Yozevitch