Automatic Speech Recognition
Automatic Speech Recognition (ASR) transcribes spoken language into text, and research in the field targets models that are both robust and efficient. Current efforts focus on improving accuracy and robustness through techniques such as consistency regularization in Connectionist Temporal Classification (CTC), pre-trained multilingual models for low-resource languages, and the integration of Large Language Models (LLMs) for better contextual understanding and better handling of diverse accents and speech disorders. These advances matter for accessibility and support applications in fields such as healthcare, education, and human-computer interaction.
Papers
SGGNet$^2$: Speech-Scene Graph Grounding Network for Speech-guided Navigation
Dohyun Kim, Yeseung Kim, Jaehwi Jang, Minjae Song, Woojin Choi, Daehyung Park
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Theresa Pekarek Rosin, Stefan Wermter
Ed-Fed: A generic federated learning framework with resource-aware client selection for edge devices
Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua, Tal Rosenwein
Boosting Norwegian Automatic Speech Recognition
Javier de la Rosa, Rolv-Arild Braaten, Per Egil Kummervold, Freddy Wetjen, Svein Arne Brygfjeld
Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos
Ashwin Rao
Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang, Hiromitsu Nishizaki, Ming Li