Speech Recognition

Automatic speech recognition (ASR) transcribes spoken language into text. Current research focuses largely on improving accuracy and robustness across diverse conditions. This work explores model architectures such as transformers and conformers, as well as the integration of large language models (LLMs). It commonly relies on techniques such as connectionist temporal classification (CTC), attention mechanisms, and multimodal (audio-visual) input. Significant effort also goes into low-resource languages, noisy environments, and recognizing speech from individuals with speech impairments. Advances in ASR benefit a wide range of applications, including virtual assistants, transcription services, accessibility tools for people with disabilities, and cross-lingual communication.
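To make the CTC idea concrete, here is a minimal sketch of greedy ("best-path") CTC decoding. It assumes that an acoustic model has already produced a list of per-frame label scores, which is hypothetical here and replaced with hand-made values. It also assumes that the CTC blank symbol sits at index 0 of the vocabulary. Decoding takes the most likely label at each frame, merges consecutive repeats, and then drops blanks. This is a simplification. Practical systems often use beam search, possibly with a language model, instead of pure greedy decoding.

```python
from itertools import groupby

BLANK = 0  # assumption: the CTC blank symbol is vocabulary index 0


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring label at every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse runs of the same label (e.g. h h e -> h e).
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks. A blank between two identical labels keeps them
    #    as separate characters (l - l -> l l).
    return "".join(vocab[label] for label in collapsed if label != blank)


# Hypothetical example: a 5-symbol vocabulary and 10 frames of scores.
vocab = ["-", "h", "e", "l", "o"]
path = [1, 1, 2, 0, 3, 3, 0, 3, 4, 4]
frames = [[0.9 if i == k else 0.025 for i in range(len(vocab))] for k in path]
print(ctc_greedy_decode(frames, vocab))  # hello
```

The blank symbol lets CTC emit doubled letters such as the "ll" in "hello". Without the blank between the two `l` runs, they would be merged into a single character.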

Papers