Speech Recognizers

Automatic speech recognition (ASR) transcribes spoken language into text. Current research focuses on improving accuracy and efficiency across diverse audio conditions and applications. This work includes refining established approaches such as connectionist temporal classification (CTC) and exploring newer ones, such as using large language models (LLMs) for rescoring and knowledge distillation, with the goal of lowering word error rates. Improvements in ASR matter for many fields, including law enforcement (analyzing police radio communications), voice assistants (mitigating biases), and other applications that require real-time transcription and understanding of spoken language.
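Word error rate, the metric mentioned above, is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that calculation, not an implementation taken from any particular paper or toolkit. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name): splits on whitespace and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.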

Papers