Monolingual Automatic Speech Recognition

Monolingual automatic speech recognition (ASR) transcribes speech in a single language, with the aim of achieving higher accuracy and efficiency on that language than multilingual systems. Current research focuses on refining transformer architectures and models trained with connectionist temporal classification (CTC), often augmenting them with techniques such as k-nearest neighbors (kNN) retrieval and gated datastores to improve performance, particularly in challenging scenarios such as code-switching. These advances make speech technology more accessible and usable, with applications ranging from cultural heritage preservation to efficient transcription services.

Papers