Domain Automatic Speech Recognition

Domain automatic speech recognition (ASR) research aims to build robust systems that transcribe speech accurately across diverse domains and languages, overcoming the limitations of traditional models trained on a single, homogeneous dataset. Current work emphasizes multi-domain models built with self-supervised pre-training (e.g., wav2vec 2.0), the integration of language models to improve accuracy, and strategies such as mixture-of-experts to handle domain shift. This research matters for the accessibility and reliability of speech technology, particularly in low-resource settings and in applications that require high accuracy across varied speech styles and acoustic conditions.
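
As a rough illustration of the mixture-of-experts idea mentioned above, the sketch below routes one feature vector through two domain experts and blends their outputs with softmax gate weights. It is a minimal toy under stated assumptions, not the method of any particular paper: the names moe_forward, expert_a, expert_b, and gate_weights are hypothetical, the experts are trivial functions standing in for domain-specific network layers, and the gate weights are hand-picked rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(features, gate_weights, experts):
    """Blend expert outputs using softmax gate probabilities.

    features:     list of floats (one acoustic feature vector)
    gate_weights: one weight row per expert (hypothetical, hand-picked here)
    experts:      list of callables mapping a feature vector to an output vector
    """
    # Gate score for each expert is a dot product with the input features.
    scores = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    probs = softmax(scores)
    outputs = [expert(features) for expert in experts]
    dim = len(outputs[0])
    # Weighted sum of the expert outputs, dimension by dimension.
    mixed = [sum(p * out[i] for p, out in zip(probs, outputs)) for i in range(dim)]
    return mixed, probs


# Two toy "domain experts" (assumptions for illustration only).
def expert_a(x):
    return [2.0 * v for v in x]


def expert_b(x):
    return [v + 1.0 for v in x]


features = [1.0, 0.0]
gate_weights = [[5.0, 0.0], [0.0, 5.0]]
mixed, probs = moe_forward(features, gate_weights, [expert_a, expert_b])
```

With these inputs the gate strongly favors the first expert, so the blended output stays close to that expert's output. In a real system the gate and the experts would be trained jointly, and the gate would often select only the top few experts rather than blending all of them.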

Papers