Domain Automatic Speech Recognition
Multi-domain automatic speech recognition (ASR) aims to build robust systems that transcribe speech accurately across diverse domains and languages, overcoming the limitations of traditional models trained on a single, homogeneous dataset. Current research emphasizes multi-domain models built with self-supervised pre-training (e.g., Wav2Vec 2.0), language models for improved accuracy, and strategies such as mixture-of-experts to handle domain shift. This work is crucial for improving the accessibility and reliability of speech technology, particularly in low-resource settings and in applications that require high accuracy across varied speech styles and acoustic conditions.
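To make the mixture-of-experts idea concrete, here is a minimal sketch of soft gating in plain Python. It is not taken from any of the papers listed below: the expert functions, the gate weights, and the two-dimensional feature vector are all hypothetical stand-ins chosen for illustration. In a real system the experts would be neural sub-networks and the gate would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(features, gate_weights, experts):
    """Blend expert outputs using softmax gate probabilities.

    features:     list of floats (a toy acoustic feature vector)
    gate_weights: one weight row per expert (hypothetical, normally learned)
    experts:      list of callables mapping features -> output vector
    """
    # One gate logit per expert: dot product of its weight row with the input.
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    gates = softmax(logits)
    outputs = [expert(features) for expert in experts]
    dim = len(outputs[0])
    # Weighted sum of expert outputs, dimension by dimension.
    mixed = [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(dim)]
    return gates, mixed

# Hypothetical experts standing in for domain-specific sub-networks.
def clean_speech_expert(x):
    return [2.0 * v for v in x]

def noisy_speech_expert(x):
    return [v + 1.0 for v in x]

gate_weights = [[1.0, 0.0], [0.0, 1.0]]  # assumed values, for illustration only
gates, mixed = mixture_of_experts(
    [0.5, 0.5], gate_weights, [clean_speech_expert, noisy_speech_expert]
)
```

Because the gate is input-dependent, an utterance whose features resemble one domain shifts weight toward that domain's expert, which is one way such models can adapt to domain shift without a separate model per domain.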
Papers
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
Investigating self-supervised, weakly supervised and fully supervised training approaches for multi-domain automatic speech recognition: a study on Bangladeshi Bangla
Ahnaf Mozib Samin, M. Humayon Kobir, Md. Mushtaq Shahriyar Rafee, M. Firoz Ahmed, Mehedi Hasan, Partha Ghosh, Shafkat Kibria, M. Shahidur Rahman