Disfluent Speech
Disfluent speech, which includes phenomena such as stuttering, repetitions, and filled pauses (e.g. "uh", "um"), is a significant challenge for automatic speech recognition (ASR) systems, which are typically trained on fluent speech. Current research aims to improve ASR accuracy on disfluent speech through several techniques: incorporating disfluency detection into ASR models (often trained with connectionist temporal classification (CTC) or similar objectives), fine-tuning large self-supervised models on targeted data combined with data augmentation, and building multimodal models that integrate acoustic and textual information. These advances aim to make ASR more inclusive and usable for people who stutter, and to improve transcription accuracy in applications such as speech therapy and language learning.
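The sketch below tags filled pauses and whole-word repetitions in a plain-text transcript. Its purpose is to make these disfluency types concrete. It is a hypothetical, rule-based illustration and not one of the learned (e.g. CTC-trained) detectors described above. The function names, the filled-pause list, and the repetition rule are all assumptions. It also ignores part-word repetitions, prolongations, and blocks, which are typical of stuttering.

```python
"""Minimal rule-based disfluency tagger for word-level transcripts.

Hypothetical illustration only: the filled-pause inventory and the
repetition rule are assumptions, not a published method.
"""
from typing import List, Tuple

# Assumed inventory of English filled pauses; real systems learn these from data.
FILLED_PAUSES = {"uh", "um", "er", "ah", "hmm"}


def tag_disfluencies(transcript: str) -> List[Tuple[str, str]]:
    """Label each token as 'FP' (filled pause), 'REP' (repetition) or 'O' (fluent).

    Assumes an unpunctuated, whitespace-separated transcript; tokens are lowercased.
    A token is tagged 'REP' when it equals the next non-filled-pause token,
    so in "I I want" the first "I" is the disfluent copy and the second is kept.
    """
    tokens = transcript.lower().split()
    tags: List[Tuple[str, str]] = []
    for i, tok in enumerate(tokens):
        if tok in FILLED_PAUSES:
            tags.append((tok, "FP"))
            continue
        # Skip over filled pauses to find the next content token.
        j = i + 1
        while j < len(tokens) and tokens[j] in FILLED_PAUSES:
            j += 1
        if j < len(tokens) and tokens[j] == tok:
            tags.append((tok, "REP"))
        else:
            tags.append((tok, "O"))
    return tags


def clean_transcript(transcript: str) -> str:
    """Keep only tokens tagged 'O', yielding a lowercased fluent version."""
    return " ".join(tok for tok, tag in tag_disfluencies(transcript) if tag == "O")


if __name__ == "__main__":
    example = "I I um want to to go uh home"
    print(tag_disfluencies(example))
    print(clean_transcript(example))  # prints "i want to go home"
```

A learned detector would replace the fixed rules with per-token labels predicted from audio or text. The output format, one tag per token, is often similar, and this makes it straightforward to compare a fluent reference transcript with a verbatim one.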