Accented Speech
Accented speech research aims to improve the performance of automatic speech recognition (ASR) and text-to-speech (TTS) systems for speakers with diverse accents, making these technologies more inclusive and equitable. Current work relies heavily on deep learning models such as Conformers, Wav2Vec 2.0, and sequence-to-sequence architectures, often combined with data augmentation (synthetic speech generation, pseudo-labeling), multi-modal learning, and meta-learning to address data scarcity and improve generalization across accents. This research matters because it directly affects the accessibility and usability of speech technologies for speakers whose accents are underrepresented in training data, with implications for healthcare, education, and other fields that depend on accurate speech processing.
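The pseudo-labeling idea mentioned above can be sketched in a few lines: a trained model transcribes unlabeled accented audio, and only transcripts whose confidence clears a threshold are added back as training pairs. This is a minimal, self-contained illustration; the `toy_transcribe` function and its scores are hypothetical stand-ins for a real ASR model such as Wav2Vec 2.0, and the fixed-threshold filter is a simplification of the momentum-based variant used in the paper below.

```python
def pseudo_label(unlabeled, transcribe, threshold=0.9):
    """Keep only (utterance, transcript) pairs whose model
    confidence meets the threshold; discard the rest."""
    accepted = []
    for utterance in unlabeled:
        text, confidence = transcribe(utterance)
        if confidence >= threshold:
            accepted.append((utterance, text))
    return accepted


def toy_transcribe(utterance):
    # Hypothetical stand-in for a real ASR model's output:
    # (transcript, confidence score) per audio clip.
    scores = {
        "clip_a": ("hello world", 0.95),
        "clip_b": ("unclear mumbling", 0.40),
    }
    return scores[utterance]


labeled = pseudo_label(["clip_a", "clip_b"], toy_transcribe)
# Only the high-confidence clip survives the filter and
# would be added to the training set for the next round.
```

In practice the threshold trades label quality against coverage: a strict filter keeps pseudo-labels clean but discards most accented utterances, which is exactly the data the augmentation was meant to recover.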
Papers
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen
Earnings-22: A Practical Benchmark for Accents in the Wild
Miguel Del Rio, Peter Ha, Quinten McNamara, Corey Miller, Shipra Chandra