Pronunciation Quality

Pronunciation quality assessment is a rapidly evolving field focused on automatically evaluating the accuracy and fluency of spoken language, primarily to support language learning and speech rehabilitation. Current research emphasizes multi-aspect assessment (accuracy, fluency, prosody) at several granularities (phoneme, syllable, word), often using deep learning models such as Transformers and LSTMs together with techniques such as acoustic feature mixup and spectrogram inpainting to improve scoring accuracy and address data imbalance. These advances support the development of more effective computer-assisted pronunciation training systems and personalized speech therapy tools.
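To make the mixup idea concrete, here is a minimal sketch of generic mixup applied to acoustic feature vectors and their pronunciation scores. It is not the method of any particular paper in this area: the function name `mixup`, the `alpha` value, and the toy feature vectors are illustrative assumptions. The sketch blends two labeled examples with a weight drawn from a Beta distribution, which can synthesize training points in under-represented score ranges.

```python
import random


def mixup(features_a, score_a, features_b, score_b, alpha=0.4, rng=random):
    """Blend two (feature vector, score) pairs with a Beta-sampled weight.

    Hypothetical helper for illustration; alpha=0.4 is an assumed default.
    """
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    # Mixing weight in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Apply the same convex combination to the features and to the label.
    mixed_features = [lam * a + (1.0 - lam) * b
                      for a, b in zip(features_a, features_b)]
    mixed_score = lam * score_a + (1.0 - lam) * score_b
    return mixed_features, mixed_score, lam


# Toy usage: mix a high-scoring and a low-scoring phone-level example.
good = ([0.9, 0.1, 0.3], 2.0)  # (assumed acoustic features, accuracy score)
poor = ([0.2, 0.7, 0.5], 0.0)
features, score, lam = mixup(good[0], good[1], poor[0], poor[1],
                             rng=random.Random(0))
print(features, score, lam)
```

Because the features and the score share the same weight, the synthetic example stays consistent with its label. In practice the inputs would be model-derived acoustic embeddings and not hand-written lists.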

Papers