Dysarthric Speech Reconstruction

Dysarthric speech reconstruction (DSR) aims to convert speech that is distorted or hard to understand because of dysarthria, a motor speech disorder caused by neurological conditions, into clear, natural-sounding speech. Current research relies heavily on multi-modal approaches that combine audio with visual information such as lip movements. Common architectures include neural codec language models and encoder-decoder frameworks, often paired with adversarial training and self-supervised learning to improve speaker similarity and prosody. These advances are reported to improve the intelligibility and naturalness of reconstructed speech, which could help people with dysarthria communicate more easily and improve their quality of life.

Papers