Articulatory Inversion

Articulatory inversion aims to reconstruct the movements of the speech articulators (tongue, lips, etc.) from audio recordings, linking acoustic signals to the physical mechanisms of speech production. Current research relies heavily on deep learning models, often incorporating self-supervised learning and multi-channel attention mechanisms to improve speaker-independent performance and to handle diverse speech characteristics, including dysarthric speech. The field matters for understanding speech production and enables applications such as realistic avatar animation and improved speech synthesis; it may also assist in the diagnosis and treatment of speech disorders.
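
As a rough illustration of the task, inversion can be framed as a regression from per-frame acoustic features to articulator positions. The sketch below is a toy under stated assumptions, not a method from the literature listed here: the data are synthetic, the single articulator coordinate (`tongue_height`), the two acoustic features, and the helper names (`fit_inversion`, `predict`, `solve_linear_system`) are all hypothetical, and a plain least-squares fit stands in for the deep learning models that current work actually uses.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_inversion(features, targets):
    """Least-squares weights mapping [features..., 1] to one articulator coordinate."""
    rows = [f + [1.0] for f in features]  # append a bias term to each frame
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature):
    """Estimate the articulator coordinate for one acoustic frame."""
    return sum(w * v for w, v in zip(weights, feature + [1.0]))


# Synthetic data (assumption): one articulator trajectory drives two noisy
# acoustic features through a made-up linear forward model.
rng = random.Random(0)
tongue_height = [math.sin(0.1 * t) for t in range(200)]
acoustic = [
    [2.0 * a + rng.gauss(0, 0.05), -1.0 * a + rng.gauss(0, 0.05)]
    for a in tongue_height
]

# Fit on the first 150 frames, then invert the held-out 50 frames.
weights = fit_inversion(acoustic[:150], tongue_height[:150])
predicted = [predict(weights, f) for f in acoustic[150:]]
rmse = math.sqrt(
    sum((p - a) ** 2 for p, a in zip(predicted, tongue_height[150:])) / len(predicted)
)
print(f"held-out RMSE: {rmse:.3f}")
```

Real systems replace the linear map with a neural network, the toy features with learned or spectral speech representations, and the synthetic trajectory with measured articulator data, but the input-output structure of the problem is the same.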

Papers