Spontaneous Speech Synthesis

Spontaneous speech synthesis aims to generate synthetic speech that reproduces the naturalness of human conversation, including its disfluencies. This is a significant challenge because spontaneous speech patterns are complex. Current research leverages large language models and self-supervised learning of speech representations to improve the naturalness of synthesized speech, with particular attention to the accurate modeling of prosody and filled pauses (such as "um" and "uh"). These advances are making synthetic speech more realistic, with implications for applications ranging from personalized voice assistants to more natural-sounding audiobooks and accessibility technologies.

Papers