Prosody Control

Prosody control in speech synthesis manipulates the rhythm, intonation, and stress of synthetic speech so that it sounds more natural and expressive, closer to the nuances of human communication. Current research focuses on fine-grained control over prosodic parameters such as pitch and duration, often through phoneme-level prosodic clustering or by modifying existing text-to-speech (TTS) models such as FastSpeech2 and neural HMMs. These advances improve the quality and realism of synthetic speech, with applications ranging from general-purpose TTS systems to finer control over the emotional and pragmatic aspects of generated audio, which makes precise prosody control central to natural and engaging synthetic voices.
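
To make the idea of phoneme-level prosodic clustering more concrete, here is a minimal sketch. It assumes the technique amounts to grouping per-phoneme pitch (F0) values into a small number of discrete labels that a TTS model could then be conditioned on. The F0 values, the choice of three clusters, and the helper names `kmeans_1d` and `assign_label` are all illustrative assumptions, not the method of any particular paper or library.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Cluster scalar values into k groups and return the sorted centroids."""
    ordered = sorted(values)
    # Deterministic init (an assumption of this sketch): spread the initial
    # centroids evenly across the sorted data instead of sampling randomly.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def assign_label(value, centroids):
    """Return the index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


# Hypothetical mean F0 per phoneme, in Hz (made-up numbers for illustration).
phoneme_f0 = [95.0, 100.0, 105.0, 148.0, 150.0, 152.0, 198.0, 200.0, 205.0]

f0_centroids = kmeans_1d(phoneme_f0, k=3)
f0_labels = [assign_label(v, f0_centroids) for v in phoneme_f0]

# Each phoneme now carries a discrete pitch label (0 = low, 2 = high here).
print(f0_centroids)
print(f0_labels)
```

In a setup like this, the discrete labels would be supplied to the synthesis model alongside the phoneme sequence, and a user could edit individual labels at inference time to raise or lower the pitch of specific phonemes. Duration could be quantized the same way.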

Papers