Prosody Control
Prosody control in speech synthesis manipulates the rhythm, intonation, and stress of synthetic speech so that it sounds more natural and expressive, closer to human communication. Current research focuses on fine-grained control over prosodic parameters such as pitch and duration, often through phoneme-level prosodic clustering and through modifications to existing text-to-speech (TTS) models such as FastSpeech2 and neural HMMs. These methods improve the quality and realism of synthetic speech and give finer control over the emotional and pragmatic aspects of generated audio.
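To make the idea of phoneme-level prosodic clustering concrete, here is a minimal sketch that quantizes per-phoneme pitch values into a small set of ordered labels using a plain one-dimensional k-means. It illustrates the general idea only and is not the implementation from the papers listed below. The function names, the choice of k = 3, and the F0 values are all assumptions made up for this example.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means; centroids start at evenly spaced order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation (an illustrative choice, not from the papers).
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            buckets[nearest].append(v)
        # Keep the previous centroid when a bucket ends up empty.
        updated = [mean(b) if b else centroids[i] for i, b in enumerate(buckets)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def to_labels(values, centroids):
    """Map each value to the index of its nearest centroid (0 = lowest)."""
    return [
        min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        for v in values
    ]


# Hypothetical per-phoneme mean F0 values in Hz, invented for illustration.
f0_per_phoneme = [110.0, 115.0, 112.0, 180.0, 175.0, 185.0, 240.0, 250.0, 245.0]

f0_centroids = kmeans_1d(f0_per_phoneme, k=3)
f0_labels = to_labels(f0_per_phoneme, f0_centroids)

print(f0_centroids)
print(f0_labels)
```

The same routine could be applied to per-phoneme durations to obtain a second label sequence. In a clustering-based control scheme, discrete labels of this kind would typically condition the TTS model, so that changing a phoneme's label at synthesis time shifts its pitch or duration toward a different cluster. How the labels are fed to the model is outside the scope of this sketch.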
Papers
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni, Myrsini Christidou, Nikolaos Ellinas, Georgios Vamvoukakis, Panos Kakoulidis, Taehoon Kim, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou, Alexandra Vioni, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Panos Kakoulidis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis