Conversational Speech Synthesis
Conversational speech synthesis (CSS) aims to generate realistic, expressive speech within the context of a dialogue, with particular attention to natural prosody, emotion, and turn-taking. Current research focuses on improving context modeling with techniques such as heterogeneous graphs and contrastive learning, and often incorporates large language models to strengthen both semantic understanding and stylistic control. Progress also depends on larger, more diverse datasets, including recordings of natural conversational speech with emotional annotations, which help make synthesized speech more natural and expressive. Target applications include conversational AI and accessibility technologies.
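The contrastive-learning idea mentioned above can be illustrated with a generic InfoNCE objective. This is a minimal sketch, not the method of any specific paper listed here. It assumes each dialogue turn has already been encoded into two hypothetical vectors: a context embedding summarizing the dialogue history, and a style embedding of the target utterance. Matching pairs in a batch are pulled together, and every other style embedding in the batch serves as a negative. The `temperature` value of 0.1 is an illustrative assumption.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(context: List[Vector], style: List[Vector], temperature: float = 0.1) -> float:
    """Average InfoNCE loss over a batch.

    context[i] and style[i] are assumed (hypothetically) to come from the same
    dialogue turn and form the positive pair; all style[j] with j != i are negatives.
    """
    assert context and len(context) == len(style)
    total = 0.0
    for i, c in enumerate(context):
        # Similarity of this context to every style embedding in the batch.
        logits = [cosine(c, s) / temperature for s in style]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-softmax probability of the positive pair.
        total += log_sum_exp - logits[i]
    return total / len(context)


# Usage: correctly aligned context/style pairs should give a lower loss
# than pairs whose styles have been swapped.
ctx = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(ctx, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce(ctx, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)  # True
```

In a real CSS system, the vectors would come from trained text and speech encoders, and the loss would be minimized with gradient descent. The pure-Python version here only shows how the objective is computed.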
Papers
July 31, 2024
June 6, 2024
December 19, 2023
December 16, 2023
August 31, 2023
May 29, 2023