Expressive Text-to-Speech
Expressive text-to-speech (TTS) aims to synthesize speech that naturally conveys emotion, style, and other nuances of human communication. Current research focuses on improving control over these expressive qualities, often using diffusion models and large language models to transfer style from natural-language prompts or reference audio. This work involves developing robust methods for representing and manipulating prosody, and it must address challenges such as data scarcity and the need to generalize across speakers and styles. Advances in expressive TTS have significant implications for applications ranging from accessibility technologies to more engaging virtual assistants and creative content generation.