Controllable Text to Speech
Controllable text-to-speech (TTS) aims to synthesize speech from text while giving precise control over attributes such as speaker identity, speaking style, and emotional expression, all guided by natural language descriptions. Current research focuses on models that achieve this control with techniques such as decoder-only transformers, normalizing flows that model variance in speech features, and multi-modal approaches that combine text and speech information. These advances are improving the naturalness and robustness of synthesized speech and support applications such as personalized voice assistants, accessible communication technologies, and more expressive audio content creation.