Diffusion Based Text
Diffusion-based models have become a leading approach to text-to-speech (TTS) synthesis, producing high-quality, diverse audio even in zero-shot scenarios. Current research focuses on improving robustness and efficiency and on finer control over speaker identity, emotion, and speech editing, often using latent diffusion models, classifier-free guidance, and reinforcement learning for fine-tuning. These advances enable more natural and expressive speech, personalized voice generation, and efficient audio editing, with applications ranging from personal assistants to multimedia content creation.
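As a rough illustration of one technique named above, classifier-free guidance combines two noise predictions from the same denoiser at each sampling step: one conditioned on the input (in TTS, typically the text or a speaker prompt) and one with the conditioning dropped. The sketch below is a minimal, assumed formulation using plain Python lists; the function name, the toy values, and the `guidance_scale` convention (a scale of 1.0 returns the conditional prediction unchanged) are illustrative choices and are not taken from any specific paper listed here. Some papers use a different parameterization of the scale.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend conditional and unconditional noise predictions.

    Computes eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    element by element. This is one common convention (assumed here).
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy values standing in for a denoiser's two outputs at a single step
# (hypothetical numbers, not real model output).
eps_cond = [0.5, -1.0, 2.0]
eps_uncond = [0.0, -0.5, 1.0]

# A scale above 1.0 pushes the estimate further toward the conditioned direction.
guided = classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=2.0)
```

Under this convention, a scale of 0.0 ignores the conditioning entirely, while larger scales generally trade sample diversity for closer adherence to the conditioning signal.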