Text to Speech Model
Text-to-speech (TTS) models synthesize natural-sounding human speech from text input, and current work aims to improve both the quality and the controllability of the generated audio. Research emphasizes stronger model architectures, such as Transformers and diffusion models, combined with techniques such as preference alignment, adversarial training, and hierarchical acoustic modeling, to achieve higher fidelity, speaker consistency, and emotional expressiveness. These advances matter for applications ranging from accessibility tools for the visually impaired to personalized voice assistants and improved synthetic data generation for other AI tasks.