Text-to-Speech Models
Text-to-speech (TTS) models aim to synthesize natural-sounding human speech from text input, focusing on improving both the quality and controllability of generated audio. Current research emphasizes enhancing model architectures like Transformers and diffusion models, incorporating techniques such as preference alignment, adversarial training, and hierarchical acoustic modeling to achieve higher fidelity, speaker consistency, and emotional expressiveness. These advancements are significant for applications ranging from accessibility tools for the visually impaired to personalized voice assistants and improved synthetic data generation for other AI tasks.
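To make the Transformer-based pipeline mentioned above concrete, the sketch below shows a toy encoder-decoder that maps character IDs to mel-spectrogram frames, the acoustic representation most neural TTS systems predict before a vocoder renders audio. This is a minimal illustration only: the class name, layer sizes, and the fixed-length learned queries (standing in for autoregressive decoding) are all assumptions for demonstration, not any published system's architecture.

```python
import torch
import torch.nn as nn

class TinyTransformerTTS(nn.Module):
    """Toy Transformer TTS sketch: character IDs -> mel-spectrogram frames.
    All hyperparameters here are illustrative, not from a real model."""

    def __init__(self, vocab_size=40, d_model=64, n_mels=80, max_frames=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # Learned frame queries stand in for autoregressive decoding,
        # which real systems use to predict a variable number of frames.
        self.frame_queries = nn.Parameter(torch.randn(max_frames, d_model))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.to_mel = nn.Linear(d_model, n_mels)

    def forward(self, char_ids):
        # Encode the text: (batch, text_len) -> (batch, text_len, d_model)
        memory = self.encoder(self.embed(char_ids))
        # Cross-attend frame queries to the text encoding.
        queries = self.frame_queries.unsqueeze(0).expand(char_ids.size(0), -1, -1)
        frames = self.decoder(queries, memory)
        # Project to mel bins: (batch, max_frames, n_mels)
        return self.to_mel(frames)

model = TinyTransformerTTS()
mel = model(torch.randint(0, 40, (1, 12)))  # batch of one 12-character input
print(mel.shape)  # torch.Size([1, 20, 80])
```

In a full system, the predicted mel frames would then be passed to a separate vocoder network to synthesize the waveform; diffusion-based TTS models instead iteratively denoise the acoustic representation rather than predicting it in one pass.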