Text to Speech Model
Text-to-speech (TTS) models synthesize natural-sounding human speech from text input, and research in the area focuses on improving both the quality and the controllability of the generated audio. Current work emphasizes stronger model architectures, such as Transformers and diffusion models, and incorporates techniques such as preference alignment, adversarial training, and hierarchical acoustic modeling to achieve higher fidelity, speaker consistency, and emotional expressiveness. These advances matter for applications ranging from accessibility tools for visually impaired users to personalized voice assistants and synthetic data generation for other AI tasks.