TTS Model
Text-to-speech (TTS) models aim to synthesize natural-sounding human speech from text input, a task now commonly addressed with deep learning. Current research focuses on improving speech quality and efficiency through techniques such as self-supervised learning for better speech representations, denoising diffusion probabilistic models for high-fidelity audio generation, and architectures that use syntactic information and cross-sentence context to produce more natural prosody. These advances matter both for extending TTS to low-resource languages and for applications such as high-quality speech synthesis in assistive technologies and multimedia content creation.
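Diffusion-based TTS is only named above, so here is a minimal sketch of the forward (noising) process of a denoising diffusion probabilistic model. It assumes the standard DDPM formulation, where a clean signal x_0 is noised in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It also assumes a linear noise schedule from 1e-4 to 0.02 over 1000 steps, a common choice but not one taken from the source. The function names `linear_beta_schedule`, `alpha_bars`, and `q_sample` are hypothetical. A real TTS system would apply this to waveforms or mel-spectrograms and train a neural network to predict `eps` from `x_t` and `t`. That network and the reverse sampling loop are not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form (t is a 0-based step index).

    Returns the noised samples and the Gaussian noise used, which is the
    usual regression target for the denoising network during training.
    """
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    # Toy "audio": 10 ms of a 440 Hz tone sampled at 16 kHz (illustrative only).
    sr = 16000
    wave = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 100)]
    abar = alpha_bars(linear_beta_schedule(1000))
    xt, eps = q_sample(wave, 999, abar, random.Random(0))
    # At the last step alpha_bar is tiny, so x_t is almost pure noise.
    print(f"alpha_bar at final step: {abar[-1]:.2e}")
```

With this schedule, alpha_bar decays from nearly 1 to roughly 4e-5. Early steps therefore barely perturb the signal, and the final step is close to pure Gaussian noise, which is the state the reverse (generative) process starts from.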