Supervised Text to Speech
Supervised text-to-speech (TTS) uses machine learning to synthesize high-quality speech from text, typically by training on paired text and audio. Current research emphasizes models that need less labeled training data (semi-supervised and minimally supervised approaches), often using diffusion models and vector quantization to generate more natural and expressive speech. These advances matter because they reduce the large paired-data requirements of traditional TTS systems, making high-quality speech synthesis practical for a wider range of languages and voices.
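To make the vector quantization idea concrete, here is a minimal sketch of the core step: each continuous speech feature frame is replaced by the index of its nearest codebook vector, giving a sequence of discrete tokens. The codebook values, the frame values, and the function names `squared_distance` and `quantize` are illustrative assumptions, not taken from any particular paper or library; real systems learn the codebook during training and operate on much higher-dimensional features.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each frame to the index of its nearest codebook entry.

    Returns the token indices and the reconstructed (quantized) frames.
    """
    indices = []
    for frame in frames:
        # Nearest-neighbour search over the codebook.
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(nearest)
    reconstructed = [list(codebook[i]) for i in indices]
    return indices, reconstructed


# Hypothetical 2-dimensional codebook and feature frames, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
frames = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.4], [0.4, 0.4]]

tokens, approx = quantize(frames, codebook)
print(tokens)  # discrete token sequence, one index per frame
```

The resulting token sequence is the kind of discrete representation that a downstream model can learn to predict from text, which is one way such approaches can reduce the amount of paired text and audio they need.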