Prosody Prediction
Prosody prediction aims to generate the natural rhythm, intonation, and stress patterns of speech automatically from text, a capability that is crucial for realistic and engaging synthetic speech. Current research focuses on improving prediction accuracy through pretrained language models such as BERT, multi-task learning frameworks that incorporate linguistic features (e.g., part-of-speech tags), and generative models such as diffusion probabilistic models. These advances make text-to-speech systems sound more natural, support cross-lingual applications, and improve the analysis of existing speech data such as audiobooks.