Speech Naturalness
Speech naturalness research in text-to-speech (TTS) synthesis aims to generate synthetic speech that listeners cannot distinguish from human speech, with accurate prosody, timbre, and overall audio quality. Current work emphasizes disentangling the components of speech (content, prosody, timbre) using techniques such as factorized diffusion models and variational autoencoders (VAEs), often trained on large-scale datasets with billion-parameter models. These advances aim to improve the realism and emotional expressiveness of synthetic speech, with applications in virtual assistants, accessibility technologies, and entertainment.