Neural Text to Speech
Neural text-to-speech (TTS) synthesizes natural-sounding human speech from text, with research focused on improving both audio quality and expressiveness. Recent work emphasizes end-to-end models, often built on diffusion processes or transformer-based architectures, that generate waveforms directly from text without intermediate representations such as mel-spectrograms. It also explores ways to increase prosodic diversity and to control vocal effort so that speech remains intelligible in noisy environments. These advances matter for applications ranging from accessibility technologies to virtual assistants, where they improve both the realism and the usability of synthetic speech.
Papers
November 2, 2023
October 23, 2023
May 23, 2023
December 15, 2022
November 1, 2022
April 2, 2022
March 20, 2022