Natural-Sounding Speech
Natural-sounding speech synthesis aims to generate human-like speech from text, with a focus on improving quality, diversity, and robustness across languages and speaking styles. Current research emphasizes advances in model architectures such as diffusion models, variational autoencoders, and transformer networks, often combined with techniques such as disentangled representations and adversarial training to improve naturalness and control over prosody and emotion. The field matters for applications ranging from assistive technologies to personalized voice assistants. Because realistic synthetic voices can also be used to spread misinformation, it drives work on robust detection methods alongside more accurate and efficient synthesis systems.