Neural Speech Synthesis
Neural speech synthesis aims to generate natural, human-like speech from text, with research focused on naturalness, controllability, and efficiency. Current work develops more robust models that incorporate source-filter modeling, variational autoencoders, or diffusion probabilistic models, often paired with neural vocoders such as HiFi-GAN to produce high-fidelity audio. These advances matter for applications ranging from assistive technologies and multimedia production to forensic analysis and language preservation, particularly for low-resource languages. Research also addresses related challenges, such as detecting synthetic speech and improving speaker anonymization.
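To make the source-filter idea mentioned above concrete, here is a minimal classical sketch in plain Python. It is not taken from any of the listed papers and is not a neural model: it only illustrates the decomposition into an excitation (the source) and a resonance (the filter) that neural source-filter systems build on. The function names, the single-formant filter, and all parameter values (120 Hz pitch, 700 Hz formant, 16 kHz sample rate) are assumptions chosen for illustration.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Source: one unit impulse per pitch period (a crude voiced excitation)."""
    period = round(sample_rate / f0_hz)  # pitch period in samples
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator standing in for one vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1, stable)
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        # Difference equation: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz=120.0, formant_hz=700.0, bandwidth_hz=100.0,
               sample_rate=16000, duration_s=0.05):
    """Pass the excitation through the filter to get a buzzy, vowel-like tone."""
    n_samples = int(sample_rate * duration_s)
    source = impulse_train(f0_hz, sample_rate, n_samples)
    return resonator(source, formant_hz, bandwidth_hz, sample_rate)


if __name__ == "__main__":
    samples = synthesize()
    print(len(samples), "samples generated")
```

In this sketch the pitch is controlled only by the source and the timbre only by the filter, which is the kind of controllability that motivates source-filter structure. A neural system would typically replace the fixed resonator, and often the excitation as well, with learned components.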