High-Fidelity Speech
High-fidelity speech synthesis aims to generate highly realistic and natural-sounding speech, focusing on improving both objective quality metrics and subjective listening experience. Current research heavily utilizes generative adversarial networks (GANs) and diffusion probabilistic models (DDPMs), often incorporating techniques like multi-scale analysis, time-frequency domain supervision, and adaptive noise shaping to enhance the generated audio. These advancements are driving significant improvements in speech super-resolution, vocoder performance, and text-to-speech systems, with implications for applications ranging from assistive technologies to virtual assistants and realistic audio-visual content creation.
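One of the time-frequency supervision techniques mentioned above, the multi-resolution STFT loss, compares generated and reference waveforms across several FFT sizes so the model is penalized for artifacts at multiple spectral scales. Below is a minimal NumPy sketch under simple assumptions (Hann window, spectral-convergence plus log-magnitude terms); the function names and resolution choices are illustrative, not taken from any specific paper.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_resolution_stft_loss(pred, target,
                               resolutions=((512, 128), (1024, 256), (2048, 512))):
    """Average spectral-convergence + log-magnitude L1 error over several FFT sizes."""
    eps, total = 1e-8, 0.0
    for n_fft, hop in resolutions:
        P = stft_mag(pred, n_fft, hop)
        T = stft_mag(target, n_fft, hop)
        sc = np.linalg.norm(T - P) / (np.linalg.norm(T) + eps)      # spectral convergence
        mag = np.mean(np.abs(np.log(T + eps) - np.log(P + eps)))    # log-magnitude L1
        total += sc + mag
    return total / len(resolutions)
```

In GAN-based vocoders this loss is typically added to the adversarial objective as an auxiliary reconstruction term; identical waveforms yield a loss of zero, while spectral mismatches at any resolution increase it.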