Modern Vocoders
Modern vocoders are neural networks that synthesize high-fidelity audio waveforms from lower-dimensional acoustic representations such as mel-spectrograms, with the goal of making synthetic speech more realistic and cheaper to generate. Current research targets both output quality and inference speed, focusing on GAN-based architectures, diffusion models, and time-frequency representations other than the standard short-time Fourier transform (STFT), such as the constant-Q transform (CQT) and the modified discrete cosine transform (MDCT), to raise audio fidelity and reduce computational demands. These advances matter for applications like text-to-speech systems, voice conversion, and audio restoration, where they yield more natural-sounding synthetic speech at lower cost.
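To make "lower-dimensional acoustic representation" concrete, here is a minimal sketch of a log-mel-spectrogram, the kind of input a vocoder is typically asked to invert back into a waveform. It uses only the Python standard library and a naive DFT so that it stays self-contained; practical pipelines would use an FFT library instead. All function names and parameter values (`n_fft=64`, `hop=16`, `n_mels=8`) are illustrative assumptions, as is the HTK-style mel formula, which is only one of several conventions in use.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other conventions such as Slaney exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, returned as n_mels rows of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum of one Hann-windowed frame."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=16, n_mels=8):
    """Return a list of frames, each holding n_mels log-energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        # Floor the energy before the log to avoid log(0).
        frames.append([math.log(max(e, 1e-10)) for e in mel_energy])
    return frames


# Toy input: 256 samples of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(256)]
mel = log_mel_spectrogram(signal, sample_rate)
print(len(mel), "frames x", len(mel[0]), "mel bands")
```

With these toy settings, each frame reduces 33 DFT bins to 8 mel-band energies and discards phase entirely. A vocoder has to generate a plausible waveform, phase included, from exactly this kind of compressed input.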