Modern Vocoders
Modern vocoders are neural networks that synthesize high-fidelity audio waveforms from lower-dimensional acoustic representations such as mel-spectrograms, with the goal of making synthetic speech both more realistic and cheaper to generate. Current research targets output quality and inference speed, chiefly through GAN-based architectures, diffusion models, and time-frequency representations other than the standard short-time Fourier transform (STFT), such as the constant-Q transform (CQT) and the modified discrete cosine transform (MDCT). These advances benefit applications such as text-to-speech systems, voice conversion, and audio restoration, where they yield more natural-sounding speech at lower computational cost.
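To make the vocoder's input concrete, the following is a minimal sketch of how a log-mel-spectrogram might be computed from a waveform. It is illustrative only and not taken from any particular vocoder: the function names, the HTK-style mel formula, the Hann window, and the parameter values (256-point FFT, hop of 64, 20 mel bands, 16 kHz sample rate) are all assumptions chosen to keep the example small, and the naive DFT stands in for the FFT that a real pipeline would use.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel formula: one common convention, assumed here.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters, returned as n_mels rows of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def stft_magnitude(signal, n_fft, hop):
    """Hann-windowed magnitude spectrogram via a naive DFT (slow, dependency-free)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    twiddle = [cmath.exp(-2j * math.pi * i / n_fft) for i in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(frame[n] * twiddle[(k * n) % n_fft] for n in range(n_fft)))
            for k in range(n_bins)
        ])
    return frames


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=64, n_mels=20):
    """Project each magnitude frame onto the mel filterbank and take the log."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    mags = stft_magnitude(signal, n_fft, hop)
    return [
        [math.log(sum(w * m for w, m in zip(row, frame)) + 1e-5) for row in bank]
        for frame in mags
    ]


# Hypothetical test input: 0.1 s of a 440 Hz sine at 16 kHz.
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(1600)]
mel = log_mel_spectrogram(signal, sample_rate)
print(len(signal), "samples ->", len(mel), "frames x", len(mel[0]), "mel bands")
```

The resulting frame-by-band matrix holds far fewer values than the waveform it summarizes and discards phase, which is why a vocoder has to generate the missing detail rather than simply invert the transform.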