Vocoder Model
Vocoders are models that synthesize audio waveforms from lower-dimensional representations like mel-spectrograms, serving as a crucial component in text-to-speech and other audio generation systems. Current research focuses on improving vocoder quality, particularly addressing artifacts and instability, often through advancements in Generative Adversarial Networks (GANs) and incorporating techniques like contrastive learning and explicit pitch modeling to enhance realism and expressiveness. These improvements are significant for applications ranging from high-fidelity speech synthesis to speaker anonymization and music generation, driving advancements in both audio processing and machine learning.
Papers
June 10, 2024
September 16, 2023
July 17, 2023
June 2, 2023
April 7, 2023
October 28, 2022
August 26, 2022
April 5, 2022
March 18, 2022
February 24, 2022