HiFi-GAN

HiFi-GAN is a neural vocoder based on generative adversarial networks (GANs) and designed for high-fidelity audio synthesis. It generates realistic speech and singing-voice waveforms from acoustic representations such as mel-spectrograms. Current research aims to improve HiFi-GAN through architectural and training changes, including source-filter models, discriminators that operate on alternative time-frequency representations (e.g., the Constant-Q Transform), and diffusion-based training methods intended to improve stability and quality. Together, these advances target faster synthesis, higher quality, and better controllability, with applications in text-to-speech systems, speech enhancement, and music generation.
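
As a rough illustration of what generating a waveform from a mel-spectrogram involves, the sketch below computes how many audio samples a vocoder generator has to produce for a given number of mel frames. The upsampling factors (8, 8, 2, 2) and the 22,050 Hz sample rate are assumptions taken from the commonly used HiFi-GAN V1 configuration rather than from this summary, and the function names are hypothetical.

```python
from math import prod

# Assumed values (not stated in the summary above): the stacked upsampling
# factors of the commonly used HiFi-GAN V1 configuration and a 22.05 kHz
# sample rate.
UPSAMPLE_RATES = (8, 8, 2, 2)
SAMPLE_RATE = 22050


def hop_length(upsample_rates=UPSAMPLE_RATES):
    """Total upsampling factor, i.e. audio samples generated per mel frame."""
    return prod(upsample_rates)


def waveform_length(num_mel_frames, upsample_rates=UPSAMPLE_RATES):
    """Number of audio samples produced for `num_mel_frames` input frames."""
    return num_mel_frames * hop_length(upsample_rates)


def duration_seconds(num_mel_frames, sample_rate=SAMPLE_RATE,
                     upsample_rates=UPSAMPLE_RATES):
    """Duration of the generated audio in seconds."""
    return waveform_length(num_mel_frames, upsample_rates) / sample_rate


if __name__ == "__main__":
    frames = 100
    print(hop_length(), waveform_length(frames), duration_seconds(frames))
```

Under these assumptions, each mel frame expands to 256 samples, so the generator's stacked upsampling layers must together match the hop length used when the mel-spectrogram was computed.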

Papers