Modern Vocoders

Modern vocoders are neural networks that synthesize high-fidelity audio waveforms from lower-dimensional acoustic representations such as mel-spectrograms, with the goal of making synthetic speech both more realistic and cheaper to generate. Current research targets output quality and inference speed, chiefly through GAN-based architectures and diffusion models. It also explores time-frequency representations beyond the standard short-time Fourier transform (STFT), such as the constant-Q transform (CQT) and the modified discrete cosine transform (MDCT), to improve audio fidelity and reduce computational cost. These advances benefit applications such as text-to-speech, voice conversion, and audio restoration, where more natural-sounding speech and faster synthesis matter.
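To make the "lower-dimensional acoustic representation" concrete, here is a minimal NumPy sketch of how a log-mel-spectrogram, the typical vocoder input, can be computed from a waveform. The settings (22.05 kHz sample rate, 1024-point FFT, hop of 256 samples, 80 mel bands) are assumed for illustration and are not taken from any paper listed here. The function names are hypothetical, and real systems differ in windowing, padding, and normalization.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; others exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # Frame the signal, apply a Hann window, and take the STFT magnitude.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    # Project onto mel bands, then compress with a floored logarithm.
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.clip(mel, 1e-5, None)).T  # (n_mels, n_frames)

# One second of a 440 Hz tone as a stand-in for speech.
sr = 22050
t = np.arange(sr) / sr
wave = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
mel = log_mel_spectrogram(wave, sr=sr)
print(wave.shape, "->", mel.shape)
```

A neural vocoder learns the reverse mapping. The mel-spectrogram keeps only band energies and discards phase, so the network has to generate the waveform detail that the representation no longer contains.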

Papers

August 15, 2022

Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh
Synthesized Speech Neural Vocoder High Fidelity Vocoder Modern Vocoders Envelope Tracking Wavelet Decomposition Gauss Markov F0 Subband

August 9, 2022

DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation
Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang
Comprehensive Evaluation High Fidelity Vocoder Modern Vocoders Conditional Music Generation

June 9, 2022

BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon
Generative Adversarial Network Generative Adversarial Audio Synthesis Modern Vocoders Large Scale Training

April 1, 2022

Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan-Lin Wang, Po-chun Hsu, Da-rong Liu, Hung-yi Lee
Speech Synthesis Voice Conversion Mel Spectrogram High Fidelity Vocoder Modern Vocoders Long Short Range Adapter User Defined Configuration

December 8, 2021

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features
Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida
Self Supervised Speech Representation Zero Shot Voice Conversion Modern Vocoders Tree Decoder

December 6, 2021

VocBench: A Neural Vocoder Benchmark for Speech Synthesis
Ehab A. AlBadawy, Andrew Gibiansky, Qing He, Jilong Wu, Ming-Ching Chang, Siwei Lyu
Speech Synthesis Neural Vocoder High Fidelity Vocoder Modern Vocoders