Vocoder Model

Vocoders are models that synthesize audio waveforms from lower-dimensional acoustic representations such as mel-spectrograms, making them a core component of text-to-speech and other audio generation systems. Current research aims to improve output quality, in particular by reducing artifacts and instability. Much of this work builds on Generative Adversarial Networks (GANs) and adds techniques such as contrastive learning and explicit pitch modeling to improve realism and expressiveness. These improvements matter for applications ranging from high-fidelity speech synthesis to speaker anonymization and music generation.
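To make "lower-dimensional representation" concrete, the sketch below computes a log-mel-spectrogram from a waveform using only the Python standard library. It is an illustration, not the front end of any paper listed here: the function names, the HTK-style mel formula, the Hann window, and the parameter values (8 kHz sample rate, 128-point frames, hop of 32, 16 mel bands) are all assumptions chosen to keep the example small. A vocoder has to solve the inverse problem, recovering a waveform from features like these even though the phase has been discarded.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; assumed here).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank

def power_spectrum(frame):
    """Naive DFT power spectrum; the phase is thrown away at this step."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def mel_spectrogram(signal, sample_rate, n_fft, hop, n_mels):
    """Return a list of frames, each holding n_mels log-energies."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n_fft) for t in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        spec = power_spectrum(frame)
        frames.append([
            math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
            for row in bank
        ])
    return frames

# Example input: a 1 kHz tone, 1024 samples at 8 kHz.
sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(1024)]
mel = mel_spectrogram(signal, sample_rate, n_fft=128, hop=32, n_mels=16)
print(len(signal), "samples ->", len(mel), "frames x", len(mel[0]), "mel bands")
```

With these settings, 1024 waveform samples become 29 frames of 16 values each (464 numbers in total). Production systems typically use larger hops and around 80 mel bands, so the compression is usually stronger than in this toy setup.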

Papers

June 10, 2024

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
Hyunjae Cho, Junhyeok Lee, Wonbin Jung
Speech Synthesis Neural Vocoder Perceptual Aliasing Low Pass Vocoder Model Tonal Language

September 16, 2023

Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland
Contrastive Learning Limited Data High Fidelity Vocoder Modern Vocoders Vocoder Model

July 17, 2023

Vocoder drift compensation by x-vector alignment in speaker anonymisation
Michele Panariello, Massimiliano Todisco, Nicholas Evans
X Vector Vocoder Model

June 2, 2023

Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Fabian Kögel, Bac Nguyen, Fabien Cardinaux
Mel Spectrogram Non Autoregressive Text to Speech Vocoder Model

April 7, 2023

ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi, Vahid Saeedi, Javad Peymanfard, Leila Alhabib, Hossein Zeinali
Synthesized Speech Persian Dataset Single Speaker Vocoder Model TTS Model

October 28, 2022

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
End to End Variational Inference Speech Synthesis Prosodic Feature Periodicity Detection Vocoder Model Stochastic Pitch Prediction End to End TTS System

August 26, 2022

Mel Spectrogram Inversion with Stable Pitch
Bruno Di Giorgi, Mark Levy, Richard Sharp
Mel Spectrogram High Fidelity Vocoder Modern Vocoders Vocoder Model

April 5, 2022

What can predictive speech coders learn from speaker recognizers?
Marcos Faundez-Zanuy
Speech Analysis Speaker Recognition Signal to Noise Ratio Vocoder Model

March 18, 2022

AdaVocoder: Adaptive Vocoder for Custom Voice
Xin Yuan, Yongbing Feng, Mingming Ye, Cheng Tuo, Minghang Zhang
Generative Adversarial Vocoder Model

February 24, 2022