Neural Vocoder
Neural vocoders are neural networks that synthesize audio waveforms from intermediate acoustic representations such as mel-spectrograms, with the goal of making synthesized speech and music both more realistic and cheaper to generate. Current research focuses on faster, more efficient models, often built on Generative Adversarial Networks (GANs) or diffusion probabilistic models, and on techniques such as differentiable digital signal processing that aim to improve both speed and audio quality. These advances matter for text-to-speech systems, audio editing, and the creation of realistic synthetic voices, and they also affect fields such as speech science and deepfake detection, where improved analysis and synthesis capabilities are directly relevant.
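To make the "intermediate representation" concrete, the sketch below computes a log-mel-spectrogram from a waveform using only NumPy. This is the kind of input a neural vocoder is typically asked to invert back into audio. It is an illustration under stated assumptions, not the front end of any particular paper listed here: the function names, the 22050 Hz sample rate, the 1024-point FFT, the hop of 256 samples, and the 80 mel bands are all illustrative choices, and real systems differ in scaling and normalization.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # Hann-windowed frames, magnitude spectrum, mel projection, log compression.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.maximum(mel, 1e-5))             # floor avoids log(0)

# Hypothetical input: one second of a 440 Hz sine tone.
sr = 22050
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
mel = log_mel_spectrogram(wave, sr=sr)
print(wave.shape, mel.shape)
```

The mel-spectrogram is far more compact than the waveform and discards phase, which is why recovering a natural-sounding waveform from it is a generative problem rather than a simple inverse transform.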
Papers
Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer
Shahan Nercessian
Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh