High-Fidelity Vocoder
High-fidelity vocoders are neural networks that synthesize high-quality audio waveforms from lower-dimensional acoustic representations (such as mel-spectrograms), with the aim of making synthetic speech more realistic and natural. Current research pursues two goals. The first is faster, more efficient vocoders, through architectural innovations such as lightweight GANs and differentiable digital signal processing (DDSP) models. The second is better audio quality, through techniques such as feature smoothing, contrastive learning, and refined discriminators. These advances matter for applications such as text-to-speech synthesis, voice conversion, and speech enhancement, where they improve both the speed and the quality of audio generation.
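To make the DDSP idea more concrete, here is a minimal sketch of a harmonic (additive) oscillator of the kind DDSP-style vocoders build on. Frame-rate controls, namely a fundamental frequency f0 and per-harmonic amplitudes, are linearly interpolated up to the audio sample rate. Each harmonic is then rendered as a sinusoid whose phase is the running sum of its instantaneous frequency, and harmonics at or above the Nyquist frequency are silenced to avoid aliasing. The function name `harmonic_synth`, its parameters, and the 16 kHz sample rate and 160-sample hop defaults are illustrative assumptions, not taken from any specific paper or library. In an actual DDSP vocoder, a neural network predicts these controls from the acoustic features, and a filtered-noise branch is usually added alongside the harmonic part.

```python
import math


def harmonic_synth(f0_frames, amp_frames, sample_rate=16000, hop=160):
    """Hypothetical DDSP-style harmonic oscillator (a sketch, not a library API).

    f0_frames:  fundamental frequency in Hz, one value per frame.
    amp_frames: amp_frames[t][k] is the amplitude of harmonic k + 1 at frame t.
    Returns a list of len(f0_frames) * hop audio samples.
    """
    n_frames = len(f0_frames)
    n_harm = len(amp_frames[0])
    nyquist = sample_rate / 2
    phase = [0.0] * n_harm  # running phase (radians) of each harmonic
    out = []
    for n in range(n_frames * hop):
        # Linearly interpolate the frame-rate controls to audio sample n.
        pos = n / hop
        t0 = int(pos)
        t1 = min(t0 + 1, n_frames - 1)  # hold the last frame at the end
        frac = pos - t0
        f0 = (1 - frac) * f0_frames[t0] + frac * f0_frames[t1]
        sample = 0.0
        for k in range(n_harm):
            freq = (k + 1) * f0
            # Phase is the cumulative sum of the instantaneous angular frequency.
            phase[k] = (phase[k] + 2 * math.pi * freq / sample_rate) % (2 * math.pi)
            # Silence harmonics at or above Nyquist to avoid aliasing.
            if freq < nyquist:
                amp = (1 - frac) * amp_frames[t0][k] + frac * amp_frames[t1][k]
                sample += amp * math.sin(phase[k])
        out.append(sample)
    return out


# Two frames of a 220 Hz tone with three harmonics of decreasing amplitude.
y = harmonic_synth([220.0, 220.0], [[0.5, 0.25, 0.125], [0.5, 0.25, 0.125]])
print(len(y))  # 320 samples = 2 frames * 160-sample hop
```

Accumulating phase sample by sample, rather than computing sin(2πft) directly, keeps each harmonic continuous when f0 changes between frames, which is what lets a smoothly varying pitch contour drive the oscillator without clicks.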
Papers
Towards single integrated spoofing-aware speaker verification embeddings
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung
Voice Conversion With Just Nearest Neighbors
Matthew Baas, Benjamin van Niekerk, Herman Kamper