Voice Conversion
Voice conversion (VC) aims to make one speaker's voice sound like another's while preserving the original linguistic content. Current research focuses on improving the quality and naturalness of converted speech, particularly in challenging scenarios such as cross-lingual conversion and low-resource settings. Common techniques include diffusion models, generative adversarial networks (GANs), and self-supervised learning, typically built on encoder-decoder architectures. These advances matter for applications ranging from personalized voice assistants and accessibility tools to privacy protection for speech data and speech intelligibility assessment. The field is also working on disentangling speaker identity from other speech characteristics and on reducing vulnerability to deepfake attacks.
Papers
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Streaming Parrotron for on-device speech-to-speech conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li