Voice Conversion

Voice conversion (VC) transforms speech from a source speaker so that it sounds like a target speaker while preserving the original linguistic content. Current research focuses on improving the quality and naturalness of converted speech, particularly in challenging scenarios such as cross-lingual conversion and low-resource settings. Common techniques include diffusion models, generative adversarial networks (GANs), and self-supervised learning, typically within encoder-decoder architectures. These advances matter for applications ranging from personalized voice assistants and accessibility tools to privacy protection for speech data and speech intelligibility assessment. The field is also working on disentangling speaker identity from other speech characteristics and on reducing vulnerability to deepfake attacks.

Papers