Voice Conversion
Voice conversion (VC) aims to transform speech so that it sounds as if it were spoken by a different target speaker, while preserving the original linguistic content. Current research focuses on improving the quality and naturalness of converted speech, particularly in challenging scenarios such as cross-lingual conversion and low-resource settings. Common techniques include diffusion models, generative adversarial networks (GANs), and self-supervised learning combined with a variety of encoder-decoder architectures. These advances matter for applications ranging from personalized voice assistants and accessibility tools to privacy protection for speech data and improved speech intelligibility assessment. The field is also actively working on disentangling speaker identity from other speech characteristics and on mitigating vulnerabilities to deepfake attacks.
Papers
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert, Sebastian Stober
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh, Tim Thiele, Frederic Lorbeer, Frank Dreyer, Sebastian Stober
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero, Matteo Testa, Jurgen Van de Walle, Luigi Di Caro
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas H. Ueda, Leonardo B. de M. M. Marques, Flávio O. Simões, Mário U. Neto, Fernando Runstein, Bianca Dal Bó, Paula D. P. Costa