Lingual Voice Conversion

Lingual voice conversion modifies a speaker's voice to match a target speaker's characteristics while preserving the original speech content, potentially across languages. Current research focuses on robust models that disentangle speaker identity from speech content, often using cycle-consistent architectures and variational autoencoders (VAEs) to achieve this separation even when multilingual data is limited. The field matters for improving speech-to-speech translation systems, augmenting speech recognition datasets (especially for under-resourced languages), and mitigating privacy concerns in voice-based applications by avoiding direct voice cloning.
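The idea of separating speaker identity from content, and checking that separation with a cycle-consistency term, can be illustrated with a toy sketch. It is not taken from any of the listed papers. It assumes, purely for illustration, that an utterance is a short feature vector equal to a content vector plus a fixed speaker offset. The names encode_content, decode, convert and cycle_consistency_loss are hypothetical, and a real system would replace these hand-written functions with learned neural networks.

```python
from typing import List

Vector = List[float]


def encode_content(features: Vector, speaker: Vector) -> Vector:
    """Toy content encoder: strip the (assumed additive) speaker offset."""
    return [f - s for f, s in zip(features, speaker)]


def decode(content: Vector, speaker: Vector) -> Vector:
    """Toy decoder: recombine content with a speaker offset."""
    return [c + s for c, s in zip(content, speaker)]


def convert(features: Vector, source: Vector, target: Vector) -> Vector:
    """Re-render the source utterance with the target speaker's offset."""
    return decode(encode_content(features, source), target)


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(features: Vector, source: Vector, target: Vector) -> float:
    """Convert source -> target -> source and measure the reconstruction error."""
    converted = convert(features, source, target)
    recovered = convert(converted, target, source)
    return l1_distance(features, recovered)


if __name__ == "__main__":
    utterance = [1.0, 2.0, 3.0]          # hypothetical acoustic features
    speaker_a = [0.5, 0.5, 0.5]          # hypothetical source speaker offset
    speaker_b = [-0.25, 0.0, 0.25]       # hypothetical target speaker offset
    print(convert(utterance, speaker_a, speaker_b))
    print(cycle_consistency_loss(utterance, speaker_a, speaker_b))
```

In this idealized setting the round trip is exact, so the loss is zero. In a trained model the same quantity is nonzero and is minimized as a training signal, which is what lets such approaches learn without parallel recordings of the same sentence by both speakers.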

Papers