Cross-Lingual Text-to-Speech

Cross-lingual text-to-speech (TTS) aims to synthesize speech in a target language using a model trained primarily on a different source language. Key challenges include avoiding a foreign accent in the synthesized speech and transferring emotional expression across languages. Current research focuses on disentangling speaker identity from language characteristics within model architectures, using techniques such as diffusion models and triplet training to improve naturalness and intelligibility. These advances matter for applications such as automatic dubbing and speech synthesis for low-resource languages, where they make multilingual speech technology more efficient and accessible.

Papers