Cross-Lingual Text-to-Speech
Cross-lingual text-to-speech (TTS) aims to synthesize speech in a target language using a model trained primarily on a different source language, addressing challenges such as foreign accent and the transfer of emotional expression. Current research focuses on disentangling speaker identity from language characteristics within model architectures, employing techniques such as diffusion models and triplet training to improve naturalness and intelligibility. These advances matter for applications such as automatic dubbing and speech synthesis for low-resource languages, enabling more efficient and accessible multilingual communication technologies.
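As a rough illustration of the triplet-training idea mentioned above, the sketch below shows how a triplet loss can pull together embeddings of the same speaker recorded in different languages while pushing apart embeddings of different speakers, which encourages the speaker representation to be language-independent. The `SpeakerEncoder` module, its dimensions, and the random tensors standing in for mel-spectrograms are all hypothetical; this is a minimal PyTorch sketch under assumed shapes, not the method of any particular paper.

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Toy encoder: mel-spectrogram frames -> fixed-size speaker embedding."""
    def __init__(self, n_mels=80, embed_dim=256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, embed_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, mels):                  # mels: (batch, frames, n_mels)
        _, h = self.rnn(mels)                 # h: (1, batch, embed_dim)
        emb = self.proj(h.squeeze(0))         # (batch, embed_dim)
        return nn.functional.normalize(emb, dim=-1)  # unit-norm embeddings

encoder = SpeakerEncoder()
triplet_loss = nn.TripletMarginLoss(margin=0.3)

# Hypothetical batch: anchor and positive are the same speaker in two
# languages; the negative is a different speaker. Random tensors stand in
# for real mel-spectrograms of shape (batch, frames, n_mels).
anchor   = encoder(torch.randn(4, 120, 80))  # speaker A, source language
positive = encoder(torch.randn(4, 120, 80))  # speaker A, target language
negative = encoder(torch.randn(4, 120, 80))  # speaker B

loss = triplet_loss(anchor, positive, negative)
loss.backward()  # gradients drive the encoder toward language-invariant speaker embeddings
```

In a full cross-lingual TTS system, an embedding trained this way would condition the acoustic model on speaker identity while the language or text encoder handles linguistic content, which is one common way the disentanglement described above is realized.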