Multi Speaker Text to Speech

Multi-speaker text-to-speech (TTS) aims to synthesize realistic speech from text in the voices of many speakers, including speakers unseen during training. Current research focuses on improving the efficiency and quality of these systems, exploring techniques such as frame selection, data augmentation with large language models, and the adaptation of pre-trained models via methods such as hypernetworks or contrastive learning. These advances matter because they address limits on data availability and computational resources, making speech synthesis more versatile and accessible across diverse languages and speaker demographics.

Papers

July 11, 2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Automatic Speech Recognition Synthesized Speech Unpaired Data Speaker Similarity Multi Speaker Text to Speech Joint Semi Supervised

May 24, 2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Text to Speech Multi Speaker Text to Speech Personalized Speech Low Resource Text to Speech

March 29, 2022

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim
Text to Speech Text to Speech Model Unlabeled Speech Multi Speaker Text to Speech Single Speaker Transfer Learning Framework Low Resource Text to Speech

February 22, 2022

nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech
Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Text to Speech Speaker Representation Conditional Variational Autoencoder Conditional Variational Multi Speaker Text to Speech

January 27, 2022

Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition
Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey Berry
Synthesized Speech Dysarthric Speech Multi Speaker Text to Speech Dysarthric Speech Recognition

January 19, 2022

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription
Dabiao Ma, Yitong Zhang, Meng Li, Feng Ye
End to End Synthesized Speech Multi Speaker Spontaneous Speech Multi Speaker Text to Speech Multi Speaker Tt Transcription Error

December 9, 2021

X-Vector based voice activity detection for multi-genre broadcast speech-to-text
Misa Ogura, Matt Haynes
Automatic Speech Recognition X Vector Voice Activity Detection Multi Speaker Text to Speech
