Adaptive Text to Speech

Adaptive text-to-speech (TTS) aims to generate synthetic speech that accurately reflects a target speaker's voice characteristics, even when only limited data from that speaker is available. Current research focuses on improving how well models generalize to new speakers, particularly speakers with accents, using techniques such as diffusion models and transformer networks, often with zero-shot or few-shot adaptation strategies. The field matters because it promises more natural and personalized speech synthesis across diverse populations, with applications ranging from accessibility tools to virtual assistants and entertainment.

Papers

June 21, 2024

GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang, Yang Song, Sanjay Jha
Accented Speech Speaker Similarity Quality Corpus Globe Ce Adaptive Text to Speech

April 28, 2024

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang, Yang Song, Sanjay Jha
Spatial Audio Zero Shot Text to Speech Zero Shot Speaker Adaptation Adaptive Text to Speech Shot Speaker

March 3, 2023

An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen, Philip N. Garner
Transformer Based Adaptive Importance Comprehensive Investigation Layer Normalization Diffusion Based Text Diffusion Pipeline Adaptive Text to Speech

November 17, 2022

Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
Minki Kang, Dongchan Min, Sung Ju Hwang
Diffusion Model Speech to Text Adaptive Text to Speech

May 30, 2022

Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim, Heeseung Kim, Sungroh Yoon
Diffusion Model Text to Speech Untranscribed Data Adaptive Text to Speech

February 15, 2022

SpeechPainter: Text-conditioned Speech Inpainting
Zalán Borsos, Matt Sharifi, Marco Tagliasacchi
Speech Analysis Speaker Identity Speech Segment Adaptive Text to Speech

November 7, 2021

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee
Speech Encoder Speaker Adaptation Adaptive Text to Speech