End-to-End TTS Systems

End-to-end text-to-speech (TTS) systems synthesize a speech waveform directly from text with a single model, rather than chaining a separately trained acoustic model and vocoder, with the goal of improving efficiency and potentially quality. Current research focuses on extending models such as VITS and addresses three main challenges: faster inference (for example, with decoders based on the inverse short-time Fourier transform, iSTFT), robust performance with limited data (via transfer learning and automatic prosody annotation), and stable pitch generation, particularly for emotional speech. These advances matter for bringing TTS to low-resource languages and for producing more natural, expressive speech across applications.
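As a rough illustration of the iSTFT idea mentioned above, the sketch below rebuilds a waveform from a magnitude and a phase spectrogram using windowed overlap-add. It is a minimal NumPy sketch, not the decoder of any paper listed here: the function names `stft` and `istft`, the frame length, the hop size, and the test tone are all assumptions chosen for illustration, and the magnitude and phase come from analysing a known signal instead of being predicted by a network.

```python
import numpy as np

N_FFT = 256   # frame length (illustrative value, an assumption)
HOP = 64      # hop size between frames (illustrative value, an assumption)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Windowed short-time Fourier transform; returns (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + n_fft] += frame
        norm[start:start + n_fft] += window ** 2
    # Guard against division by zero where the window is exactly zero.
    return out / np.maximum(norm, 1e-8)


# A 220 Hz test tone stands in for speech (hypothetical input).
sample_rate = 16000
t = np.arange(4096) / sample_rate
signal = np.sin(2 * np.pi * 220.0 * t)

# In an iSTFT-based decoder these two arrays would be model outputs;
# here they are taken from the analysis STFT of the test tone.
spec = stft(signal)
magnitude, phase = np.abs(spec), np.angle(spec)

# One inverse transform turns the frame-rate spectrogram into samples.
recon = istft(magnitude * np.exp(1j * phase))
```

Away from the very first and last samples, where the Hann window is zero, `recon` matches `signal` to numerical precision. The appeal for inference speed is that a fixed transform produces `HOP` output samples per frame, so a model only has to produce frame-rate magnitude and phase instead of upsampling all the way to the waveform with learned layers.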

Papers

May 26, 2023

Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seongyeon Park, Bohyung Kim, Tae-hyun Oh

February 16, 2023

QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro

November 17, 2022

Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation
Xin Yuan, Robin Feng, Mingming Ye

October 28, 2022

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana

June 1, 2022

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su
