Spontaneous Speech Synthesis

Spontaneous speech synthesis aims to generate synthetic speech that reproduces the naturalness of human conversation, including its disfluencies. This is a significant challenge because spontaneous speech patterns are complex. Current research leverages large language models and self-supervised learning of speech representations to improve the naturalness of synthesized speech, with particular attention to the accurate modeling of prosody and filled pauses. These advances make synthetic speech more realistic, with applications ranging from personalized voice assistants to more natural-sounding audiobooks and accessibility technologies.

Papers

July 18, 2024

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng
Language Model Spontaneous Speech Fine Grained Prosody Behavior Label Spontaneous Speech Synthesis

July 11, 2023

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Text to Speech Speech Synthesis Greater Public Use Speech Representation Synthesized Speech Self Supervised Speech Representation Mean Opinion Score Spontaneous Speech Synthesis

October 18, 2022

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
Native Robustness Synthesized Speech Subword Regularization Inappropriate Pause Spontaneous Speech Synthesis

October 14, 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
Empirical Study Speech Synthesis Synthesized Speech Personalized Subject Voice Cloning Linguistic Knowledge Personalized Speech Speech Pause Spontaneous Speech Synthesis
