TTS Model

Text-to-speech (TTS) models synthesize natural-sounding human speech from text input, a task increasingly tackled with deep learning. Current research focuses on improving speech quality and efficiency through techniques such as self-supervised learning for better speech representations, denoising diffusion probabilistic models for high-fidelity audio, and architectures that use syntactic information and cross-sentence context to produce more natural prosody. These advances matter both for extending TTS to low-resource languages and for applications such as assistive technologies and multimedia content creation.

Papers

November 19, 2024

Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation
Praveen Srinivasa Varadhan, Amogh Gulati, Ashwin Sankar, Srija Anand, Anirudh Gupta, Anirudh Mukherjee, Shiva Kumar Marepally, Ankur Bhatia, Saloni Jaju, Suvrat Bhooshan, Mitesh M. Khapra
Global Evaluation, Text to Speech, Current Challenge, TTS Model

July 26, 2024

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang
High Efficiency, Text to Speech, Speech Data, Keyword Spotting, TTS Model, Current TTS System

April 7, 2023

ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi, Vahid Saeedi, Javad Peymanfard, Leila Alhabib, Hossein Zeinali
Synthesized Speech, Persian Dataset, Single Speaker, Vocoder Model, TTS Model

March 5, 2023

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
Comparative Study, Speech Representation, Read V, Self Supervised Speech Representation, TTS Model

January 22, 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali
Automatic Speech Recognition, Case Study, Natural Sounding Speech, TTS Model, News Video, Unsupervised Data Selection

December 30, 2022

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic
Text Modality, Speech Analysis, Text to Speech, Denoising Diffusion Probabilistic Model, TTS Model

September 14, 2022

ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie
Text to Speech, Prosodic Feature, High Quality Speech, TTS Model

April 25, 2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech
Zhenhui Ye, Zhou Zhao, Yi Ren, Fei Wu
Speech Synthesis, Prosody Modeling, TTS Model

February 26, 2022

Revisiting Over-Smoothness in Text to Speech
Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
Text Modality, Speech Analysis, Mel Spectrogram, Non Autoregressive, Undisciplined Over Smoothing, Multimodal Distribution, Smoothing Problem, TTS Model

November 19, 2021

More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez
Text to Speech, Word List, Prosodic Feature, Wild Image, TTS Model