Text to Speech
Text-to-speech (TTS) research aims to synthesize natural-sounding human speech from textual input, focusing on improving speech quality, speaker similarity, and efficiency. Current efforts concentrate on developing advanced architectures like diffusion models and transformers, often incorporating techniques such as flow matching and semantic communication to enhance both the naturalness and expressiveness of generated speech. This field is crucial for applications ranging from assistive technologies and accessibility tools to combating deepfakes and creating more realistic synthetic datasets for training other AI models.
Papers
Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments
Zhonghao Shi, Han Chen, Anna-Maria Velentza, Siqi Liu, Nathaniel Dennler, Allison O'Connell, Maja Matarić
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang