Text to Speech System

Text-to-speech (TTS) systems convert written text into natural-sounding speech, with progress coming from advances in both the frontend (text processing) and the backend (speech synthesis) modules. Current research emphasizes data efficiency, particularly for low-resource languages, using techniques such as self-supervised learning and cross-lingual transfer learning. Transformer-based architectures are prominent, and parallel efforts aim to improve expressiveness and controllability through better linguistic feature representations and more diverse, realistic speaker profiles. These advances matter for accessibility, language preservation, and applications ranging from assistive technologies to human-robot interaction.

Papers