Prompt-Based Text-to-Speech

Prompt-based text-to-speech (TTS) aims to synthesize speech controlled by natural-language descriptions, enabling fine-grained manipulation of speaker characteristics and speaking style. Current research seeks to improve the accuracy and naturalness of synthesized speech using techniques such as low-rank adaptation, retrieval-augmented generation, and diffusion models, often building on pre-trained multi-speaker TTS systems. These advances are driven by new datasets with rich prompt annotations and by improved methods for extracting speaker-related information from prompts. The resulting gains in controllability and realism matter for applications ranging from personalized voice assistants to accessible communication technologies.

Papers