Speech Synthesis

Speech synthesis aims to generate human-like speech from text or other inputs, with a focus on improving naturalness, expressiveness, and efficiency. Current research centers on generative approaches such as diffusion models, generative adversarial networks (GANs), and large language models (LLMs). This work often adopts parameter-efficient techniques such as low-rank adaptation (LoRA) and seeks finer control over attributes like emotion and prosody. These advances matter for a wide range of applications, from assistive technologies for visually impaired users to realistic virtual avatars and better accessibility for under-resourced languages.

Papers