Singing Voice Synthesis

Singing voice synthesis (SVS) aims to generate realistic, expressive singing voices from musical scores, text prompts, or both. Current research focuses on improving the controllability and naturalness of the synthesized voice, using model architectures such as diffusion models, transformers, and generative adversarial networks (GANs), often combined with techniques such as style transfer and multi-level style control. These advances matter for music production, virtual singers, and accessibility technologies, and they also drive progress in related areas such as deepfake detection and audio processing.

Papers