Singing Voice Synthesis
Singing voice synthesis (SVS) aims to generate realistic, expressive singing voices from musical scores, text prompts, or both. Current research focuses on improving the controllability and naturalness of synthesized voices, using model architectures such as diffusion models, transformers, and generative adversarial networks (GANs), often combined with techniques such as style transfer and multi-level style control. These advances matter for applications in music production, virtual singers, and accessibility technologies, and they also drive progress in related fields such as deepfake detection and audio processing.
Papers
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, Shinji Watanabe