Singing Voice
Singing voice research studies how to analyze and manipulate the acoustic properties of singing, chiefly to improve singing voice synthesis (SVS) and related technologies such as voice conversion. Current work relies heavily on deep learning models, including diffusion models, variational autoencoders, and transformers, and often uses self-supervised learning to cope with scarce training data and to improve the controllability and naturalness of synthesized voices. These advances have implications for music production, virtual singers, accessibility technologies for people with voice impairments, and the detection of AI-generated singing "deepfakes," all of which reflect the growing importance of this interdisciplinary field.
Papers
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
Ruiqi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao