Singing Voice Synthesis

Singing voice synthesis (SVS) aims to generate realistic, expressive singing voices from musical scores, text prompts, or both. Current research focuses on improving the controllability and naturalness of synthesized voices, using model architectures such as diffusion models, transformers, and generative adversarial networks (GANs), often combined with techniques such as style transfer and multi-level style control. These advances matter for music production, virtual singers, and accessibility technologies, and they also drive progress in related fields such as deepfake detection and audio processing.

Papers

January 31, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe
Gentle Introduction Singing Voice Singing Voice Synthesis Ace Opencpop

December 17, 2023

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao
Style Transfer Singing Voice Singing Voice Synthesis

September 25, 2023

BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li
Singing Voice Singing Voice Synthesis Bilingual Data Singing Voice Conversion

September 14, 2023

SingFake: Singing Voice Deepfake Detection
Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan
Singing Voice Synthesis Deepfake Audio Synthetic Speech Detection Singing Voice Deepfake Detection

September 1, 2023

Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training
Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng
Singing Voice Unsupervised Pre Training Singing Voice Synthesis Synthetic Voice Singer Identification

August 31, 2023

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng
Singing Voice Expressive Speech Bidirectional Encoder Representation From Transformer Singing Voice Synthesis Synthetic Voice End to End Singing Voice

August 5, 2023

A Systematic Exploration of Joint-training for Singing Voice Synthesis
Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin
Speech Synthesis Acoustic Model Active Exploration Singing Voice Synthesis High Fidelity Vocoder Joint Training

June 29, 2023

Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
Chin-Yun Yu, György Fazekas
Speech Synthesis Singing Voice Synthesis Glottal Source

June 12, 2023

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee
Latent Diffusion Model Speech Synthesis Neural Audio Singing Voice Synthesis Video Masked

May 18, 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao
Singing Voice Synthesis Human Score

May 11, 2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo
Speech Analysis Speech Synthesis Denoising Diffusion Singing Voice Synthesis Consistency Model Diffusion Step

March 15, 2023

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor
Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin
Acoustic Feature Singing Voice Singing Voice Synthesis Pronunciation Training Hypothesized Phoneme Label

January 5, 2023

Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation
Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Sequence to Sequence Model Singing Voice Synthesis Pronunciation Assessment

December 28, 2022

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Attention Mechanism Speech Synthesis Sequence to Sequence Seq2seq Model Singing Voice Synthesis Position Aware Attention

December 3, 2022

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su
End to End Text to Speech Speech Synthesis Speech to Text Singing Voice Singing Voice Synthesis Conditional Variational

November 5, 2022

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Yongmao Zhang, Heyang Xue, Hanzhao Li, Lei Xie, Tingwei Guo, Ruixiong Zhang, Caixia Gong
Singing Voice Synthesis Sound Synthesizer Noise Generation End to End Singing Voice

November 2, 2022

Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation
Yingjie Song, Wei Song, Wei Zhang, Zhengchen Zhang, Dan Zeng, Zhi Liu, Yang Yu
Singing Voice Synthesis Energy Based

October 28, 2022

NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit
Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda
Speech Synthesis Neural Topic Neural Vocoder Singing Voice Synthesis

October 26, 2022

Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
Chunhui Wang, Chang Zeng, Xing He
Generative Adversarial Network Mel Spectrogram Singing Voice Singing Voice Synthesis

October 23, 2022

HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Chunhui Wang, Chang Zeng, Jun Chen, Xing He
Generative Adversarial Network Neural Vocoder Singing Voice Synthesis Audio Spectrogram Modern Vocoders HiFi GAN
