Singing Voice Transcription

Singing voice transcription (SVT) aims to automatically convert recorded singing into musical notation, a task made difficult by the expressiveness of the voice and by noisy audio. Current research focuses on improving accuracy and robustness with deep learning models, often incorporating multimodal data (audio and video) and self-supervised learning techniques to address data scarcity. These advances matter for applications in music information retrieval, digital music creation, and accessibility tools for musicians. Recent work also highlights the need to address biases in existing systems, such as gender disparities in transcription accuracy.

Papers