Automatic Lyric Transcription

Automatic lyric transcription (ALT) aims to transcribe song lyrics from audio accurately and completely, going beyond word recognition to capture the punctuation, formatting, and structural elements that matter for readability and for conveying artistic intent. Current research focuses on improving transcription accuracy through several approaches: adapting pre-trained speech models (such as Whisper), using self-supervised and semi-supervised learning to address data scarcity, and integrating multimodal data (audio, video, and inertial measurement unit, or IMU, signals) for greater robustness. These advances could improve user experiences in music applications such as karaoke, live captioning, and music information retrieval systems, and could further our understanding of music-related information processing.

Papers