Cross-Modal Music

Cross-modal music research focuses on understanding and generating music by integrating multiple modalities such as audio, lyrics, sheet music, and even motion-capture data. Current efforts concentrate on building robust cross-modal retrieval systems, often using deep learning techniques such as contrastive learning and diffusion models, to link audio with symbolic representations or to generate music from textual or visual inputs. This work is significant for advancing music information retrieval, enabling new forms of music generation and analysis, and improving the interpretability of complex music understanding models.
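To make the retrieval idea concrete, the sketch below shows a minimal CLIP-style contrastive setup that aligns audio embeddings with symbolic-score (or text) embeddings via a symmetric InfoNCE loss. It is an illustrative assumption, not a specific published model: the projection heads, feature dimensions, and the placeholder pooled features stand in for real audio and score encoders.

```python
# Minimal sketch of contrastive cross-modal alignment (audio <-> symbolic score).
# Encoder outputs, dimensions, and names are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Maps a modality-specific feature vector into a shared embedding space."""

    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)  # unit-length embeddings


def contrastive_loss(audio_emb: torch.Tensor,
                     score_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matched (audio, score) pairs share a row index."""
    logits = audio_emb @ score_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))             # diagonal = positive pairs
    loss_a2s = F.cross_entropy(logits, targets)        # audio -> score retrieval
    loss_s2a = F.cross_entropy(logits.t(), targets)    # score -> audio retrieval
    return (loss_a2s + loss_s2a) / 2


if __name__ == "__main__":
    batch = 8
    # Placeholder features, e.g. pooled spectrogram and piano-roll encoder outputs.
    audio_feats = torch.randn(batch, 512)
    score_feats = torch.randn(batch, 768)

    audio_head = ProjectionHead(512)
    score_head = ProjectionHead(768)

    loss = contrastive_loss(audio_head(audio_feats), score_head(score_feats))
    print(f"contrastive loss: {loss.item():.4f}")
```

At retrieval time, the same shared space lets a query from one modality (e.g. an audio clip) be matched against embeddings of the other modality by nearest-neighbor search over cosine similarity.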

Papers