Text-to-Music Generation

Text-to-music generation aims to create musical pieces from textual descriptions, connecting human language with musical expression. Current research relies heavily on transformer-based and diffusion models, often incorporating large language models to improve control and to produce longer, more structured compositions. It also explores multi-track generation for richer musical arrangements. The field is significant for its potential to democratize music creation, to offer new tools for composers and musicians, and to advance our understanding of the relationship between language and music through new model architectures and datasets.

Papers