Text-to-Music Diffusion Models
Text-to-music diffusion models generate realistic music from textual descriptions, using iterative denoising to synthesize high-quality audio. Current research focuses on improving controllability through techniques such as fine-tuning with audio prompts, subtractive training for stem insertion, and inference-time optimization, often built on attention-based adapters or cascaded diffusion architectures. These advances enable finer control over musical elements, including genre, timbre, rhythm, and the addition or modification of individual instrument parts, with applications ranging from music composition assistance to personalized music generation.
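As a concrete illustration of the basic text-to-music diffusion workflow, the sketch below uses the Hugging Face diffusers library and its AudioLDM2Pipeline with the public cvssp/audioldm2-music checkpoint. This choice of model, along with the prompt, step count, and guidance settings, is an assumption for illustration only; the papers surveyed here build on a variety of model families and conditioning schemes.

```python
# Minimal sketch of text-to-music generation with a latent diffusion model.
# Assumes the Hugging Face `diffusers` library and the public
# "cvssp/audioldm2-music" checkpoint; prompt and sampling settings below
# are illustrative, not recommendations from any specific paper.
import torch
import scipy.io.wavfile
from diffusers import AudioLDM2Pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = AudioLDM2Pipeline.from_pretrained(
    "cvssp/audioldm2-music",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "an upbeat jazz trio with brushed drums, walking bass, and piano"
negative_prompt = "low quality, distortion"

# Classifier-free guidance (guidance_scale > 1) trades sample diversity for
# closer adherence to the text description.
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_length_in_s=10.0,
    guidance_scale=3.5,
).audios[0]

# AudioLDM2 decodes waveforms at a 16 kHz sampling rate.
scipy.io.wavfile.write("generated_music.wav", rate=16000, data=audio)
```

In this setup, controllability beyond the text prompt (e.g., matching a reference timbre or inserting a stem) would be layered on top of the same sampling loop, for example through adapter modules or inference-time guidance, as described above.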
Papers
July 23, 2024
June 27, 2024
June 7, 2024
March 18, 2024
January 22, 2024
September 20, 2023