Text-to-Audio Model
Text-to-audio models generate realistic audio from textual descriptions, and research on them focuses on improving audio quality, diversity, and alignment with user intent. Current work uses large language models to give finer control over the generated audio, incorporates multimodal data such as video for richer context, and applies techniques such as diffusion models and preference optimization to refine generation quality. These advances matter for applications including content creation, accessibility technologies, and training-data generation for other audio-related tasks.
Papers
July 19, 2024
July 18, 2024
July 8, 2024
June 18, 2024
April 25, 2024
April 15, 2024
March 26, 2024
February 1, 2024
July 24, 2023
May 30, 2023
October 28, 2022