Text to Audio
Text-to-audio (TTA) generation aims to synthesize realistic audio from textual descriptions. Current research relies heavily on latent diffusion models, often paired with large language models (LLMs) that improve the system's understanding of the prompt and the temporal consistency of the generated audio. This combination targets known weaknesses of earlier systems: semantic misalignment between text and audio, and limited control over audio length and style. These advances are raising the quality and efficiency of TTA systems, with applications in media production, accessibility technologies, and creative content generation. Research is also exploring the use of visual information (video-to-audio generation) to improve synchronization and personalization.
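The core idea behind latent diffusion for TTA can be shown with a toy sketch. A latent vector starts as pure Gaussian noise and is iteratively denoised by a network that is conditioned on a text embedding. The sketch makes several assumptions. It uses the standard DDPM linear noise schedule and ancestral sampling. `embed_text` is a hypothetical stand-in for a pretrained text or LLM encoder. `oracle_denoiser` is a hypothetical stand-in for a trained noise-prediction network; it "knows" the clean latent for the prompt, so the loop's mechanics can be checked without a trained model. The sketch omits the final step of decoding the latent to audio, which real systems perform with, for example, a VAE decoder followed by a vocoder.

```python
import math
import random


def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (assumed here; other schedules exist)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a  # alpha_bar_t = product of alphas up to step t
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def embed_text(prompt, dim):
    """HYPOTHETICAL text encoder: a deterministic pseudo-random vector per prompt.
    A real TTA system would use a pretrained text or language-model encoder."""
    rng = random.Random(prompt)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def oracle_denoiser(z_t, t, cond, alpha_bars):
    """HYPOTHETICAL stand-in for a trained noise predictor eps(z_t, t, cond).
    It treats `cond` as the clean latent and returns the exact noise that the
    forward process z_t = sqrt(ab) * z0 + sqrt(1 - ab) * eps would imply."""
    ab = alpha_bars[t]
    return [(z - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for z, c in zip(z_t, cond)]


def sample_latent(prompt, dim=8, T=1000, seed=0):
    """DDPM ancestral sampling in latent space, conditioned on a text embedding."""
    betas, alphas, alpha_bars = make_schedule(T)
    cond = embed_text(prompt, dim)
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise z_T
    for t in reversed(range(T)):
        eps_hat = oracle_denoiser(z, t, cond, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean of z_{t-1} given z_t and the predicted noise
        mean = [(zi - coef * e) / math.sqrt(alphas[t]) for zi, e in zip(z, eps_hat)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple variance choice: sigma_t^2 = beta_t
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # no noise is added at the final step
    return z  # a real system would decode this latent into audio


if __name__ == "__main__":
    latent = sample_latent("a dog barking in the rain")
    print([round(x, 3) for x in latent])
```

Because the oracle predicts the noise exactly, the final step returns the prompt's target latent. With a trained network, the result would instead be a plausible latent that a decoder turns into audio.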