Audio to Image Generation
Audio-to-image generation creates visual representations from audio input, bridging two distinct modalities. Current research centers on generative architectures such as diffusion models and transformers. These systems often build on pre-trained models like CLIP and use techniques such as masked diffusion and classifier-free guidance to improve generation quality and speed. The field has potential applications in multimedia content creation, in accessibility technologies (e.g., for deaf or hard-of-hearing users), and in making audio data easier to interpret through visualization.
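Classifier-free guidance can be illustrated with a single sampling step. The sketch below is not taken from any of the listed papers. It assumes a diffusion denoiser that accepts an optional audio embedding and falls back to an unconditional prediction when the condition is dropped. The names `cfg_noise_estimate` and `toy_denoiser`, as well as the denoiser's signature, are hypothetical and chosen only for illustration. The guided estimate moves from the unconditional noise prediction toward the audio-conditioned one, scaled by `guidance_scale`.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical denoiser signature: (noisy sample, timestep, optional audio embedding) -> noise estimate
Denoiser = Callable[[Sequence[float], int, Optional[Sequence[float]]], Vector]


def cfg_noise_estimate(
    denoiser: Denoiser,
    x_t: Sequence[float],
    t: int,
    audio_embedding: Sequence[float],
    guidance_scale: float,
) -> Vector:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    eps_uncond = denoiser(x_t, t, None)           # condition dropped (the "null" condition)
    eps_cond = denoiser(x_t, t, audio_embedding)  # conditioned on the audio embedding
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(
    x_t: Sequence[float], t: int, cond: Optional[Sequence[float]]
) -> Vector:
    # Hypothetical stand-in for a trained network; it ignores t.
    # The unconditional branch halves the input.
    # The conditional branch additionally adds the audio embedding.
    if cond is None:
        return [0.5 * v for v in x_t]
    return [0.5 * v + c for v, c in zip(x_t, cond)]


if __name__ == "__main__":
    eps = cfg_noise_estimate(toy_denoiser, [1.0, 2.0], 10, [0.1, -0.2], 3.0)
    print(eps)  # approximately [0.8, 0.4]
```

With `guidance_scale = 1` the result equals the conditional prediction, and with `0` it equals the unconditional one. Values above 1 push samples more strongly toward the audio condition, usually at some cost in diversity. Some papers write the same rule as `(1 + w) * eps_cond - w * eps_uncond`, which is this formula with the scale shifted by one.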
Papers
October 7, 2024
October 3, 2024
May 23, 2024
April 25, 2024
September 28, 2023
August 18, 2023
May 22, 2023
March 10, 2023
December 5, 2022
November 6, 2022