Audio-Visual Generation
Audio-visual generation focuses on creating synchronized audio and video content, aiming to produce realistic, semantically aligned multimedia. Current research emphasizes diffusion models, often incorporating transformer architectures or leveraging pre-trained models for efficiency, and targets better temporal alignment and cross-modal consistency through techniques such as network bending and shared latent spaces. The field matters for its potential applications in film production, video game development, and virtual reality, and for advancing our understanding of multimodal representation learning and generation.
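To make the shared-latent-space idea concrete, here is a minimal NumPy sketch of projecting audio and video features into a common embedding space for cross-modal alignment. All names, dimensions, and the random "encoders" are illustrative assumptions, not the method of any particular paper; in practice the projections are learned encoders trained with a contrastive or diffusion objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions for the two modalities and the shared space.
AUDIO_DIM, VIDEO_DIM, LATENT_DIM = 128, 512, 64

# Randomly initialized projection matrices stand in for learned encoders.
W_audio = rng.standard_normal((AUDIO_DIM, LATENT_DIM)) / np.sqrt(AUDIO_DIM)
W_video = rng.standard_normal((VIDEO_DIM, LATENT_DIM)) / np.sqrt(VIDEO_DIM)

def project(x, W):
    """Map modality-specific features to unit vectors in the shared latent space."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# A batch of paired audio/video clip features (fabricated for illustration).
audio_feats = rng.standard_normal((4, AUDIO_DIM))
video_feats = rng.standard_normal((4, VIDEO_DIM))

z_audio = project(audio_feats, W_audio)
z_video = project(video_feats, W_video)

# Cosine-similarity matrix between all audio/video pairs in the batch.
# Training would pull the diagonal (true pairs) toward 1 and push the
# off-diagonal entries down -- the usual contrastive alignment objective.
sim = z_audio @ z_video.T
print(sim.shape)  # (4, 4)
```

The same shared space also supports synchronization checks at inference time: a generated video clip whose latent is far from its audio's latent signals poor cross-modal consistency.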