Multimodal VAE
Multimodal Variational Autoencoders (VAEs) aim to learn a joint representation of data from multiple modalities (e.g., images and text), enabling tasks such as cross-modal generation and imputation of missing modalities. Current research focuses on improving the quality and coherence of generated outputs by addressing limitations of existing architectures, for example by replacing standard decoders with diffusion decoders for complex modalities, or by incorporating Markov Random Fields to better capture inter-modal relationships. These advances are significant because they improve the ability of VAEs to model complex, real-world datasets, with applications ranging from neuroscience to medical image synthesis.
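To make the idea of a joint representation concrete, the following is a minimal sketch in PyTorch of one common multimodal VAE construction: each modality has its own Gaussian encoder, the per-modality posteriors are fused with a product of experts (as in MVAE-style models), and a shared latent code is decoded back into every modality. The layer sizes, module names, and the MSE reconstruction terms here are illustrative assumptions, not taken from any specific paper on this page.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 16  # illustrative latent size

class GaussianEncoder(nn.Module):
    """Maps one modality to the mean and log-variance of q(z | x_m)."""
    def __init__(self, input_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, LATENT_DIM)
        self.logvar = nn.Linear(128, LATENT_DIM)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors (plus a standard-normal prior
    expert) into a single joint posterior."""
    prior_mu = torch.zeros_like(mus[0])
    prior_logvar = torch.zeros_like(logvars[0])
    all_mu = torch.stack([prior_mu] + mus)          # (experts, batch, latent)
    all_logvar = torch.stack([prior_logvar] + logvars)
    precision = torch.exp(-all_logvar)              # 1 / sigma^2 per expert
    joint_mu = (all_mu * precision).sum(0) / precision.sum(0)
    joint_logvar = -torch.log(precision.sum(0))
    return joint_mu, joint_logvar

class MultimodalVAE(nn.Module):
    def __init__(self, image_dim=784, text_dim=300):
        super().__init__()
        self.image_enc = GaussianEncoder(image_dim)
        self.text_enc = GaussianEncoder(text_dim)
        self.image_dec = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                       nn.Linear(128, image_dim))
        self.text_dec = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                      nn.Linear(128, text_dim))

    def forward(self, image=None, text=None):
        # Any subset of modalities can be passed in; missing ones are imputed
        # by decoding the joint latent inferred from the observed modalities.
        mus, logvars = [], []
        if image is not None:
            mu, lv = self.image_enc(image)
            mus.append(mu); logvars.append(lv)
        if text is not None:
            mu, lv = self.text_enc(text)
            mus.append(mu); logvars.append(lv)
        mu, logvar = product_of_experts(mus, logvars)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.image_dec(z), self.text_dec(z), mu, logvar

def elbo_loss(image, text, image_recon, text_recon, mu, logvar):
    """Sum of per-modality reconstruction errors plus the KL term."""
    recon = F.mse_loss(image_recon, image, reduction="sum") \
          + F.mse_loss(text_recon, text, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

Because the prior is included as an expert in the product, the same model can be queried with only one modality observed, which is how imputation of the missing modality falls out of this construction. The diffusion-decoder and Markov Random Field variants mentioned above would replace, respectively, the simple feed-forward decoders and the factorized fusion step in this sketch.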