Multimodal Variational AutoEncoders
Multimodal Variational Autoencoders (VAEs) are generative models designed to learn joint representations from data spanning multiple modalities (e.g., images, text, sensor readings). Current research emphasizes better modeling of complex inter-modal relationships, often through structured latent priors such as Markov Random Fields, or by incorporating contrastive learning and normalizing flows to strengthen generative quality and disentangle shared and modality-private latent factors. These advances are driving progress in diverse applications, including cross-modal retrieval, robotic manipulation, and medical diagnosis, by enabling more effective data integration and improved model interpretability.
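The core idea of learning a joint representation can be illustrated with one widely used fusion strategy: a product-of-experts (PoE) combination of per-modality Gaussian posteriors. The sketch below is a minimal NumPy illustration, not any specific paper's model; the linear encoders, dimensions, and variable names are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Toy linear Gaussian encoder: map one modality to posterior params."""
    return x @ W_mu, x @ W_logvar

def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors q(z|x_m) into a joint posterior.

    Precisions add, and the joint mean is the precision-weighted average.
    A unit-Gaussian prior expert (mu=0, var=1) is included so the joint
    posterior stays well-defined when some modalities are missing.
    """
    precisions = [np.ones_like(mus[0])] + [np.exp(-lv) for lv in logvars]
    means = [np.zeros_like(mus[0])] + list(mus)
    joint_prec = sum(precisions)
    joint_mu = sum(p * m for p, m in zip(precisions, means)) / joint_prec
    return joint_mu, -np.log(joint_prec)  # logvar = -log(precision)

def reparameterize(mu, logvar):
    """Standard VAE reparameterization trick: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# Toy setup: an 8-d "image" modality, a 5-d "text" modality, 3-d latent.
d_img, d_txt, d_z = 8, 5, 3
W = {name: rng.standard_normal(shape) * 0.1
     for name, shape in [("img_mu", (d_img, d_z)), ("img_lv", (d_img, d_z)),
                         ("txt_mu", (d_txt, d_z)), ("txt_lv", (d_txt, d_z))]}

x_img = rng.standard_normal((4, d_img))
x_txt = rng.standard_normal((4, d_txt))

mu_i, lv_i = encode(x_img, W["img_mu"], W["img_lv"])
mu_t, lv_t = encode(x_txt, W["txt_mu"], W["txt_lv"])

# Joint posterior from both modalities; dropping one expert from the same
# fusion yields a unimodal posterior, enabling cross-modal generation.
mu_joint, lv_joint = product_of_experts([mu_i, mu_t], [lv_i, lv_t])
z = reparameterize(mu_joint, lv_joint)
print(z.shape)  # (4, 3)
```

Because precisions add under PoE, the fused posterior is always at least as confident (lower variance) as any single-modality posterior, which is what makes this fusion attractive for integrating complementary modalities.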