Multimodal Generative Models
Multimodal generative models aim to learn coherent joint representations and generate data across multiple modalities (e.g., text, images, audio) by modeling the relationships between them. Current research emphasizes improving the expressiveness of these models, often through energy-based priors or by combining contrastive and reconstruction objectives within architectures such as transformers and variational autoencoders. The field is significant for advancing artificial intelligence: it enables applications such as improved image captioning, radiology report generation, and more robust and efficient path planning in robotics, while also surfacing and helping mitigate biases present in training data.
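The combined contrastive-and-reconstruction objective mentioned above can be sketched concretely. The snippet below is a minimal, illustrative NumPy sketch (not any specific paper's method): a per-modality reconstruction term, a KL term regularizing a shared VAE-style latent, and an InfoNCE-style contrastive term aligning paired cross-modal embeddings. All function names, shapes, and weightings (`beta`, `gamma`) are assumptions chosen for clarity.

```python
import numpy as np

def kl_divergence(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over batch
    return np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

def reconstruction_loss(x, x_hat):
    # Squared error summed over features, averaged over the batch
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

def info_nce(z_a, z_b, temperature=0.1):
    # Contrastive alignment: matching cross-modal pairs (the diagonal) are
    # positives; every other pairing in the batch is a negative.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (B, B) cosine-similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def multimodal_loss(x_img, x_txt, x_img_hat, x_txt_hat,
                    mu, logvar, z_img, z_txt, beta=1.0, gamma=0.5):
    # Hypothetical combined objective: reconstruction + KL + contrastive alignment
    rec = reconstruction_loss(x_img, x_img_hat) + reconstruction_loss(x_txt, x_txt_hat)
    return rec + beta * kl_divergence(mu, logvar) + gamma * info_nce(z_img, z_txt)

# Toy batch of paired "image" and "text" features
rng = np.random.default_rng(0)
B, D, Z = 8, 16, 4
x_img, x_txt = rng.normal(size=(B, D)), rng.normal(size=(B, D))
mu, logvar = rng.normal(size=(B, Z)), 0.1 * rng.normal(size=(B, Z))
z = rng.normal(size=(B, Z))  # pretend both encoders produced aligned embeddings
loss = multimodal_loss(x_img, x_txt, x_img + 0.1, x_txt + 0.1, mu, logvar, z, z)
print(float(loss) > 0.0)
```

In practice the reconstruction term would be a modality-appropriate likelihood (e.g. cross-entropy for tokens) and the embeddings would come from learned encoders; the point here is only how the three terms compose into one training objective.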