Multimodal Generative Models
Multimodal generative models aim to learn coherent joint representations and generate data across multiple modalities (e.g., text, images, audio) by capturing the relationships between them. Current research emphasizes improving the expressiveness of these models, often through energy-based priors or by combining contrastive and reconstruction objectives within architectures such as transformers and variational autoencoders. The field is significant for advancing artificial intelligence: it enables applications such as image captioning, radiology report generation, and more robust and efficient path planning in robotics, while also surfacing, and helping to mitigate, biases present in training data.
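To make the combination of contrastive and reconstruction objectives concrete, the following is a minimal NumPy sketch of such a training loss for two paired modalities. Everything here is illustrative: the linear encoders/decoders, the latent dimension, the temperature of 0.1, and the 0.5 weighting between terms are all assumed hyperparameters, not values from any specific model in the literature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: modality A (e.g. image features) and modality B (e.g. text features).
x_a = rng.normal(size=(8, 16))   # 8 paired samples, 16-dim
x_b = rng.normal(size=(8, 12))   # same 8 samples, 12-dim

# Hypothetical linear encoders/decoders into a shared 4-dim latent space.
W_enc_a = rng.normal(scale=0.1, size=(16, 4))
W_enc_b = rng.normal(scale=0.1, size=(12, 4))
W_dec_a = rng.normal(scale=0.1, size=(4, 16))
W_dec_b = rng.normal(scale=0.1, size=(4, 12))

z_a = x_a @ W_enc_a
z_b = x_b @ W_enc_b

# Reconstruction term: each modality should be decodable from its own latent.
recon = np.mean((z_a @ W_dec_a - x_a) ** 2) + np.mean((z_b @ W_dec_b - x_b) ** 2)

# Contrastive (InfoNCE-style) term: a matched pair (z_a[i], z_b[i]) should be
# more similar than mismatched pairs (z_a[i], z_b[j]) for j != i.
def normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

sim = normalize(z_a) @ normalize(z_b).T / 0.1   # cosine similarities, temperature 0.1
log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
contrastive = -np.mean(np.diag(log_probs))

loss = recon + 0.5 * contrastive  # 0.5 is an assumed weighting between the two terms
```

In a real model the linear maps would be deep networks trained by gradient descent; the point of the sketch is only the shape of the objective, where the reconstruction term keeps each modality's information and the contrastive term aligns the two latent spaces.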