Multimodal Generative Model
Multimodal generative models aim to learn coherent joint representations and generate data across multiple modalities (e.g., text, images, audio) by capturing the relationships between them. Current research focuses on improving the expressiveness of these models, often by using energy-based priors or by combining contrastive and reconstruction objectives within architectures such as transformers and variational autoencoders. The field is significant for advancing artificial intelligence: it enables applications such as improved image captioning, radiology report generation, and more robust and efficient path planning in robotics, while also surfacing and mitigating biases present in training data.
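To make the combination of contrastive and reconstruction objectives concrete, the following is a minimal NumPy sketch, not any specific paper's method: two modality encoders map paired inputs into a shared embedding space, a contrastive (InfoNCE-style) term pulls matching pairs together, and a reconstruction term decodes the embedding back to one modality. The encoder/decoder matrices and modality dimensions are illustrative placeholders for the transformer or VAE components mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Linear encoder into a shared embedding space (stand-in for a transformer/VAE encoder)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalise for cosine similarity

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: matched pairs (i, i) should outscore mismatched pairs."""
    logits = z_a @ z_b.T / temperature            # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy against the diagonal

def reconstruction_loss(x, z, W_dec):
    """Reconstruction term: decode the shared embedding back to the input modality."""
    return np.mean((x - z @ W_dec) ** 2)

# Toy batch: 8 paired samples from an "image" modality (16-d) and a "text" modality (12-d).
x_img, x_txt = rng.normal(size=(8, 16)), rng.normal(size=(8, 12))
W_img, W_txt = rng.normal(size=(16, 4)), rng.normal(size=(12, 4))
W_dec = rng.normal(size=(4, 16))

z_img, z_txt = encode(x_img, W_img), encode(x_txt, W_txt)
loss = info_nce(z_img, z_txt) + reconstruction_loss(x_img, z_img, W_dec)
print(f"combined loss: {loss:.3f}")
```

In practice both terms would be minimised jointly by gradient descent over the encoder and decoder parameters; the sketch only shows how the two objectives are computed and summed for one batch.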