Generative Multimodal Models
Generative multimodal models are systems that understand and generate data across multiple modalities, such as text, images, audio, and video, within a single framework. Current research focuses on improving architectures such as transformers and diffusion models, often incorporating techniques like in-context learning, and on addressing challenges such as data scarcity and dataset bias through methods like prompt engineering and data augmentation. These advances matter for a range of applications, including medical diagnosis (e.g., Alzheimer's detection), creative content generation (e.g., text-to-video), and mitigating the societal biases embedded in the large datasets used to train these models.
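To make one modality pairing concrete, the sketch below shows text-to-image generation with a pretrained diffusion model. It assumes the Hugging Face diffusers library and the publicly available runwayml/stable-diffusion-v1-5 checkpoint; it is an illustrative example of the general technique, not a reference implementation of any specific paper on this page.

```python
# Minimal text-to-image sketch: a text prompt conditions the reverse
# diffusion process, which iteratively denoises random latents into an
# image. Assumes the `diffusers` library and a Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision to reduce GPU memory
)
pipe = pipe.to("cuda")  # run on GPU; generation on CPU is very slow

# The prompt wording steers the output -- prompt engineering in its
# simplest form.
prompt = "a watercolor painting of a lighthouse at dusk"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("generated.png")
```

Fewer inference steps trade image quality for speed; 20 to 50 steps is a common range for this family of models.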