Multimodal In-Context Learning
Multimodal in-context learning (M-ICL) studies how large multimodal models (LMMs) can pick up new tasks from only a few demonstration examples provided in the prompt, without any retraining, by leveraging multiple data modalities such as text and images. Current research focuses on understanding the mechanisms behind M-ICL, improving its efficiency through techniques such as multimodal task vectors and context-aware modules, and building better datasets and evaluation benchmarks covering diverse tasks. The field matters because it promises more efficient and adaptable AI systems, with applications ranging from medical image analysis and scene text recognition to multimodal question answering and video narration.
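To make the few-shot setup concrete, the sketch below assembles the kind of interleaved image-text prompt that M-ICL feeds to a frozen LMM: a handful of (image, question, answer) demonstrations followed by a query whose answer the model must generate from context alone. Only the prompt structure is the point here; the image paths, the example task, and the final model call are hypothetical stand-ins, not any specific library's API.

```python
from dataclasses import dataclass
from typing import List, Union

# A prompt segment is either a reference to an image or a piece of text.
@dataclass
class ImageSegment:
    path: str

@dataclass
class TextSegment:
    text: str

Segment = Union[ImageSegment, TextSegment]

def build_icl_prompt(demos: List[dict], query: dict) -> List[Segment]:
    """Interleave (image, question, answer) demonstrations with a final query.

    The frozen LMM sees the demonstrations purely as context; no weights are
    updated, which is what distinguishes in-context learning from fine-tuning.
    """
    prompt: List[Segment] = []
    for demo in demos:
        prompt.append(ImageSegment(demo["image"]))
        prompt.append(TextSegment(f"Question: {demo['question']}\nAnswer: {demo['answer']}\n"))
    # The query repeats the same format but leaves the answer for the model to fill in.
    prompt.append(ImageSegment(query["image"]))
    prompt.append(TextSegment(f"Question: {query['question']}\nAnswer:"))
    return prompt

if __name__ == "__main__":
    # Hypothetical two-shot example for an image question-answering task.
    demos = [
        {"image": "xray_01.png", "question": "Is there a fracture?", "answer": "Yes, in the left radius."},
        {"image": "xray_02.png", "question": "Is there a fracture?", "answer": "No fracture is visible."},
    ]
    query = {"image": "xray_03.png", "question": "Is there a fracture?"}

    for segment in build_icl_prompt(demos, query):
        print(segment)

    # In practice the interleaved prompt would be passed to a frozen LMM, e.g.:
    #   answer = lmm.generate(prompt)   # hypothetical interface, varies by model
    # The model produces the answer in context, with no parameter updates.
```

Increasing the number of demonstrations, or selecting them to match the query, is the main lever M-ICL research turns; techniques like multimodal task vectors aim to compress those demonstrations so the context stays short.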
Papers
Eighteen papers on M-ICL, published between November 22, 2023 and October 27, 2024.