Large-Scale Multimodal Research
Large-scale multimodal research focuses on developing models that can understand and generate information across multiple data types (e.g., text, images, audio, video). Current efforts concentrate on building universal embedding models, typically based on transformer architectures and trained with contrastive learning, that transfer to diverse downstream tasks such as visual question answering and root cause analysis. These advances are driving progress across application areas, including document analysis, misinformation detection, and more natural, nuanced conversational AI for human-computer interaction. Large, diverse, and publicly accessible multimodal datasets remain crucial to this progress.
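To make the contrastive-learning approach mentioned above concrete, the sketch below shows a CLIP-style symmetric InfoNCE loss in PyTorch, which is one common way to align two modalities in a shared embedding space. This is a minimal illustration under assumed settings (the function name, embedding dimensions, and temperature value are chosen for the example), not the method of any particular paper in this collection.

# Minimal sketch of a CLIP-style contrastive objective for aligning two
# modalities (e.g., image and text) in a shared embedding space.
# Names and hyperparameters here are illustrative assumptions; real
# systems add projection heads, large batches, and a learned temperature.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) embeddings of matched pairs;
    row i of each tensor comes from the same underlying example.
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the positives.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    img = torch.randn(8, 512)   # e.g., pooled vision-transformer features
    txt = torch.randn(8, 512)   # e.g., pooled text-transformer features
    print(contrastive_loss(img, txt).item())

The symmetric form (averaging the image-to-text and text-to-image losses) encourages both encoders to map matched pairs close together while pushing apart the other pairings in the batch, which is why large batch sizes tend to help in practice.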