Large Multimodal Models
Large multimodal models (LMMs) integrate vision and language processing to understand and generate information across multiple modalities. Current research focuses on improving LMM performance on complex tasks such as temporal reasoning in videos, fine-grained image understanding, and robust handling of diverse data types, often through architectures built on instruction tuning and contrastive learning. By enabling more sophisticated analysis of and interaction with the world, these advances support applications such as intelligent tutoring systems, robotics, and medical diagnosis.
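As an illustration of the contrastive-learning objective mentioned above, the sketch below implements a CLIP-style symmetric image-text loss of the kind commonly used to align vision and language embeddings in LMM pre-training. The function name, embedding shapes, and temperature value are illustrative assumptions, not taken from any specific paper.

```python
# A minimal sketch of a CLIP-style symmetric contrastive loss for aligning
# image and text embeddings. Shapes and the `temperature` default are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors; row i of each is a matched pair.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Random embeddings stand in for the outputs of image and text encoders.
    imgs = torch.randn(8, 512)
    txts = torch.randn(8, 512)
    print(contrastive_loss(imgs, txts))
```

The symmetric formulation pulls each image toward its paired caption (and vice versa) while pushing apart all other pairings in the batch, which is what lets such models learn a shared vision-language embedding space.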