Large Multi-Modality Model
Large multi-modality models (LMMs) integrate diverse data types, such as images and text, to perform complex tasks that exceed the capabilities of unimodal models. Current research focuses on improving LMM performance in visual quality assessment, geolocalization, and visual grounding, often using retrieval-augmented generation and preference-based reinforcement learning. This work supports progress in image analysis, information retrieval, and the evaluation of AI-generated content by enabling a more nuanced and accurate understanding of multimodal data.