Large Multi-Modal Models
Large multi-modal models (LMMs) integrate multiple data modalities, such as text, images, and video, to perform tasks like visual question answering and image captioning. Current research emphasizes improving LMM efficiency through techniques such as visual context compression and mixture-of-experts architectures, while also addressing challenges such as hallucination and robustness to noisy or incomplete inputs. These advances matter because they enable more capable and versatile AI systems, with applications ranging from assistive technologies for the visually impaired to robotics and medical diagnosis.
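To make the core architectural pattern concrete, the sketch below shows how many LMMs fuse modalities: patch features from a vision encoder are projected into the language model's token-embedding space and concatenated with the text-token embeddings, with simple average pooling standing in for visual context compression. This is a minimal PyTorch illustration; the class name `TinyLMMFusion`, all dimensions, and the pooling-based compression are assumptions for demonstration, not any specific published design.

```python
import torch
import torch.nn as nn

class TinyLMMFusion(nn.Module):
    """Toy illustration of a common LMM pattern: a vision encoder's patch
    features are projected into the language model's embedding space and
    prepended to the text-token embeddings. All sizes are illustrative."""

    def __init__(self, vision_dim=768, lm_dim=1024, compress_ratio=4):
        super().__init__()
        # Projects visual features into the LM's token-embedding space.
        self.projector = nn.Linear(vision_dim, lm_dim)
        # Visual context compression, sketched here as average pooling over
        # groups of adjacent patch tokens (real systems may use learned
        # queries, token pruning, or other schemes).
        self.compress_ratio = compress_ratio

    def forward(self, patch_feats, text_embeds):
        # patch_feats: (batch, num_patches, vision_dim) from a vision encoder
        # text_embeds: (batch, num_text_tokens, lm_dim) from the LM embeddings
        b, n, d = patch_feats.shape
        # Compress: merge every `compress_ratio` adjacent patch tokens.
        patch_feats = patch_feats[:, : n - n % self.compress_ratio]
        patch_feats = patch_feats.view(b, -1, self.compress_ratio, d).mean(dim=2)
        visual_tokens = self.projector(patch_feats)  # (b, n // ratio, lm_dim)
        # The fused sequence would be fed to the LM's transformer layers.
        return torch.cat([visual_tokens, text_embeds], dim=1)

# Example: 196 ViT patch tokens compressed to 49 visual tokens, then fused
# with 16 text-token embeddings into one sequence of 65 tokens.
fusion = TinyLMMFusion()
out = fusion(torch.randn(2, 196, 768), torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 65, 1024])
```

Compressing visual tokens before fusion shrinks the sequence the language model must attend over, which is the main lever the efficiency work described above pulls on.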
Papers
Fourteen papers, dated October 16, 2022 through January 30, 2024.