MLLM Attention
Multimodal large language models (MLLMs) integrate diverse data modalities (text, images, video) to support understanding and reasoning across them, typically by letting language tokens attend over visual features. Current research focuses on improving MLLM efficiency (e.g., through adaptive image cropping, efficient inference frameworks, and modular architectures such as Mixture-of-Experts), mitigating limitations such as hallucination and catastrophic forgetting, and developing robust evaluation methods. These advances matter because they enable more capable and reliable applications in areas such as robotics, medical diagnosis, and general-purpose assistants.
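To make the attention-based fusion concrete, below is a minimal sketch of one common pattern: a cross-attention block in which text tokens act as queries over image patch embeddings, as in Flamingo-style designs. It assumes PyTorch; the class name, dimensions, and token counts are illustrative rather than taken from any specific paper above.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-attention block: text tokens attend to image patches."""
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # Queries come from the language stream; keys/values from the vision stream,
        # so each text token gathers the visual evidence relevant to it.
        attended, _ = self.attn(query=text_tokens,
                                key=image_patches,
                                value=image_patches)
        # Residual connection plus normalization, as in a standard Transformer block.
        return self.norm(text_tokens + attended)

# Toy usage: 16 text tokens attend over 196 ViT-style patch embeddings.
text = torch.randn(1, 16, 768)
patches = torch.randn(1, 196, 768)
fused = CrossModalAttention()(text, patches)
print(fused.shape)  # torch.Size([1, 16, 768])

Other MLLM families (e.g., LLaVA-style models) instead project patch embeddings into the language token sequence and rely on ordinary self-attention; the cross-attention variant shown here is just one widely used design point.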