Multimodal Large Language Model
Multimodal large language models (MLLMs) integrate multiple data modalities, such as text, images, and audio, to enable understanding and reasoning beyond what unimodal models can achieve. Current research emphasizes improving MLLM performance through refined architectures and techniques such as visual grounding and chain-of-thought prompting, mitigating biases and hallucinations, and developing robust evaluation benchmarks that assess aspects of multimodal understanding ranging from active perception to complex reasoning. This work matters because it extends what language models can perceive and reason over, enabling applications in areas such as medical diagnosis, financial analysis, and robotic manipulation.
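To make the integration of modalities concrete, a common recipe behind many of the papers below is a LLaVA-style design: a pretrained vision encoder produces patch features, a small projector maps them into the language model's token-embedding space, and the LLM then attends over the visual and textual tokens jointly. The sketch below illustrates that pattern only; the ToyMLLM class, the dimensions, and the Linear/Transformer stand-ins for the pretrained encoder and LLM are illustrative assumptions, not any specific paper's implementation.

```python
import torch
import torch.nn as nn

class ToyMLLM(nn.Module):
    """Minimal LLaVA-style multimodal model: visual features are projected
    into the language model's embedding space and prepended to the text
    token embeddings, so the LLM attends over both modalities jointly."""

    def __init__(self, vision_dim=768, llm_dim=512, vocab_size=32000):
        super().__init__()
        # Stand-ins for pretrained components (illustrative only).
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # placeholder for a ViT
        self.projector = nn.Linear(vision_dim, llm_dim)          # modality bridge
        self.token_embed = nn.Embedding(vocab_size, llm_dim)
        self.llm = nn.TransformerEncoder(                        # placeholder for a decoder-only LLM
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, image_patches, text_ids):
        # image_patches: (batch, num_patches, vision_dim); text_ids: (batch, seq_len)
        vis = self.projector(self.vision_encoder(image_patches))  # map visual features into LLM space
        txt = self.token_embed(text_ids)
        seq = torch.cat([vis, txt], dim=1)                        # image tokens precede text tokens
        return self.lm_head(self.llm(seq))                        # next-token logits over the joint sequence

model = ToyMLLM()
logits = model(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 32000])
```

In practice the vision encoder and LLM are large pretrained networks, often kept frozen while the lightweight projector (and sometimes the LLM) is fine-tuned on image-text instruction data.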
Papers
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
Wenjun Huang, Jianguo Hu
Multimodal Large Language Models for Bioimage Analysis
Shanghang Zhang, Gaole Dai, Tiejun Huang, Jianxu Chen
Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images
Jiaxin Zhang, Yunqin Li, Tomohiro Fukuda, Bowen Wang
AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs
Muhammad Arbab Arshad, Talukder Zaki Jubery, Tirtho Roy, Rim Nassiri, Asheesh K. Singh, Arti Singh, Chinmay Hegde, Baskar Ganapathysubramanian, Aditya Balu, Adarsh Krishnamurthy, Soumik Sarkar
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models
Baao Xie, Qiuyu Chen, Yunnan Wang, Zequn Zhang, Xin Jin, Wenjun Zeng
Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, Lei Zhu
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji
UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models
Liu Qi, He Yongyi, Lian Defu, Zheng Zhi, Xu Tong, Liu Che, Chen Enhong
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang, Ruijie Yu, Kaipeng Zeng, Ding Li, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu