Multimodal LLM
Multimodal Large Language Models (MLLMs) aim to integrate diverse data modalities, such as text, images, and video, into a unified framework for enhanced understanding and generation. Current research emphasizes efficient fusion of visual and textual information, often employing techniques like early fusion mechanisms and specialized adapters within transformer-based architectures, as well as exploring the use of Mixture-of-Experts (MoE) models. This field is significant due to its potential to improve various applications, including image captioning, visual question answering, and more complex tasks requiring cross-modal reasoning, while also addressing challenges like hallucinations and bias.
Papers
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts
Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng
VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation
Ruiyang Zhang, Hu Zhang, Zhedong Zheng
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination
Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun
Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations
Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti
VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos
Weihao Zhong, Yinhao Xiao, Minghui Xu, Xiuzhen Cheng
Teach Multimodal LLMs to Comprehend Electrocardiographic Images
Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang
DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Manan Suri, Puneet Mathur, Franck Dernoncourt, Rajiv Jain, Vlad I Morariu, Ramit Sawhney, Preslav Nakov, Dinesh Manocha
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Xiang Yue, Yueqi Song, Akari Asai, Seungone Kim, Jean de Dieu Nyandwi, Simran Khanuja, Anjali Kantharuban, Lintang Sutawika, Sathyanarayanan Ramamoorthy, Graham Neubig