Large Multimodal Model
Large multimodal models (LMMs) integrate vision and language processing to understand and generate information across multiple modalities. Current research focuses on improving LMM performance on complex tasks such as temporal reasoning in videos, fine-grained image understanding, and robust handling of diverse data types, often relying on training techniques such as instruction tuning and contrastive learning. These advances matter for applications including intelligent tutoring systems, robotics, and medical diagnosis, where they enable more sophisticated analysis of and interaction with the world.
Papers
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
Dongjae Shin, Hyeonseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae Lim
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Dingkun Guo, Yuqi Xiang, Shuqi Zhao, Xinghao Zhu, Masayoshi Tomizuka, Mingyu Ding, Wei Zhan
MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang
Automated Floodwater Depth Estimation Using Large Multimodal Model for Rapid Flood Mapping
Temitope Akinboyewa, Huan Ning, M. Naser Lessani, Zhenlong Li
PALO: A Polyglot Large Multimodal Model for 5B People
Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, Mingjie Zhan, Hongsheng Li
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
Zihao Yue, Liang Zhang, Qin Jin
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Baichuan Zhou, Ying Hu, Xi Weng, Junlong Jia, Jie Luo, Xien Liu, Ji Wu, Lei Huang