Large Vision Language Model
Large Vision-Language Models (LVLMs) combine computer vision and natural language processing so that a single model can understand and reason over images and text jointly. Current research focuses on improving their accuracy, efficiency, and robustness. Two threads stand out: reducing hallucinations (output that is inaccurate or not supported by the input image), and strengthening multi-level visual perception and reasoning, including quantitative spatial reasoning and mechanical understanding. These advances matter for applications such as medical image analysis, robotics, and autonomous driving, where more reliable multimodal data processing is needed.
Papers
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull
QwenGrasp: A Usage of Large Vision-Language Model for Target-Oriented Grasping
Xinyu Chen, Jian Yang, Zonghan He, Haobin Yang, Qi Zhao, Yuhui Shi
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
Jiaying Lu, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Baochen Sun, Carl Yang, Jie Yang
Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory
Ting Lei, Fabian Caba, Qingchao Chen, Hailin Jin, Yuxin Peng, Yang Liu
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang
Evaluation and Analysis of Hallucination in Large Vision-Language Models
Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang