Large Vision-Language Models
Large Vision-Language Models (LVLMs) integrate computer vision and natural language processing so that a single model can understand and reason about images and text together. Current research focuses on improving their accuracy, efficiency, and robustness, in particular by mitigating hallucinations (generating content that is inaccurate or unsupported by the input) and by strengthening multi-level visual perception and reasoning, including quantitative spatial reasoning and mechanical understanding. These advances matter for applications such as medical image analysis, robotics, and autonomous driving, where they enable more reliable multimodal data processing.
Papers
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang, Teng Wang, Haigang Zhang, Ping Lu, Feng Zheng
LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy
Large Vision-Language Models as Emotion Recognizers in Context Awareness
Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang
CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting
Naman Sharma
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
Yatai Ji, Shilong Zhang, Jie Wu, Peize Sun, Weifeng Chen, Xuefeng Xiao, Sidi Yang, Yujiu Yang, Ping Luo
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
Raza Imam, Mohammed Talha Alam, Umaima Rahman, Mohsen Guizani, Fakhri Karray