Visual Perception
Visual perception research studies how humans and artificial systems interpret visual information, with the aim of bridging the gap between raw sensory input and high-level cognitive understanding. Current work emphasizes evaluating large vision-language models (LVLMs) at multiple levels of perception, from low-level feature extraction to complex semantic reasoning, using benchmarks that measure accuracy as well as hallucinations and biases. These efforts matter for improving the reliability and robustness of AI systems in applications ranging from autonomous driving to assistive technologies for visually impaired individuals, and for advancing our understanding of human visual cognition.
Papers
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang
LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models
Muhammad Fetrat Qharabagh, Mohammadreza Ghofrani, Kimon Fountoulakis
Understanding Graphical Perception in Data Visualization through Zero-shot Prompting of Vision-Language Models
Grace Guo, Jenna Jiayi Kang, Raj Sanjay Shah, Hanspeter Pfister, Sashank Varma
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang, Baoxin Li, Jae-sun Seo, Yu Cao
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models
Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu