Visual Perception
Visual perception research studies how humans and artificial systems interpret visual information, with the aim of bridging the gap between raw sensory input and high-level cognitive understanding. Current work emphasizes evaluating large vision-language models (LVLMs) at multiple levels of perception, from low-level feature extraction to complex semantic reasoning, using benchmarks that measure both accuracy and the presence of hallucinations or biases. These efforts matter for improving the reliability and robustness of AI systems in applications ranging from autonomous driving to assistive technologies for visually impaired people, and for advancing our understanding of human visual cognition.
Papers
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha
Brain3D: Generating 3D Objects from fMRI
Yuankun Yang, Li Zhang, Ziyang Xie, Zhiyuan Yuan, Jianfeng Feng, Xiatian Zhu, Yu-Gang Jiang
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo, Yunshui Li, Longze Chen, Wanwei He, Ting-En Lin, Ziqiang Liu, Lei Zhang, Zikai Song, Xiaobo Xia, Tongliang Liu, Min Yang, Binyuan Hui
Artwork Explanation in Large-scale Vision Language Models
Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
Nuo Xu, Wen Wang, Rong Yang, Mengjie Qin, Zheyuan Lin, Wei Song, Chunlong Zhang, Jason Gu, Chao Li