Visual Perception
Visual perception research studies how humans and artificial systems interpret visual information, aiming to bridge the gap between raw sensory input and high-level cognitive understanding. Current work emphasizes evaluating large vision-language models (LVLMs) across multiple levels of perception, from low-level feature extraction to complex semantic reasoning, using benchmarks that assess both accuracy and the presence of hallucinations or biases. These efforts are crucial for improving the reliability and robustness of AI systems in applications ranging from autonomous driving to assistive technologies for visually impaired people, and for advancing our understanding of human visual cognition.
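To make the evaluation setup described above concrete, the sketch below shows a minimal, hypothetical benchmark harness: it scores an LVLM's answers at different perception levels and probes for hallucinations by asking about objects known to be absent from an image. The model call (`run_lvlm`), the item fields, and the sample data are illustrative assumptions, not the protocol of any specific benchmark from the papers listed.

```python
# Minimal sketch of a multi-level perception benchmark harness for an LVLM.
# All names (run_lvlm, BenchmarkItem, the sample items) are hypothetical and
# stand in for a real model API and a real evaluation dataset.
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    image_id: str
    question: str    # perception query posed to the model
    level: str       # "low" (feature extraction) or "high" (semantic reasoning)
    answer: str      # ground-truth answer
    distractor: str  # object absent from the image, used to probe hallucination


def run_lvlm(image_id: str, question: str) -> str:
    """Stub standing in for a real vision-language model call."""
    canned = {
        ("img_001", "What color is the traffic light?"): "red",
        ("img_002", "Is there a pedestrian on the crosswalk?"): "yes",
    }
    return canned.get((image_id, question), "unknown")


def evaluate(items: list[BenchmarkItem]) -> dict:
    per_level: dict[str, dict[str, int]] = {}
    hallucinations = 0
    for item in items:
        # Accuracy at the item's perception level.
        pred = run_lvlm(item.image_id, item.question).lower()
        stats = per_level.setdefault(item.level, {"correct": 0, "total": 0})
        stats["total"] += 1
        stats["correct"] += int(pred == item.answer.lower())
        # Hallucination probe: ask about an object known to be absent;
        # any answer other than "no" counts as a hallucinated detection.
        probe = run_lvlm(item.image_id, f"Is there a {item.distractor} in the image?")
        hallucinations += int(probe.lower() != "no")
    return {
        "accuracy_by_level": {
            lvl: s["correct"] / s["total"] for lvl, s in per_level.items()
        },
        "hallucination_rate": hallucinations / len(items),
    }


if __name__ == "__main__":
    items = [
        BenchmarkItem("img_001", "What color is the traffic light?", "low", "red", "giraffe"),
        BenchmarkItem("img_002", "Is there a pedestrian on the crosswalk?", "high", "yes", "submarine"),
    ]
    print(evaluate(items))
```

Real benchmarks differ mainly in scale and in how answers are matched (exact string match, multiple choice, or model-graded), but the separation of per-level accuracy from a dedicated hallucination probe follows the evaluation dimensions mentioned in the overview.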
Papers
Artwork Explanation in Large-scale Vision Language Models
Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
Nuo Xu, Wen Wang, Rong Yang, Mengjie Qin, Zheyuan Lin, Wei Song, Chunlong Zhang, Jason Gu, Chao Li
EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment
Mykola Lavreniuk, Shariq Farooq Bhat, Matthias Müller, Peter Wonka
PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models
Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli