Vision Model
Vision models are artificial intelligence systems designed to interpret and understand visual information, aiming to replicate aspects of human visual perception and reasoning. Current research emphasizes efficiency and generalization across diverse tasks, centering on architectures such as Vision Transformers and Convolutional Neural Networks and increasingly pairing them with large language models for multimodal understanding and instruction following. The field underpins applications ranging from medical image analysis and robotic manipulation to accessibility and creative tools, with ongoing efforts to improve robustness, explainability, and alignment with human perception.
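As a concrete illustration of the pretrained architectures mentioned above, the sketch below loads an ImageNet-pretrained Vision Transformer (ViT-B/16) through torchvision and runs single-image classification. The image path and the choice of checkpoint are placeholders for illustration only and are not drawn from any of the papers listed here.

```python
import torch
from PIL import Image
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load an ImageNet-pretrained Vision Transformer (ViT-B/16) and its
# matching preprocessing pipeline from torchvision.
weights = ViT_B_16_Weights.IMAGENET1K_V1
model = vit_b_16(weights=weights)
model.eval()
preprocess = weights.transforms()

# Classify a single image; "example.jpg" is a placeholder path.
image = Image.open("example.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(batch)

# Report the five most probable ImageNet classes.
probs = logits.softmax(dim=-1)[0]
top5 = probs.topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{weights.meta['categories'][idx.item()]}: {score.item():.3f}")
```

The same pattern applies to convolutional backbones (e.g., swapping in a ResNet from torchvision), which is the usual starting point before the multimodal or instruction-following extensions discussed above.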
Papers
Context-Adaptive Deep Neural Networks via Bridge-Mode Connectivity
Nathan Drenkow, Alvin Tan, Chace Ashcraft, Kiran Karra
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang, William Yicheng Zhu, Baoxiong Jia, Zan Wang, Xiaojian Ma, Qing Li, Siyuan Huang
Could Giant Pretrained Image Models Extract Universal Representations?
Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao
ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations
Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim