Visual Foundation Model
Visual foundation models are large-scale, pre-trained models designed to learn generalizable visual representations from massive datasets, enabling zero-shot or few-shot adaptation to diverse downstream tasks. Current research emphasizes improving efficiency through techniques like adapter pruning and sharing, exploring novel architectures such as diffusion models for dense prediction, and integrating these models with other modalities (e.g., language, 3D data) for enhanced capabilities in areas like scene understanding and robotic control. This field is significant because it promises to advance numerous applications, including image analysis, video processing, robotics, and medical image analysis, by providing robust and adaptable visual intelligence.
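Below is a minimal sketch of the adapter-based adaptation idea mentioned above: a frozen, pre-trained visual backbone is specialized to a downstream task by training only small bottleneck modules and a lightweight head. It is illustrative only and not taken from any of the listed papers; the `Adapter` and `AdaptedBackbone` names are hypothetical, and a torchvision ResNet-50 stands in for an actual visual foundation model.

```python
# Sketch of parameter-efficient adaptation of a frozen visual backbone.
# Assumption: a ResNet-50 stands in for a pre-trained visual foundation model;
# class and variable names here are illustrative, not from the cited papers.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the pre-trained representation.
        return x + self.up(self.act(self.down(x)))


class AdaptedBackbone(nn.Module):
    """Frozen backbone + trainable adapter + linear head for a downstream task."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = resnet50(weights=None)  # load pre-trained weights in practice
        self.backbone.fc = nn.Identity()        # expose the 2048-d pooled features
        for p in self.backbone.parameters():
            p.requires_grad = False             # only the adapter and head are trained
        self.adapter = Adapter(dim=2048)
        self.head = nn.Linear(2048, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)
        return self.head(self.adapter(feats))


if __name__ == "__main__":
    model = AdaptedBackbone(num_classes=10)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    # The trainable fraction is a small slice of the full model, which is why
    # adapter pruning and sharing are attractive efficiency levers.
    print(f"trainable params: {trainable:,} / {total:,}")
```

In this setup only the adapter and head receive gradients, so techniques such as pruning or sharing adapters across tasks reduce the per-task parameter cost without retraining the backbone.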
Papers
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani
Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation
Abu Bakor Hayat Arnob, Xiangxue Wang, Yiping Jiao, Xiao Gan, Wenlong Ming, Jun Xu