Pre-Trained Vision Transformer
Pre-trained Vision Transformers (ViTs) adapt large-scale pre-trained representations to diverse downstream computer vision tasks with an emphasis on parameter and computational efficiency. Current research focuses on methods such as low-rank adaptation, visual prompt tuning, and multiple-exit strategies to minimize the number of trainable parameters and optimize inference speed, particularly for resource-constrained environments like onboard satellite processing and mobile devices. These advances are shaping fields including remote sensing, medical image analysis, and robotics by enabling high-accuracy visual recognition with reduced computational demands.
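The low-rank adaptation idea mentioned above can be illustrated with a minimal sketch: a frozen pre-trained weight matrix is augmented with a trainable low-rank update, so only a small fraction of parameters is fine-tuned. All names, shapes, and the scaling hyperparameter below are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

# Minimal sketch of low-rank adaptation (LoRA-style) for one frozen
# linear projection, as might appear inside a ViT attention block.
d, r = 8, 2                             # hidden size; low-rank bottleneck (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pre-trained weight (not updated)
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized
alpha = 4.0                             # illustrative scaling hyperparameter

def lora_forward(x):
    # Frozen path plus scaled low-rank update: x @ (W + (alpha / r) * B A)^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
# With B zero-initialized, the adapted layer initially matches the frozen one.
assert np.allclose(lora_forward(x), x @ W.T)

# Only A and B are trained: 2*d*r parameters vs d*d for full fine-tuning.
print(2 * d * r, "trainable vs", d * d, "full")
```

Here only `A` and `B` would receive gradient updates during fine-tuning; at this toy scale the saving is modest (32 vs 64 parameters), but in a real ViT layer with d in the hundreds or thousands the reduction is substantial.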
Papers
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li
Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare
Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
Fangbo Qin, Taogang Hou, Shan Lin, Kaiyuan Wang, Michael C. Yip, Shan Yu