Pre-Trained Vision Transformers
Pre-trained Vision Transformers (ViTs) leverage large-scale pre-training to adapt efficiently to diverse downstream computer vision tasks, with an emphasis on parameter and computational efficiency. Current research focuses on methods such as low-rank adaptation, visual prompt tuning, and multiple-exit strategies to minimize the number of trainable parameters and to improve inference speed, particularly for resource-constrained environments such as onboard satellite processing and mobile devices. These advances are significantly impacting fields including remote sensing, medical image analysis, and robotics by enabling high-accuracy visual recognition with reduced computational demands.
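To make the parameter-efficiency idea concrete, below is a minimal sketch of low-rank adaptation (LoRA) applied to a single linear projection of a pre-trained ViT block. It assumes PyTorch; the class name `LoRALinear` and the hyperparameters shown are illustrative rather than taken from any of the papers listed here. The pre-trained weights stay frozen, and only the small low-rank matrices are trained.

```python
# Sketch of low-rank adaptation (LoRA) around a frozen pre-trained linear layer.
# Assumption: PyTorch is available; names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen pre-trained linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the pre-trained weights
            p.requires_grad = False
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank               # standard LoRA scaling factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank residual: W x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Example usage: wrap what stands in for a pre-trained ViT projection, then
# fine-tune only the LoRA parameters on the downstream task.
proj = nn.Linear(768, 768)                      # placeholder for a pre-trained projection
layer = LoRALinear(proj, rank=4)
tokens = torch.randn(1, 197, 768)               # (batch, tokens, dim) for a ViT-B/16-sized input
out = layer(tokens)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)                     # ~6K trainable vs. ~590K frozen parameters
```

In practice, wrappers like this are typically applied to the query and value projections of each attention block, which is what keeps the trainable parameter count a small fraction of the full model.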
Papers
ClipFormer: Key-Value Clipping of Transformers on Memristive Crossbars for Write Noise Mitigation
Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda
Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning
Li Ren, Chen Chen, Liqiang Wang, Kien Hua
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li
Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare
Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua