Pre-Trained Vision Transformers
Pre-trained Vision Transformers (ViTs) leverage large-scale pre-training to adapt efficiently to diverse downstream computer vision tasks while keeping both trainable parameters and compute requirements low. Current research emphasizes methods such as low-rank adaptation, visual prompt tuning, and multi-exit strategies to minimize the number of trainable parameters and improve inference speed, particularly for resource-constrained environments like onboard satellite processing and mobile devices. These advances are influencing fields including remote sensing, medical image analysis, and robotics by enabling high-accuracy visual recognition with reduced computational demands.
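As a rough illustration of the low-rank adaptation idea mentioned above, the sketch below freezes a pre-trained linear projection and trains only a small low-rank correction. It is a minimal example, not the method of any listed paper; the `LoRALinear` class and its parameters are hypothetical names, and the `timm` model/attribute path in the usage comment is an assumption about that library's ViT layout.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update (W + (alpha/r) * B A)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pre-trained weights
        # Low-rank factors: A is small random, B starts at zero so the initial output is unchanged.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus the learned low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Hypothetical usage: wrap the attention projections of a timm ViT
# (assumes each block exposes its qkv projection as `blk.attn.qkv`).
#
#   import timm
#   model = timm.create_model("vit_base_patch16_224", pretrained=True)
#   for blk in model.blocks:
#       blk.attn.qkv = LoRALinear(blk.attn.qkv, r=8)
#
# Only the LoRA factors (and typically the classification head) are then trained,
# a tiny fraction of the full parameter count.
```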
Papers
ClipFormer: Key-Value Clipping of Transformers on Memristive Crossbars for Write Noise Mitigation
Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda
Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning
Li Ren, Chen Chen, Liqiang Wang, Kien Hua