Visual Prompt Tuning
Visual prompt tuning (VPT) is a parameter-efficient fine-tuning technique for adapting large pre-trained vision models, primarily Vision Transformers (ViTs), to new downstream tasks. Instead of retraining the entire model, it learns a small set of additional parameters ("prompts"), typically while the pre-trained backbone stays frozen. Current research focuses on improving VPT's effectiveness across diverse tasks (e.g., image classification, segmentation, domain adaptation) and model architectures, exploring strategies such as cross-attention mechanisms, spatial-frequency prompt design, and meta-learning for prompt selection. Because only the prompts need to be trained and stored for each task, the approach lowers the computational and storage cost of adaptation relative to full fine-tuning, which makes it valuable for resource-constrained environments and facilitates broader application of large vision models.
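To give a rough sense of the storage argument, the following sketch compares the number of learned prompt parameters with the size of a frozen backbone. It is an illustration under stated assumptions, not a measurement: the backbone is taken to be a ViT-B-sized encoder (12 layers, hidden size 768), the prompt count (50) and class count (100) are arbitrary, and the helper names `vit_encoder_params`, `prompt_params`, and `head_params` are hypothetical. "Shallow" here means prompts added only at the encoder input, "deep" means a separate set of prompts for every layer.

```python
# Back-of-the-envelope parameter accounting for visual prompt tuning.
# All sizes are illustrative assumptions (ViT-B-like encoder), not measurements.

def vit_encoder_params(depth: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of a standard Transformer encoder stack."""
    hidden = mlp_ratio * dim
    attn = 4 * (dim * dim + dim)                       # Q, K, V, output projections (+ biases)
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)  # two linear layers (+ biases)
    norms = 2 * (2 * dim)                              # two LayerNorms (scale and shift)
    return depth * (attn + mlp + norms)

def prompt_params(num_prompts: int, dim: int, depth: int, deep: bool) -> int:
    """Learned prompt tokens: one set at the input (shallow) or one set per layer (deep)."""
    layers = depth if deep else 1
    return layers * num_prompts * dim

def head_params(dim: int, num_classes: int) -> int:
    """Linear classification head, usually trained alongside the prompts."""
    return dim * num_classes + num_classes

if __name__ == "__main__":
    depth, dim = 12, 768                 # assumed ViT-B-like configuration
    num_prompts, num_classes = 50, 100   # assumed, task-dependent choices

    backbone = vit_encoder_params(depth, dim)
    head = head_params(dim, num_classes)
    for name, deep in (("shallow", False), ("deep", True)):
        trainable = prompt_params(num_prompts, dim, depth, deep) + head
        share = 100.0 * trainable / backbone
        print(f"{name:8s} trainable={trainable:>9,d}  "
              f"frozen backbone={backbone:,d}  share={share:.2f}%")
```

Under these assumptions the trainable parameters amount to well under 1% of the backbone in both variants, so each additional task only requires storing its prompts and head, while a single copy of the frozen backbone is shared across tasks.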