Visual Prompting
Visual prompting is a parameter-efficient transfer learning technique that enhances the capabilities of large vision-language models (LVLMs) by providing visual cues, such as bounding boxes or attention heatmaps, alongside textual prompts. Current research focuses on generating effective visual prompts, integrating them with various LVLMs (e.g., using CLIP to generate prompts or injecting prompts into the attention mechanisms of vision transformers), and applying these techniques to tasks such as object tracking, image segmentation, and question answering. This approach offers a powerful way to adapt pre-trained models to new tasks and domains at minimal computational cost, improving accuracy and efficiency in fields such as medical image analysis, robotics, and remote sensing.
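The core idea can be sketched in a few lines: overlay a visual cue, here a bounding-box outline, directly on the image pixels before the image is passed to a frozen vision-language model. This is a minimal illustration, not any specific paper's method; the model call itself is omitted, and the image is represented as a plain grid of RGB tuples so the sketch is self-contained.

```python
def add_box_prompt(image, box, color=(255, 0, 0)):
    """Overlay a rectangular visual prompt (a box outline) on an image.

    image: H x W grid of (R, G, B) tuples; box: (x0, y0, x1, y1), inclusive.
    Returns a new image with the outline drawn; the input is left untouched.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]          # copy so the original is preserved
    for x in range(x0, x1 + 1):
        out[y0][x] = color                   # top edge
        out[y1][x] = color                   # bottom edge
    for y in range(y0, y1 + 1):
        out[y][x0] = color                   # left edge
        out[y][x1] = color                   # right edge
    return out

# Example: an 8x8 gray image, prompting the region (2, 2)-(5, 5).
# The prompted image would then be fed to a frozen LVLM in place of the raw image.
H = W = 8
img = [[(128, 128, 128)] * W for _ in range(H)]
prompted = add_box_prompt(img, (2, 2, 5, 5))
```

Because only the input pixels change while the model weights stay frozen, the adaptation cost is negligible compared with fine-tuning; learned variants optimize the overlay itself instead of hand-drawing a box.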
Papers
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang, Yinpeng Dong, Siyuan Zhang, Tianzan Min, Hang Su, Jun Zhu