Vision-Language Prompting

Vision-language prompting adapts pre-trained vision-language models (VLMs) by learning task-specific prompts rather than relying on hand-crafted fixed prompts or extensive fine-tuning. Because only the prompt parameters are updated while the VLM backbone stays frozen, this adaptation is both data-efficient and parameter-efficient. Current research focuses on improving how well learned prompts generalize to unseen domains and classes, often employing techniques such as prompt tuning, multi-modal prompt learning, and Bayesian modeling to mitigate overfitting and enhance interpretability. The approach has been applied to downstream tasks including image recognition, robot task planning, and multimodal emotion recognition.

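To make the prompt-learning idea concrete, the sketch below illustrates CoOp-style prompt tuning in PyTorch: a small set of learnable context vectors is prepended to each class-name embedding, passed through a frozen text encoder, and matched against image features by cosine similarity. This is a minimal illustration under stated assumptions; the helper names (`PromptLearner`, `make_frozen_text_encoder`, `classify`) and the toy encoder are stand-ins, not the API of any particular paper or library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptLearner(nn.Module):
    """Learnable context ("prompt") vectors shared across classes, CoOp-style."""

    def __init__(self, num_classes: int, ctx_len: int = 16, embed_dim: int = 512):
        super().__init__()
        # Trainable prompt parameters -- the only weights updated during adaptation.
        self.ctx = nn.Parameter(torch.randn(ctx_len, embed_dim) * 0.02)
        # Stand-in for fixed class-name token embeddings; a real VLM would produce
        # these from its tokenizer and embedding table.
        self.register_buffer("cls_embed", torch.randn(num_classes, 1, embed_dim))

    def forward(self) -> torch.Tensor:
        # Prepend the shared context to every class embedding: [C, ctx_len + 1, D].
        ctx = self.ctx.unsqueeze(0).expand(self.cls_embed.size(0), -1, -1)
        return torch.cat([ctx, self.cls_embed], dim=1)


def make_frozen_text_encoder(embed_dim: int = 512):
    """Toy stand-in for a frozen VLM text encoder (mean-pool + frozen projection)."""
    proj = nn.Linear(embed_dim, embed_dim)
    for p in proj.parameters():
        p.requires_grad_(False)  # the backbone stays frozen during prompt tuning

    def encode(token_embeds: torch.Tensor) -> torch.Tensor:  # [C, L, D] -> [C, D]
        return proj(token_embeds.mean(dim=1))

    return encode


def classify(image_features, prompt_embeds, text_encoder, temperature=0.07):
    # Encode the learned prompts with the frozen text encoder, then score images
    # by cosine similarity, as in zero-shot CLIP-style inference.
    text_features = F.normalize(text_encoder(prompt_embeds), dim=-1)  # [C, D]
    image_features = F.normalize(image_features, dim=-1)              # [B, D]
    return image_features @ text_features.t() / temperature           # [B, C] logits


if __name__ == "__main__":
    learner = PromptLearner(num_classes=10)
    encoder = make_frozen_text_encoder()
    images = torch.randn(4, 512)          # pretend features from a frozen image encoder
    labels = torch.randint(0, 10, (4,))
    logits = classify(images, learner(), encoder)
    loss = F.cross_entropy(logits, labels)
    loss.backward()                       # gradients flow only into learner.ctx
```

In this sketch only `learner.ctx` receives gradients, which is what makes prompt tuning parameter-efficient relative to fine-tuning the whole model.
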
Papers