Vision-Language Prompting
Vision-language prompting enhances the capabilities of pre-trained vision-language models (VLMs) by learning task-specific prompts, rather than relying solely on fixed prompts or extensive fine-tuning. Current research focuses on improving the generalization ability of these prompts across unseen domains and classes, often employing techniques like prompt tuning, multi-modal prompt learning, and Bayesian modeling to mitigate overfitting and enhance interpretability. This approach offers a data-efficient and parameter-efficient way to adapt VLMs to various downstream tasks, impacting fields like image recognition, robot task planning, and multimodal emotion recognition.
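To make the idea concrete, below is a minimal sketch of CoOp-style soft prompt tuning: a small set of learnable context vectors is prepended to frozen class-name token embeddings, and only those context vectors are optimized while the pre-trained encoders stay fixed. The encoders and feature tensors here are random stand-ins (assumptions for illustration), not a real CLIP checkpoint; in practice you would plug in the frozen text and image towers of an actual VLM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions for the sketch; a real VLM fixes these for you.
EMBED_DIM, CLS_TOKENS, N_CTX, N_CLASSES = 512, 8, 4, 10


class FrozenTextEncoder(nn.Module):
    """Stand-in for a frozen VLM text tower: token embeddings -> one text feature."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMBED_DIM, EMBED_DIM)
        for p in self.parameters():          # encoder weights are never updated
            p.requires_grad = False

    def forward(self, token_embeds):         # (n_cls, seq_len, dim)
        return self.proj(token_embeds.mean(dim=1))   # (n_cls, dim)


class PromptLearner(nn.Module):
    """Learnable context vectors shared across classes, prepended to class tokens."""
    def __init__(self, class_token_embeds):  # (n_cls, cls_tokens, dim), kept frozen
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(N_CTX, EMBED_DIM) * 0.02)
        self.register_buffer("class_tokens", class_token_embeds)

    def forward(self):
        n_cls = self.class_tokens.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)    # (n_cls, n_ctx, dim)
        return torch.cat([ctx, self.class_tokens], dim=1)    # learned prompt + class name


# --- toy training step: gradients reach only the prompt parameters ---
text_encoder = FrozenTextEncoder()
class_tokens = torch.randn(N_CLASSES, CLS_TOKENS, EMBED_DIM)  # frozen class-name embeddings
prompt_learner = PromptLearner(class_tokens)
optimizer = torch.optim.AdamW(prompt_learner.parameters(), lr=2e-3)

image_feats = F.normalize(torch.randn(32, EMBED_DIM), dim=-1)  # stand-in image features
labels = torch.randint(0, N_CLASSES, (32,))

text_feats = F.normalize(text_encoder(prompt_learner()), dim=-1)
logits = 100.0 * image_feats @ text_feats.t()                  # CLIP-style cosine logits
loss = F.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
print(f"prompt-tuning loss: {loss.item():.3f}")
```

The key design point this sketch illustrates is parameter efficiency: only `N_CTX * EMBED_DIM` values are trained per task, while the pre-trained encoders remain untouched, which is why prompt tuning adapts well in low-data regimes.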