Visual Prompt
Visual prompting is a rapidly evolving technique that extends large language and vision-language models by supplying visual instructions, such as points, boxes, masks, or even entire images, alongside textual prompts. Current research aims to improve performance on tasks such as image segmentation, object recognition, and question answering, using methods that include prompt optimization, multi-representation learning, and the integration of external knowledge sources. The approach could improve the efficiency and accuracy of multimodal AI systems in fields ranging from medical image analysis and remote sensing to creative applications such as text-to-image and text-to-3D generation.
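To make the kinds of visual instruction mentioned above concrete, here is a minimal Python sketch of how a point or box prompt might be rasterized into a binary mask and paired with a text prompt. It is an illustration under stated assumptions, not the method of any paper listed here: the names `box_to_mask`, `point_to_mask`, `mask_area`, and `PromptedInput`, the end-exclusive box convention, and the disc radius are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Row-major binary mask: mask[y][x] is 1 inside the prompted region, else 0.
Mask = List[List[int]]


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Rasterize a box (x0, y0, x1, y1), end-exclusive, into a binary mask."""
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds so out-of-range prompts do not fail.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def point_to_mask(point: Tuple[int, int], height: int, width: int,
                  radius: int = 1) -> Mask:
    """Rasterize a point (x, y) as a filled disc of the given radius."""
    px, py = point
    r2 = radius * radius
    return [
        [1 if (x - px) ** 2 + (y - py) ** 2 <= r2 else 0 for x in range(width)]
        for y in range(height)
    ]


def mask_area(mask: Mask) -> int:
    """Count the pixels covered by the visual prompt."""
    return sum(sum(row) for row in mask)


@dataclass
class PromptedInput:
    """Hypothetical container pairing a text prompt with a visual prompt."""
    text: str
    mask: Mask


# A box prompt on a 5x5 image, combined with a textual instruction.
example = PromptedInput(
    text="segment the marked object",
    mask=box_to_mask((1, 1, 3, 4), height=5, width=5),
)
```

Reducing points and boxes to a shared mask representation is only one possible design; real systems often encode each prompt type separately before passing it to the model.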
Papers
Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano
SADL: An Effective In-Context Learning Method for Compositional Visual QA
Long Hoang Dang, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran
Learning Visual Prompts for Guiding the Attention of Vision Transformers
Razieh Rezaei, Masoud Jalili Sabet, Jindong Gu, Daniel Rueckert, Philip Torr, Ashkan Khakzar
Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning
Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Tat-Seng Chua, Yao Zhao