Vision Language Model
Vision-language models (VLMs) integrate visual and textual information to perform complex tasks, aiming to bridge the gap between computer vision and natural language processing. Current research focuses on improving VLM efficiency and robustness through techniques like prompt tuning, which optimizes textual or visual prompts for specific tasks, and sparse token optimization to reduce computational overhead. These advancements are significant because they enable VLMs to be applied to diverse real-world applications, including robotics, autonomous driving, medical image analysis, and fake news detection, while addressing challenges like hallucinations and model miscalibration.
Papers
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis
Prateek Verma, Minh-Hao Van, Xintao Wu
CLIPArTT: Adaptation of CLIP to New Domains at Test Time
Gustavo Adolfo Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers
GUing: A Mobile GUI Search Engine using a Vision-Language Model
Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej
Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective
Wanqi Zhou, Shuanghao Bai, Danilo P. Mandic, Qibin Zhao, Badong Chen
Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM
Navid Rajabi, Jana Kosecka
In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery
Matteo Merler, Katsiaryna Haitsiukevich, Nicola Dainese, Pekka Marttinen
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models
Hongyi Zhu, Jia-Hong Huang, Stevan Rudinac, Evangelos Kanoulas
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
Manav Nitin Kapadnis, Sohan Patnaik, Abhilash Nandy, Sourjyadip Ray, Pawan Goyal, Debdoot Sheet
Medical Vision-Language Pre-Training for Brain Abnormalities
Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang