Pre-Trained Vision-Language Models
Pre-trained vision-language models (VLMs) integrate visual and textual information to improve multimodal understanding and enable zero-shot or few-shot learning across diverse tasks. Current research focuses on strengthening VLMs' compositional reasoning, adapting them to specialized domains (e.g., agriculture, healthcare), and improving efficiency through quantization and parameter-efficient fine-tuning techniques such as prompt learning and adapter modules. These advances make VLM applications more robust and efficient in fields ranging from robotics and medical image analysis to open-vocabulary object detection and long-tailed image classification.
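To make the zero-shot idea concrete, here is a minimal sketch of how a contrastively trained VLM is commonly used for classification: the image embedding is compared with the embedding of one text prompt per class, and the similarities are turned into probabilities. The three-dimensional vectors, the prompt strings, the function names, and the temperature value of 0.07 are all illustrative assumptions. In practice the embeddings would come from the model's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Return a probability per prompt via softmax over scaled similarities.

    temperature is an assumed value chosen for illustration.
    """
    labels = list(text_embs)
    logits = [cosine(image_emb, text_embs[label]) / temperature for label in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

# Toy embeddings (hypothetical): stand-ins for text-encoder outputs.
text_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a tractor": [0.0, 0.1, 0.9],
}
# Toy embedding (hypothetical): stand-in for an image-encoder output.
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_emb, text_embs)
best = max(probs, key=probs.get)
print(best)
```

Because the class set is defined only by the text prompts, adding a new class means adding a new prompt, with no retraining. Prompt learning can be seen as replacing these hand-written prompts with learned ones while the encoders stay frozen.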