Pre-Trained Vision-Language Models
Pre-trained vision-language models (VLMs) integrate visual and textual information, aiming to improve multimodal understanding and enable zero-shot or few-shot learning across diverse tasks. Current research focuses on enhancing VLMs' compositional reasoning, adapting them to specialized domains (e.g., agriculture, healthcare), and improving efficiency through quantization and parameter-efficient fine-tuning techniques like prompt learning and adapter modules. These advancements are significant because they enable more robust and efficient applications of VLMs in various fields, ranging from robotics and medical image analysis to open-vocabulary object detection and long-tailed image classification.
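As a concrete illustration of the ideas above, the sketch below shows zero-shot classification with a pre-trained CLIP-style VLM through the Hugging Face transformers API, followed by a small CLIP-Adapter-style bottleneck as an example of parameter-efficient fine-tuning on top of a frozen backbone. This is a minimal sketch, not code from any of the listed papers; the checkpoint name, prompts, and adapter hyperparameters are illustrative assumptions.

```python
# Minimal sketch: zero-shot use of a pre-trained VLM plus a tiny adapter
# for parameter-efficient fine-tuning. Checkpoint and sizes are assumptions.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Zero-shot classification: score an image against natural-language prompts.
image = Image.new("RGB", (224, 224))  # stand-in for a real input image
prompts = ["a photo of a healthy leaf", "a photo of a diseased leaf"]
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))

# Parameter-efficient adaptation: freeze the VLM and train only a small
# residual bottleneck over its image features (CLIP-Adapter style).
class Adapter(nn.Module):
    def __init__(self, dim: int, reduction: int = 4, alpha: float = 0.2):
        super().__init__()
        self.alpha = alpha  # residual mixing ratio (assumed hyperparameter)
        self.net = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.ReLU(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.alpha * self.net(feats) + (1 - self.alpha) * feats

for p in model.parameters():            # backbone stays frozen
    p.requires_grad_(False)
adapter = Adapter(model.config.projection_dim)  # only these weights are trained

with torch.no_grad():
    img_feats = model.get_image_features(pixel_values=inputs["pixel_values"])
adapted = adapter(img_feats)  # feed into a task head or cosine classifier
```

In practice the adapter (or, alternatively, learned prompt vectors) is trained on a handful of labeled examples per class while the VLM weights stay fixed, which keeps the number of trainable parameters small and preserves the model's zero-shot behavior.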