Pre-Trained Vision-Language Models
Pre-trained vision-language models (VLMs) integrate visual and textual information, aiming to improve multimodal understanding and enable zero-shot or few-shot learning across diverse tasks. Current research focuses on enhancing VLMs' compositional reasoning, adapting them to specialized domains (e.g., agriculture, healthcare), and improving efficiency through quantization and parameter-efficient fine-tuning techniques like prompt learning and adapter modules. These advancements are significant because they enable more robust and efficient applications of VLMs in various fields, ranging from robotics and medical image analysis to open-vocabulary object detection and long-tailed image classification.
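To make the adapter-module idea concrete, below is a minimal sketch in plain PyTorch of CLIP-Adapter-style parameter-efficient fine-tuning: a frozen VLM image encoder stays untouched while a small bottleneck MLP is trained and blended residually with the frozen features. The encoder, feature dimension, class-embedding matrix, and all names here are illustrative placeholders, not any specific paper's implementation.

```python
# Sketch of adapter-based parameter-efficient fine-tuning for a frozen VLM encoder.
# Only the adapter's few parameters receive gradients; the backbone is frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck MLP whose output is residually mixed with the frozen features."""
    def __init__(self, dim: int, reduction: int = 4, blend: float = 0.2):
        super().__init__()
        self.blend = blend
        self.net = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Residual blend keeps most of the pre-trained representation intact.
        return self.blend * self.net(feats) + (1.0 - self.blend) * feats

# Hypothetical frozen image encoder producing 512-d embeddings (stand-in for a real VLM backbone).
frozen_encoder = nn.Linear(3 * 224 * 224, 512)
for p in frozen_encoder.parameters():
    p.requires_grad = False

adapter = Adapter(dim=512)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)

images = torch.randn(8, 3 * 224 * 224)      # dummy image batch (flattened)
text_embeds = torch.randn(10, 512)          # dummy class-name text embeddings
labels = torch.randint(0, 10, (8,))

with torch.no_grad():                        # backbone forward pass, no gradients
    feats = frozen_encoder(images)
logits = adapter(feats) @ text_embeds.t()   # image-text similarity scores
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                              # gradients flow only into the adapter
optimizer.step()
```

Prompt learning follows the same pattern, except the trainable parameters are a handful of learnable prompt vectors fed into the frozen text encoder rather than an MLP on the image side.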