Language-Image Pre-Training
Language-image pre-training (LIP) learns joint representations of images and their textual descriptions, which enables zero-shot transfer to a range of downstream tasks. Current research focuses on improving efficiency (e.g., token pruning and merging, or optimized loss functions such as the sigmoid loss), enhancing data utilization (e.g., multi-perspective supervision and long captions), and handling noisy or incomplete data. These advances yield more accurate and efficient models for applications such as image classification, retrieval, and semantic segmentation, and they inform research in both computer vision and natural language processing.
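To make the sigmoid loss mentioned above concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which every image-text pair in a batch is treated as an independent binary classification problem: matched pairs are positives and all other pairings are negatives. The function names, the toy embeddings, and the `temperature` and `bias` values are illustrative assumptions, not taken from any specific paper or library; in practice the temperature and bias are typically learned, and the embeddings come from trained image and text encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def pairwise_sigmoid_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Sigmoid loss over all image-text pairs in a batch.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    temperature and bias are illustrative fixed values (assumption); real
    systems usually learn them.
    """
    n = len(image_embs)
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]

    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = temperature * dot(imgs[i], txts[j]) + bias
            label = 1.0 if i == j else -1.0  # +1 for matched pairs, -1 otherwise
            total -= log_sigmoid(label * logit)
    return total / n  # average over the batch size


# Toy two-sample batch with hypothetical 2-D embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]   # each caption matches its image
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]   # captions assigned to the wrong image

aligned_loss = pairwise_sigmoid_loss(images, aligned_texts)
swapped_loss = pairwise_sigmoid_loss(images, swapped_texts)
print(aligned_loss < swapped_loss)
```

Because each pair contributes its own term, this objective needs no softmax normalization across the batch, which is one reason it is discussed as an efficiency-oriented alternative to the softmax-based contrastive loss.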