Contrastive Language-Image
Contrastive Language-Image Pre-training (CLIP) models aim to learn joint representations of images and text, enabling zero-shot image classification and other multimodal tasks. Current research focuses on improving CLIP's localization capabilities, its robustness to varied inputs (including 3D data and low-light conditions), and its efficiency through techniques such as knowledge distillation and mixture-of-experts architectures. These advances make CLIP more reliable and more widely applicable in fields such as medical image analysis, robotics, and AI-generated content detection.
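To make the idea of zero-shot classification in a joint embedding space concrete, the following is a minimal sketch rather than any particular model's implementation. It assumes an image embedding and one text embedding per candidate label are already available; the three-dimensional vectors, the label names, and the function `zero_shot_classify` are hypothetical stand-ins for real encoder outputs. The temperature value of 0.07 is likewise an illustrative assumption.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, text_embs, labels, temperature=0.07):
    """Pick the label whose text embedding is most similar to the image embedding.

    The temperature is an assumed illustrative value, not a tuned setting.
    """
    img = l2_normalize(image_emb)
    sims = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        sims.append(sum(a * b for a, b in zip(img, txt)))
    # Dividing by the temperature sharpens the distribution over labels.
    probs = softmax([s / temperature for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical toy embeddings; real encoders produce much larger vectors.
labels = ["cat", "dog", "car"]
text_embs = [
    [1.0, 0.1, 0.0],  # stands in for the embedding of "a photo of a cat"
    [0.1, 1.0, 0.0],  # stands in for the embedding of "a photo of a dog"
    [0.0, 0.0, 1.0],  # stands in for the embedding of "a photo of a car"
]
image_emb = [0.9, 0.2, 0.1]  # stands in for an encoded image

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted, [round(p, 4) for p in probs])
```

No classifier is trained on the candidate labels in this sketch: adding a class only requires embedding another text prompt, which is what makes the approach zero-shot.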
Papers
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents
Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
IPA-CLIP: Integrating Phonetic Priors into Vision and Language Pretraining
Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng, Jianbo Yuan, Yu Tian, Yuxiao Chen, Yongfeng Zhang