Contrastive Language Image

Contrastive Language-Image Pre-training (CLIP) models learn joint representations of images and text, enabling zero-shot image classification and other multimodal tasks. Current research focuses on improving CLIP's localization capabilities, its robustness across data conditions (including 3D data and low-light imagery), and its efficiency through techniques such as knowledge distillation and mixture-of-experts architectures. These advances aim to make CLIP more reliable and more widely applicable in fields such as medical image analysis, robotics, and AI-generated content detection.
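To make the zero-shot classification idea concrete, here is a minimal sketch of how a joint embedding space can be used to pick a label for an image: embed the image and one text prompt per class, compare them by cosine similarity, and take the best match. The embeddings below are hand-written toy vectors rather than outputs of a real CLIP encoder, and the function names, prompts, and temperature value are assumptions made for illustration only.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, labels, temperature=0.01):
    """Return the label whose text embedding is closest to the image embedding.

    `temperature` is an assumed value; it only sharpens the probabilities
    and does not change which label wins.
    """
    similarities = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probabilities = softmax([s / temperature for s in similarities])
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities


# Toy stand-ins for encoder outputs (hypothetical values, not from a real model).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embeddings = [
    [1.0, 0.1, 0.0],
    [0.1, 1.0, 0.0],
    [0.0, 0.1, 1.0],
]
image_embedding = [0.9, 0.2, 0.1]

predicted, probabilities = zero_shot_classify(image_embedding, text_embeddings, labels)
print(predicted)
```

No classifier is trained for the three classes in this sketch: adding a class only requires embedding one more text prompt, which is what makes the approach zero-shot.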

Papers

October 12, 2024

CLIP-SCGI: Synthesized Caption-Guided Inversion for Person Re-Identification
Qianru Han, Xinwei He, Zhi Liu, Sannyuya Liu, Ying Zhang, Jinhai Xiang
Vision Language Model Person Re Identification Contrastive Language Image Person Name Critical Synthesis Whistleblower Re Identification Image Captioning Model

October 11, 2024

On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
Bokun Wang, Yunwen Lei, Yiming Ying, Tianbao Yang
Self Supervised Representation Learning Contrastive Loss Contrastive Language Image Self Supervised Representation Learning Discriminative Model

October 8, 2024

FACMIC: Federated Adaptative CLIP Model for Medical Image Classification
Yihang Wu, Christian Desrosiers, Ahmad Chaddad
Contrastive Language Image Medical Image Classification Attention Module Medical Image Data Training Deep

October 3, 2024

Contrastive Localized Language-Image Pre-Training
Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan
Pseudo Label Contrastive Language Image Contrastive Example Region Level Captioning

October 2, 2024

Toward a Holistic Evaluation of Robustness in CLIP Models
Weijie Tu, Weijian Deng, Tom Gedeon
Native Robustness Vision Language Model Contrastive Loss Large Multimodal Model Contrastive Language Image Holistic Evaluation

October 1, 2024

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Jiapeng Wang, Chengyu Wang, Kunzhe Huang, Jun Huang, Lianwen Jin
Contrastive Language Image Retrieval Benchmark Long Text Understanding Article Headline Pair

September 28, 2024

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng
Expert Knowledge Contrastive Language Image Multimodal AI Affinity Diversification

September 23, 2024

TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition
Guoyang Zhao, Fulong Ma, Weiqing Qi, Chenguang Zhang, Yuxuan Liu, Ming Liu, Jun Ma
Contrastive Language Image World Event Traffic Sign Recognition Traffic Sign

September 20, 2024

DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring
Ling Wang, Chen Wu, Lin Wang
Contrastive Language Image Video Deblurring Motion Blur Joint Framework Low Light Enhancement Degradation Representation

September 15, 2024

MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
Yaning Zhang, Tianyi Wang, Zitong Yu, Zan Gao, Linlin Shen, Shengyong Chen
Fine Grained Contrastive Language Image Face Forgery Detection Forgery Localization

September 10, 2024

Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen, Lingfeng Yang, Shuo Chen, Zhaowei Chen, Jiajun Liang, Xiang Li
Vision Language Model Contrastive Language Image Prompt Token

August 18, 2024

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Kaicheng Yang, Tiancheng Gu, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai, Jiankang Deng
Knowledge Distillation Contrastive Language Image Image Text Pair Instance Discrimination

August 9, 2024

ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang
Semantic Segmentation Visual Representation Contrastive Language Image Open Vocabulary Semantic Segmentation Open Vocabulary Segmentation Transparent Proxy Server Architecture Consistent Visual Attention

August 8, 2024

ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model
Yifan Chen, Xiaozhen Qiao, Zhe Sun, Xuelong Li
Knowledge Distillation Full Model Contrastive Language Image Image Fusion

July 30, 2024

CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning
Yuexi Du, Brian Chang, Nicha C. Dvornek
Language Model Contrastive Learning Contrastive Language Image Prompt Based Fine Tuning Efficient Large Language Model CLEFT Lip

July 29, 2024

Diffusion Feedback Helps CLIP See Better
Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang
Text to Image Diffusion Model Contrastive Language Image Image Text Pair Multimodal Understanding Diffusion Control