Single CLIP
CLIP, a powerful vision-language model, is being extensively studied to improve its performance and address its limitations across applications. Current research focuses on mitigating object hallucinations, extending CLIP to specialized domains (e.g., agriculture), and developing robust defenses against adversarial attacks and biases. This line of work matters because it leverages CLIP's strong zero-shot capabilities while improving the model's accuracy, reliability, and fairness on diverse downstream tasks, with impact on fields ranging from image generation to anomaly detection.
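The zero-shot capability these papers build on is CLIP's ability to score an image against arbitrary text prompts without task-specific training. Below is a minimal sketch of zero-shot classification, assuming the Hugging Face transformers implementation of CLIP; the checkpoint name is the standard public one, while the image path and label set are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a tractor"]

# Encode the image and all candidate prompts in one batch.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a distribution over the candidate labels, with no fine-tuning.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the label set is just a list of strings, swapping in new classes at inference time is what makes CLIP attractive for the open-vocabulary and zero-shot tasks surveyed below.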
Papers
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu
ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
Ziqin Zhou, Bowen Zhang, Yinjie Lei, Lingqiao Liu, Yifan Liu