Single CLIP
Single CLIP, a vision-language model, is widely studied with the aim of improving its performance and addressing its limitations across applications. Current research focuses on mitigating object hallucinations, adapting the model to specialized domains (e.g., agriculture), and developing robust defenses against adversarial attacks and biases. The goal is to retain CLIP's zero-shot capabilities while improving its accuracy, reliability, and fairness on downstream tasks ranging from image generation to anomaly detection.
Papers
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Noah Marshall, Ke Liang Xiao, Atish Agarwala, Elliot Paquette
Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning
Cilin Yan, Haochen Wang, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves