Single CLIP
A single CLIP model, serving as a powerful general-purpose vision-language backbone, is being studied extensively to improve its performance and address its limitations across applications. Current research focuses on mitigating object hallucinations, adapting the model to specialized domains (e.g., agriculture), and developing robust defenses against adversarial attacks and biases. This work is significant because it leverages CLIP's strong zero-shot capabilities while improving its accuracy, reliability, and fairness on diverse downstream tasks, with impact in fields ranging from image generation to anomaly detection.
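For context on the zero-shot capability these papers build on, below is a minimal sketch of zero-shot image classification with a single CLIP model using the Hugging Face transformers API; the checkpoint name is a real public model, but the image path and label prompts are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a single pretrained CLIP model and its paired preprocessor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Illustrative inputs: a local image (hypothetical path) and candidate label prompts.
image = Image.open("example.jpg")
prompts = ["a photo of a cat", "a photo of a dog", "a photo of a tractor"]

# Encode both modalities and score the image against each text prompt.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; a softmax over them
# yields zero-shot class probabilities with no task-specific training.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```

Because classification reduces to ranking text prompts by similarity, the label set can be changed at inference time, which is the property the listed papers extend to action recognition, medical imaging, and corruption-robust adaptation.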
Papers
UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities
Muhammad Uzair Khattak, Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
Yating Yu, Congqi Cao, Yueran Zhang, Qinyi Lv, Lingtong Min, Yanning Zhang
Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Feris, Yunhui Guo
Viewpoint Consistency in 3D Generation via Attention and CLIP Guidance
Qing Zhang, Zehao Chen, Jinguang Tong, Jing Zhang, Jie Hong, Xuesong Li