Contrastive Language Image
Contrastive Language-Image Pre-training (CLIP) models learn joint representations of images and text, enabling zero-shot image classification and other multimodal tasks. Current research focuses on improving CLIP's localization capabilities, its robustness to data variations (including 3D data and low-light conditions), and its efficiency through techniques such as knowledge distillation and mixture-of-experts architectures. These advances make CLIP more reliable and more widely applicable in fields such as medical image analysis, robotics, and AI-generated content detection.
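To make the idea of a joint image-text representation concrete, here is a minimal NumPy sketch of a symmetric contrastive objective and of zero-shot classification by cosine similarity. It is an illustration under stated assumptions, not the implementation from any of the papers listed here: the function names are hypothetical, the embeddings are assumed to come from image and text encoders that are not shown, and the temperature is fixed at 0.07 rather than learned.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (batch, dim) arrays where row i of each is a matching pair.
    # temperature is an assumed fixed value for this sketch.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (batch, batch) scaled similarity matrix
    idx = np.arange(logits.shape[0])
    # Image-to-text: each row should put its mass on the diagonal entry.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column should put its mass on the diagonal entry.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

def zero_shot_classify(image_emb, class_text_emb):
    # Assign each image to the class whose text embedding is most similar.
    sims = l2_normalize(image_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)
```

In this sketch, zero-shot classification needs no task-specific training: the class names are embedded as text (for example via a prompt per class), and each image is assigned to the nearest class embedding.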
Papers
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang
SignCLIP: Connecting Text and Sign Language by Contrastive Learning
Zifan Jiang, Gerard Sant, Amit Moryossef, Mathias Müller, Rico Sennrich, Sarah Ebling