Contrastive Language-Image Pre-training (CLIP)
Contrastive Language-Image Pre-training (CLIP) models learn joint representations of images and text, enabling zero-shot image classification and other multimodal tasks. Current research focuses on improving CLIP's localization capabilities, its robustness under distribution shift (including 3D data and low-light conditions), and its efficiency through techniques such as knowledge distillation and mixture-of-experts architectures. These advances matter for the reliability and applicability of CLIP in diverse fields, including medical image analysis, robotics, and AI-generated content detection.
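To make the zero-shot classification setup concrete, here is a minimal sketch using the public openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers API. The image path and label prompts are illustrative placeholders, not part of any paper listed below: CLIP scores an image against a set of text prompts and the highest-probability prompt serves as the predicted class, with no task-specific training.

```python
# Minimal zero-shot classification sketch with a public CLIP checkpoint.
# Assumes: torch, transformers, and Pillow are installed; "cat.jpg" is a
# hypothetical local image file.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder image path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Encode the image and all candidate prompts in one batch.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax over the
# candidate prompts yields zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```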
Papers
September 15, 2023
Biased Attention: Do Vision Transformers Amplify Gender Bias More than Convolutional Neural Networks?
Abhishek Mandal, Susan Leavy, Suzanne Little
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
Hongcheng Wang, Andy Guan Hong Chen, Xiaoqi Li, Mingdong Wu, Hao Dong