Text Contrastive Learning

Text contrastive learning trains models to embed images and their textual descriptions in a shared representation space, pulling matched image-text pairs together and pushing mismatched pairs apart. Current research focuses on improving efficiency (e.g., through patch ranking in Vision Transformers), improving model performance with new masking strategies and contrastive loss functions, and adapting the approach to domains such as medical imaging, remote sensing, and video analysis. The approach improves zero-shot and few-shot performance across visual tasks, which reduces reliance on large labeled datasets and enables applications where annotated data is scarce.
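
To make the idea of jointly embedding images and text concrete, here is a minimal sketch of a symmetric contrastive loss of the kind commonly used in this line of work. It is an illustration under stated assumptions, not the method of any paper listed below: the function names, the toy embeddings, and the temperature value of 0.07 are all hypothetical choices, and real systems would compute the embeddings with trained image and text encoders.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_rows(logits):
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. the i-th image is paired with the i-th text.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # temperature=0.07 is an illustrative default, not taken from the source.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Transpose to score each text against every image.
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))

# Toy batch of two pairs (made-up 2-D embeddings).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]   # text i resembles image i
swapped_text = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

loss_aligned = symmetric_contrastive_loss(image_embs, aligned_text)
loss_swapped = symmetric_contrastive_loss(image_embs, swapped_text)
print(loss_aligned < loss_swapped)
```

The loss is lower when each image is most similar to its own caption, which is the signal that training would use to align the two modalities.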

Papers

March 31, 2023

Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding
Xiang Zhang, Taoyue Wang, Xiaotian Li, Huiyuan Yang, Lijun Yin
Contrastive Learning Facial Expression Text Contrastive Learning Contrastive Self Supervision Facial Affective Behavior Analysis

February 23, 2023

Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani, Karan Desai, Justin Johnson
Contrastive Learning Cross Modal Visual Representation Learning Text Contrastive Learning Language Sampling

January 5, 2023

CiT: Curation in Training for Effective Vision-Language Data
Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
Training Data Large Vision Language Model Image Text Pair Automatic Curation Text Contrastive Learning

December 15, 2022

CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen, Basil Mustafa, Neil Houlsby
Contrastive Loss Multimodal Model Tetromino Pixel Multimodal Task Text Contrastive Learning Clipped Stochastic Gradient Descent

December 7, 2022

SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma, Tianyu Yang, Yin Shan, Xiu Li
Cross Modal Masked Autoencoders Video Text Text Contrastive Learning

November 29, 2022

Textual Enhanced Contrastive Learning for Solving Math Word Problems
Yibin Shen, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi
Contrastive Learning Text Modality Math Word Problem Text Contrastive Learning Challenge Dataset Text Perturbation

November 23, 2022

Texts as Images in Prompt Tuning for Multi-Label Image Recognition
Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo
Prompt Tuning Pre Trained Vision Language Model Text Based Text Contrastive Learning Multi Label Image Recognition

November 14, 2022

The Role of Local Alignment and Uniformity in Image-Text Contrastive Learning on Medical Images
Philip Müller, Georgios Kaissis, Daniel Rueckert
Contrastive Learning Medical Image Integral Role Contrastive Loss Uniformity Metric Text Contrastive Learning Local Alignment

October 18, 2022

MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang, Zhenbang Wu, Dinesh Agarwal, Jimeng Sun
Contrastive Learning Text Modality Image Text Text Contrastive Learning Change Captioning

May 12, 2022

Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
Vision Transformer Pre Training Open Vocabulary Object Detection Text Contrastive Learning