Zero-Shot Multi-Label Classification

Zero-shot multi-label classification aims to assign multiple unseen category labels to a given input (an image or a text) without any labeled training examples for those categories. Current research relies heavily on pre-trained vision-language models such as CLIP, often combined with prompt engineering, contrastive learning, and multimodal representation alignment to bridge the gap between visual and textual information. The field matters because it enables classification when labeled data is scarce or absent, with applications ranging from medical image analysis to large-scale image and text categorization, where manually labeling every possible category is impractical.
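
Because CLIP scores each image-text pair independently, a simple zero-shot multi-label baseline embeds one prompt per candidate label and keeps every label whose similarity to the image clears a threshold. The sketch below illustrates this with the Hugging Face transformers CLIP API; the model checkpoint, prompt template, image path, and similarity threshold are illustrative assumptions rather than any specific paper's method.

```python
# Minimal sketch: zero-shot multi-label tagging with CLIP.
# The checkpoint, labels, prompt template, and threshold below are
# illustrative assumptions, not a reproduction of a particular paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Unseen categories: no labeled training examples are required for these.
labels = ["dog", "person", "bicycle", "traffic light"]
prompts = [f"a photo of a {label}" for label in labels]  # basic prompt engineering

image = Image.open("example.jpg")  # placeholder input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Cosine similarity between the image and each label prompt.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(0)  # shape: (num_labels,)

# Multi-label decision: score each label independently against a cutoff
# (rather than a softmax over labels, which would force a single choice).
# Raw CLIP similarities are not calibrated probabilities, so in practice
# this threshold would need to be tuned or calibrated.
threshold = 0.25  # illustrative value
predicted = [label for label, s in zip(labels, scores.tolist()) if s > threshold]
print(predicted)
```

A per-label sigmoid over scaled similarities, or contrasting each positive prompt with a negative one, are common alternatives to a fixed cosine threshold; the key design choice is that each label is scored independently so several labels can be assigned to the same image.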

Papers