Compositional Zero-Shot Learning

Compositional zero-shot learning (CZSL) aims to enable models to recognize novel combinations of visual primitives, typically attribute-object pairs such as "striped shirt", using knowledge learned from seen combinations and without requiring training examples for each new composition. For example, a model that has seen striped dresses and red shirts during training should still be able to recognize a striped shirt. Current research relies heavily on large pre-trained vision-language models, often combined with techniques such as soft prompting, attention mechanisms, and disentangled feature learning to improve generalization to unseen compositions. The field is significant because it targets a limitation of traditional zero-shot learning, which typically handles unseen classes as whole categories rather than exploiting how familiar primitives recombine. This paves the way for more robust and adaptable AI systems that can handle real-world visual complexity in applications such as robotics and human-computer interaction.
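The core idea of scoring an image against compositions built from separately represented primitives can be sketched in a few lines. This is a toy illustration under stated assumptions, not any specific published method. The attribute and object vectors, the additive composition function, and the names ATTRS, OBJS, compose and predict are all hypothetical. Real systems learn the primitive representations from seen pairs (for example, as soft-prompt tokens passed through a vision-language model's text encoder) rather than fixing them by hand.

```python
import math
from itertools import product

# Hypothetical primitive embeddings (toy 3-d vectors). In a real CZSL model these
# would be learned from seen compositions; here they are fixed for illustration.
ATTRS = {
    "striped": [1.0, 0.0, 0.0],
    "red": [0.0, 1.0, 0.0],
}
OBJS = {
    "shirt": [0.0, 0.0, 1.0],
    "apple": [0.0, 0.5, 0.5],
}


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def compose(attr, obj):
    """Assumed composition function: sum of primitive embeddings, L2-normalized."""
    return normalize([a + o for a, o in zip(ATTRS[attr], OBJS[obj])])


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def predict(image_emb, pairs):
    """Return the (attribute, object) pair whose composed embedding best matches the image."""
    return max(pairs, key=lambda p: cosine(image_emb, compose(*p)))


# Training saw only these compositions...
seen = {("red", "shirt"), ("striped", "apple")}
# ...but the test label space contains every attribute-object pair.
all_pairs = list(product(ATTRS, OBJS))
unseen = [p for p in all_pairs if p not in seen]

image = [0.9, 0.1, 1.0]  # hypothetical image embedding
print(predict(image, all_pairs))  # -> ('striped', 'shirt')
```

Because the prediction is an argmax over every attribute-object pair, the sketch can output "striped shirt" even though that pair never appeared among the seen compositions; only its primitives ("striped" in "striped apple", "shirt" in "red shirt") did.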

Papers