Zero-Shot Composed Image Retrieval

Zero-shot composed image retrieval (ZS-CIR) retrieves images from a query that combines a reference image with text describing a desired modification, without training on data labeled specifically for this task. Current research focuses on training-free or weakly supervised methods that fuse the image and text information, often using multimodal large language models, projection modules that map images into the text embedding space, or diffusion models. The field matters because it avoids the high cost of creating labeled datasets for conventional composed image retrieval, which could make image search more efficient and scalable across domains.
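To make the idea of a composed query concrete, here is a minimal sketch of one very simple training-free baseline: fuse the reference-image embedding and the modifier-text embedding by a weighted sum, then rank gallery images by cosine similarity. This is an illustration under stated assumptions, not the method of any particular paper. The function names (`compose_query`, `retrieve`), the `alpha` weight, and the hand-made 3-dimensional vectors are all hypothetical; in practice the embeddings would come from a pretrained vision-language encoder that places images and text in a shared space.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def compose_query(image_emb, text_emb, alpha=0.5):
    """Fuse image and text embeddings with a weighted sum (illustrative baseline).

    alpha is a hypothetical mixing weight: 0 keeps only the image, 1 only the text.
    """
    img = normalize(image_emb)
    txt = normalize(text_emb)
    fused = [(1.0 - alpha) * i + alpha * t for i, t in zip(img, txt)]
    return normalize(fused)


def retrieve(query, gallery, top_k=3):
    """Return the top_k (name, score) pairs ranked by cosine similarity."""
    scored = []
    for name, emb in gallery.items():
        unit = normalize(emb)
        score = sum(q * g for q, g in zip(query, unit))
        scored.append((name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed, hand-made): axis 0 ~ "dress", axis 1 ~ "red", axis 2 ~ other.
reference_image = [1.0, 0.0, 0.0]  # a dress of unspecified color
modifier_text = [0.0, 1.0, 0.0]    # "make it red"

gallery = {
    "blue_dress": [1.0, 0.0, 0.2],
    "red_dress": [1.0, 1.0, 0.0],
    "red_car": [0.0, 1.0, 1.0],
}

query = compose_query(reference_image, modifier_text)
results = retrieve(query, gallery)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In this toy gallery the image that matches both the reference content and the textual modification ranks first. Methods in the literature replace the weighted sum with stronger fusion, for example by projecting the image into a pseudo-word token that is inserted into the text query, but the retrieval step over a shared embedding space has the same shape.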

Papers