CLIP Space

CLIP space, the joint embedding space produced by the CLIP (Contrastive Language–Image Pre-training) model, is being actively explored for flexible and efficient text-guided image manipulation. Current research leverages CLIP's ability to bridge text and image representations to build editing and generation methods, often on top of diffusion models or GANs, by manipulating CLIP embeddings directly or the differences between them (as in DeltaSpace). Because edit directions can be derived from text embeddings alone, this approach enables text-free training and zero-shot inference, reducing reliance on large annotated datasets and improving the efficiency and versatility of image editing tools.
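The core manipulation described above can be sketched numerically: take the normalized difference between two text embeddings as an edit direction, then shift an image embedding along it. The sketch below uses a fixed random projection as a hypothetical stand-in for a real CLIP encoder (e.g. the `open_clip` or Hugging Face implementations), so the prompts and the `encode` function are illustrative assumptions, not the actual model.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize a vector; CLIP embeddings are compared on the unit sphere."""
    return v / np.linalg.norm(v)

# Hypothetical stand-in for a real CLIP text encoder: a fixed random linear
# projection keeps this sketch self-contained and deterministic.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))

def encode(features: np.ndarray) -> np.ndarray:
    """Mock encoder mapping an input feature vector to a unit embedding."""
    return normalize(W @ features)

# Feature vectors standing in for two tokenized prompts (assumption only),
# e.g. a source prompt "a photo of a face" and a target "a smiling face".
src_features = rng.standard_normal(512)
tgt_features = rng.standard_normal(512)

# DeltaSpace-style edit direction: the normalized difference of the two
# text embeddings defines "what changed" between the prompts.
delta = normalize(encode(tgt_features) - encode(src_features))

# Shift an image embedding along the text-derived direction;
# alpha controls edit strength in a real pipeline as well.
img_emb = normalize(rng.standard_normal(512))
alpha = 0.5
edited = normalize(img_emb + alpha * delta)

# Cosine similarity to the edit direction before and after the shift.
print(float(img_emb @ delta), float(edited @ delta))
```

In a full system, `edited` would condition a generator (a diffusion model or a GAN with a CLIP-aligned latent space) rather than be inspected directly; the point of the delta formulation is that the direction comes from text alone, so no paired image edits are needed for training.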

Papers