Zero-Shot Open-Vocabulary
Zero-shot open-vocabulary (ZSO) methods aim to let computer vision models recognize and process objects and scenes that are specified by text prompts and were not seen during training. Current research focuses on improving the alignment between visual and textual representations, often by building on large pre-trained vision-language models such as CLIP and by incorporating techniques such as contrastive learning, diffusion models, and hierarchical comparisons to improve performance on tasks like segmentation and tracking. These advances matter because they reduce reliance on large labeled datasets, which supports more robust and adaptable computer vision systems across diverse real-world scenarios.
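The alignment idea can be illustrated with a minimal sketch of zero-shot label selection: an image embedding is compared against the embeddings of free-form text prompts, and the closest prompt wins. Everything in it is an assumption made for illustration. The three-dimensional vectors are invented toy values rather than outputs of CLIP or any real model, and the function names and the temperature value are hypothetical. In a real system the vectors would come from the image and text encoders of a vision-language model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_scores(image_embedding, text_embeddings, temperature=0.01):
    """Return a softmax distribution over prompts for a single image.

    text_embeddings maps each prompt string to its embedding. Prompts can be
    arbitrary text, which is what makes the label set "open vocabulary".
    """
    img = l2_normalize(image_embedding)
    prompts = list(text_embeddings)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(text_embeddings[p])))
        for p in prompts
    ]
    # Temperature-scaled softmax, shifted by the max logit for numerical stability.
    logits = [s / temperature for s in sims]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {p: e / total for p, e in zip(prompts, exps)}


# Toy embeddings (hypothetical values, not produced by any real encoder).
TOY_TEXT_EMBEDDINGS = {
    "a photo of a zebra": [0.9, 0.1, 0.0],
    "a photo of a bicycle": [0.0, 0.2, 0.9],
    "a photo of a teapot": [0.1, 0.9, 0.1],
}
TOY_IMAGE_EMBEDDING = [0.8, 0.2, 0.1]

scores = zero_shot_scores(TOY_IMAGE_EMBEDDING, TOY_TEXT_EMBEDDINGS)
best_label = max(scores, key=scores.get)
print(best_label)
```

Because the candidate labels are supplied as text at inference time, adding a new category only requires embedding a new prompt, with no retraining on labeled examples of that category.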