Open Vocabulary Detection
Open-vocabulary detection (OVD) aims to let object detectors recognize and localize categories beyond those labeled in their training data, addressing the fixed label set of traditional closed-vocabulary detectors. Current research relies heavily on vision-language models (VLMs) such as CLIP, often integrated into detector architectures such as DETR and YOLO, and focuses on improving the alignment between visual and textual features and on mitigating noise introduced by pseudo-labeling. The field matters because it moves object recognition toward more robust and generalizable systems, with potential applications in domains that demand real-world adaptability and the handling of unseen objects.
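As a rough illustration of why visual-textual feature alignment matters, the sketch below scores one region embedding against text embeddings of arbitrary category names using cosine similarity followed by a softmax. It is a minimal sketch under stated assumptions, not the method of any paper listed here: the three-dimensional vectors are toy stand-ins for CLIP embeddings, the names `l2_normalize` and `classify_region` are hypothetical, and the temperature value is purely illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify_region(region_emb, text_embs, temperature=0.01):
    """Return a probability per label for one region embedding.

    region_emb: list of floats (toy stand-in for a visual region feature).
    text_embs:  dict mapping label -> list of floats (toy stand-in for text features).
    temperature: illustrative softmax temperature (an assumption, not a tuned value).
    """
    region = l2_normalize(region_emb)
    sims = {
        label: sum(a * b for a, b in zip(region, l2_normalize(emb)))
        for label, emb in text_embs.items()
    }
    # Numerically stable softmax over the similarity scores.
    top = max(sims.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# The vocabulary is just a dict of text embeddings, so a new category can be
# added at inference time without retraining the (hypothetical) detector.
text_embs = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "unicycle": [0.0, 0.0, 1.0],
}
region = [0.1, 0.2, 0.9]

probs = classify_region(region, text_embs)
best = max(probs, key=probs.get)
print(best)
```

Because the label set lives entirely in `text_embs`, the quality of the prediction depends on how well region features and text features share one embedding space, which is the alignment problem the summary above refers to.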
Papers
Three ways to improve feature alignment for open vocabulary detection
Relja Arandjelović, Alex Andonian, Arthur Mensch, Olivier J. Hénaff, Jean-Baptiste Alayrac, Andrew Zisserman
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li