Open Vocabulary Detection

Open-vocabulary detection (OVD) aims to let object detectors identify objects from categories that were not explicitly included in their training data, removing the fixed label set that limits traditional closed-vocabulary approaches. Current research relies heavily on vision-language models (VLMs) such as CLIP, often integrated into detector architectures such as DETR and YOLO, and focuses on two problems: improving the alignment between visual and textual features, and reducing the noise introduced by pseudo-labeling. The field matters because it moves object recognition toward more robust and generalizable systems, with potential applications in domains that require real-world adaptability and the handling of unseen objects.
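To make the idea of visual-textual alignment concrete, here is a minimal sketch of how a detected region might be labeled against an arbitrary list of category names. It assumes that a VLM has already produced one embedding per region and one per category name in a shared space. The function names (`cosine`, `classify_region`) and the 3-dimensional toy vectors are hypothetical and chosen only for illustration; they do not come from any particular paper or library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def classify_region(region_embedding, text_embeddings):
    """Return the (label, score) whose text embedding is closest to the region.

    text_embeddings maps a category name to its embedding. Because the label
    set is just a dictionary passed at inference time, new categories can be
    added without retraining; this is the "open vocabulary" part.
    """
    scores = {
        label: cosine(region_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy, made-up embeddings standing in for the output of a VLM text encoder.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],
}

# Toy, made-up embedding standing in for one detected region's visual feature.
region = [0.1, 0.2, 0.9]

label, score = classify_region(region, text_embeddings)
print(label, round(score, 3))
```

In this toy setup the region is assigned to "zebra" because its vector points mostly along that category's text embedding. Real systems differ in how the region features are produced and how closely they are aligned with the text space, which is where much of the research effort described above is directed.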

Papers