Open Vocabulary Object Detection

Open-vocabulary object detection (OVOD) aims to let computer vision systems detect objects from textual descriptions, including object categories that were not seen during training. Current research focuses on improving the accuracy and efficiency of OVOD, often leveraging vision-language models (such as CLIP) and transformer-based detectors (such as DETR) to bridge the gap between visual and textual representations. It also addresses challenges such as fine-grained attribute recognition and robustness to distribution shifts. Advances in OVOD matter for applications including autonomous driving, robotics, and remote sensing, where they enable more flexible and adaptable object recognition.
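
The idea of matching visual and textual representations can be illustrated with a toy sketch. It assumes that each candidate region and each category name has already been mapped into a shared embedding space by a CLIP-like model; the vectors, the function name `classify_region`, and the temperature value below are hypothetical stand-ins chosen for illustration, not the method of any paper listed here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify_region(region_embedding, text_embeddings, temperature=0.01):
    """Score one region embedding against a free-form vocabulary.

    region_embedding: list of floats (assumed output of an image encoder).
    text_embeddings: dict mapping category name -> list of floats
        (assumed output of a text encoder for that name).
    Returns the best-matching name and a softmax distribution over names.
    """
    region = l2_normalize(region_embedding)
    names = list(text_embeddings)
    # Cosine similarity between the region and every category text embedding.
    sims = [
        sum(r * t for r, t in zip(region, l2_normalize(text_embeddings[name])))
        for name in names
    ]
    # Temperature-scaled softmax, shifted by the max logit for stability.
    logits = [s / temperature for s in sims]
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy 3-dimensional embeddings (made up). "unicycle" stands for a category
# that had no box annotations during detector training: adding it to the
# vocabulary only requires a text embedding, not retraining.
text_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "unicycle": [0.0, 0.1, 0.9],
}
region = [0.05, 0.2, 0.95]

label, probs = classify_region(region, text_embeddings)
print(label)
```

Because the vocabulary is just a dictionary of text embeddings, it can be changed at inference time, which is the flexibility the overview above refers to. Real systems differ mainly in how region embeddings are produced and aligned with text.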

Papers

August 11, 2023

Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas
Pseudo Label Self Training Open Vocabulary Object Detection Noisy Supervision

July 24, 2023

Described Object Detection: Liberating Object Detection with Flexible Expressions
Chi Xie, Zhao Zhang, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang
Object Detector Open Vocabulary Object Detection Visual Description Referring Expression Comprehension Anchor Free Object Controllable Expression

July 7, 2023

Open-Vocabulary Object Detection via Scene Graph Discovery
Hengcan Shi, Munawar Hayat, Jianfei Cai
Scene Graph Open Vocabulary Object Detection Scene Graph Prediction

June 16, 2023

Scaling Open-Vocabulary Object Detection
Matthias Minderer, Alexey Gritsenko, Neil Houlsby
Vision Language Model Open Vocabulary Open Vocabulary Object Detection Detection Datasets Training Recipe

June 8, 2023

Multi-Modal Classifiers for Open-Vocabulary Object Detection
Prannay Kaul, Weidi Xie, Andrew Zisserman
Multi Modal Open Vocabulary Object Detection Two Stage Object Supervised Detector Open Vocabulary Detection Benchmark

May 11, 2023

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Dahun Kim, Anelia Angelova, Weicheng Kuo
Vision Transformer Open Vocabulary Object Detection Open Vocabulary Detection Text Contrastive Learning Open Vocabulary Detection Benchmark

April 10, 2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Hang Xu
Image Text Pair Open Vocabulary Object Detection Open Vocabulary Detection Region Word Alignment

April 7, 2023

V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
Object Detection Open Vocabulary Object Detection General Object Vast Vocabulary Visual Detection

March 25, 2023

Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection
Hwanjun Song, Jihwan Bang
Zero Shot Transformer Open Vocabulary Object Detection Open Vocabulary Detection Lap Transformer

March 23, 2023

Open-Vocabulary Object Detection using Pseudo Caption Labels
Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh
Fine Grained Open Vocabulary Object Detection Captioning Model Pseudo Caption

March 10, 2023

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Luting Wang, Yi Liu, Penghui Du, Zihan Ding, Yue Liao, Qiaosong Qi, Biaolong Chen, Si Liu
Open Vocabulary Object Detection Object Level 3D Object Detection Distillation

February 27, 2023

Aligning Bag of Regions for Open-Vocabulary Object Detection
Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy
Vision Language Model Region Specific Language Representation Open Vocabulary Object Detection Bag Prototype Open Vocabulary Object Detector

January 23, 2023

OvarNet: Towards Open-vocabulary Object Attribute Recognition
Keyan Chen, Xiaolong Jiang, Yao Hu, Xu Tang, Yan Gao, Jianqi Chen, Weidi Xie
Scene Understanding Open Vocabulary Object Detection Visual Attribute CLIP TD Outperforms

December 23, 2022

Learning to Detect and Segment for Open Vocabulary Object Detection
Tao Wang, Nan Li
Learning Abstract Detection Model Open Vocabulary Object Detection Well Defined Segment Semantic Embeddings Object Proposal

November 27, 2022

Learning Object-Language Alignments for Open-Vocabulary Object Detection
Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai
Fine Grained Open Vocabulary Object Detection Language Alignment Open Vocabulary Object Detector

November 4, 2022

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, Weiming Dong, Changsheng Xu
Vision Language Model Vision Language Human Understanding Vision Task Style PROMPT Model Overfitting Open Vocabulary Object Detection Context Optimization

November 2, 2022

Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang
Fine Grained Vision Language Self Training Open Vocabulary Object Detection Fine Grained Visual

September 30, 2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova
Language Model Open Vocabulary Object Detection Pre Training Data Detection Open Vocabulary Detection Benchmark

July 18, 2022

Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas
Language Model Vision Paper Unlabeled Data Open Vocabulary Object Detection Semi Supervised Object Detection Unseen Category Generic Object Detection

June 22, 2022

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
Peixian Chen, Kekai Sheng, Mengdan Zhang, Mingbao Lin, Yunhang Shen, Shaohui Lin, Bo Ren, Ke Li
Pre Trained Vision Language Model Feature Alignment Open Vocabulary Object Detection Vision Language Alignment Blind Equalizer Proposal Generation