Open Vocabulary Object Detection

Open-vocabulary object detection (OVOD) aims to let computer vision systems detect objects specified by free-form textual descriptions, including categories never seen during training. Current research focuses on improving the accuracy and efficiency of OVOD, often by leveraging vision-language models (e.g., CLIP) and transformer-based detectors (e.g., DETR) to bridge visual and textual representations. It also addresses challenges such as fine-grained attribute recognition and robustness to distribution shifts. Advances in OVOD have significant implications for applications including autonomous driving, robotics, and remote sensing, because they make object recognition more flexible and adaptable.
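To make "bridging visual and textual representations" concrete, here is a minimal sketch of one common, simplified pattern: region features and class-name embeddings live in a shared space, and each detected region is classified by cosine similarity against whatever class names are supplied at inference time. Everything here is an assumption for illustration. The function names `l2_normalize` and `classify_regions` are hypothetical, the 3-d vectors are toy stand-ins for what a CLIP-like image and text encoder would produce, and the temperature value is illustrative. Box proposal, background handling, and training are omitted.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def classify_regions(
    region_embeddings: List[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # illustrative value, not taken from any paper
) -> List[Dict[str, float]]:
    """Return, for each region, a softmax distribution over the given class names.

    The vocabulary is simply the keys of `text_embeddings`, so new classes can be
    added at inference time by embedding their names; no classifier weights change.
    """
    names = list(text_embeddings)
    text_unit = [l2_normalize(text_embeddings[n]) for n in names]
    results = []
    for region in region_embeddings:
        r = l2_normalize(region)
        # Cosine similarity between the region and each class name, sharpened by temperature.
        logits = [sum(a * b for a, b in zip(r, t)) / temperature for t in text_unit]
        peak = max(logits)  # subtract the max before exp() for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        results.append({n: e / total for n, e in zip(names, exps)})
    return results


# Toy usage with hypothetical 3-d embeddings.
text = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "zebra": [0.0, 0.0, 1.0]}
regions = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
probs = classify_regions(regions, text)
print([max(p, key=p.get) for p in probs])  # ['cat', 'zebra']
```

The design choice that makes this "open vocabulary" is that the classifier is just the set of text embeddings. Swapping or extending the class list changes what the detector can name without retraining the region encoder, which is also why much of the work listed here focuses on aligning region features with text more precisely.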

Papers

December 22, 2023

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin
Zero Shot, Vision Language, Visual Grounding, Open Vocabulary Object Detection

December 19, 2023

Weakly Supervised Open-Vocabulary Object Detection
Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao
Open Vocabulary Object Detection, Region Proposal, Weakly Supervised Object Detection

December 18, 2023

CLIM: Contrastive Language-Image Mosaic for Region Representation
Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy
Contrastive Language Image, Image Text Pair, Open Vocabulary Object Detection, Vision Language Alignment, Open Vocabulary Object Detector

December 16, 2023

Simple Image-level Classification Improves Open-vocabulary Object Detection
Ruohuan Fang, Guansong Pang, Xiao Bai
Vision Language, Detection Task, Open Vocabulary Object Detection, Image Level Classification, Instance Score

December 12, 2023

ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
Joonhyun Jeong, Geondo Park, Jayeon Yoo, Hyungsik Jung, Heesu Kim
Pseudo Labeling, Open Vocabulary Object Detection, Unseen Class, Class Mixup

December 4, 2023

Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection
Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo
Vision Language, Pseudo Label, Pseudo Labeling, Open Vocabulary Object Detection, Image to Text Mapping

November 29, 2023

The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Claudio Gennaro, Fabrizio Falchi
Fine Grained Object Detection, Large Vision Language Model, Open Vocabulary Object Detection, Open Vocabulary Object Detector, Open Vocabulary Detection Benchmark

November 20, 2023

Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Dunyun He, Jiaqi Zhou, Wenxian Yu
Open Vocabulary Object Detection, Aerial Object Detection

November 7, 2023

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, Ying Shan
Zero Shot, Vision Language Model, Shot Learning, Open Vocabulary Object Detection, Contrastive Vision Language, Shot Learner

October 31, 2023

Spuriosity Rankings for Free: A Simple Framework for Last Layer Retraining Based on Object Detection
Mohammad Azizmalayeri, Reza Abbasi, Amir Hosein Haji Mohammad rezaie, Reihaneh Zohrabi, Mahdi Amiri, Mohammad Taghi Manzuri, Mohammad Hossein Rohban
Deep Neural Network, Object Detector, Partial Ranking, Open Vocabulary Object Detection, ImageNet 1k, Last Layer, Last Layer Retraining

October 26, 2023

LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
Chau Pham, Truong Vu, Khoi Nguyen
Object Detector, Open Vocabulary Object Detection, Linear Probing, Unseen Class, Object Box, Joint Image Text

October 25, 2023

CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma, Yi Jiang, Xin Wen, Zehuan Yuan, Xiaojuan Qi
Vision Language, Open Vocabulary Object Detection, Vision Language Representation, Region Word Alignment

October 22, 2023

OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao
New Benchmark, Visual Grounding, Open Vocabulary, Open Vocabulary Object Detection, Shot Localization

October 2, 2023

September 29, 2023

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection
Dahun Kim, Anelia Angelova, Weicheng Kuo
Open Vocabulary Object Detection, Open Vocabulary Detection, Semantic Cue, Open Vocabulary Detection Benchmark, Image Text Pretraining

September 26, 2023

MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz, Selim Kuzucu, Tom Joy, Puneet K. Dokania
Object Detector, Mixture Component, Open Vocabulary Object Detection, Detection Confidence, Calibrated Learning

September 3, 2023

EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
Cheng Shi, Sibei Yang
Open Vocabulary Object Detection, Object Embeddings, Object Prediction, Dense Alignment

September 2, 2023

Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim, Anelia Angelova, Weicheng Kuo
Vision Transformer, Open Vocabulary Object Detection, Zero Shot Transfer, Contrastive Learning Objective, Image Text Pretraining

August 30, 2023

Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu
Multi Modal, Open Vocabulary Object Detection, Cross Modal Knowledge Distillation, Multimodal In-Context Learning