Open Vocabulary

Open vocabulary research aims to enable artificial intelligence systems to understand and interact with the world through free-form text descriptions rather than a fixed set of predefined categories. Current efforts focus on adapting large language models (LLMs) and vision-language models such as CLIP to tasks including 3D scene understanding, object detection and tracking, and robotic manipulation, often building on transformer-based architectures such as DETR. This work matters because it improves AI systems' ability to generalize to unseen objects and situations, with potential impact on autonomous driving, robotics, and other fields that require robust real-world interaction.

Papers

August 17, 2024

Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan, Yanxing Liu, Yuqian Fu, Muyuan Ma, Jiaohao Li, Danda Pani Paudel, Luc Van Gool, Xiaomeng Huang
Open Vocabulary, Open Vocabulary Object Detection, Open Vocabulary Object Detector, Locate Anything

August 15, 2024

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
Ryosuke Korekata, Kanta Kaneda, Shunya Nagashima, Yuto Imai, Komei Sugiura
Open Vocabulary, Service Robot, Mobile Manipulation, Multimodal Foundation Model, Target Object

August 7, 2024

Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian
Amirhosein Chahe, Lifeng Zhou
Large Language Model, Autonomous Driving, Open Vocabulary, Open Vocabulary Object Detection, 3D Scene Representation

August 5, 2024

Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
Andong Tan, Fengtao Zhou, Hao Chen
Vision Language, Open Vocabulary, Concept Bottleneck Model, Zero Shot Classification, High Impact Concept, Interpretable Concept

July 31, 2024

Open-Vocabulary Audio-Visual Semantic Segmentation
Ruohao Guo, Liao Qu, Dantong Niu, Yanyu Qi, Wenzhen Yue, Ji Shi, Bowei Xing, Xianghua Ying
Open Vocabulary, VidSGG Datasets, Audio Visual Semantic Segmentation

July 17, 2024

CerberusDet: Unified Multi-Dataset Object Detection
Irina Tolstykh, Mikhail Chernyshov, Maksim Kuprashevich
Computer Vision, Open Vocabulary, Multi Object

July 16, 2024

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu
Vision Language Model, Open Vocabulary, Open Vocabulary Object Detection, Open Vocabulary Detection, DETR Based Detector

July 13, 2024

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang Zhang, Chenhang He, Zhiyuan Ma, Vishal M. Patel, Lei Zhang
Zero Shot, Large Vision Language Model, Open Vocabulary, Multimodal Alignment, Text to Image Association

July 10, 2024

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang
Open Vocabulary, Open Vocabulary Detection, DiNO Mix, Cross Modality Alignment, Open Vocabulary Detection Benchmark, Language Aware Selective Fusion

July 5, 2024

CountGD: Multi-Modal Open-World Counting
Niki Amini-Naieni, Tengda Han, Andrew Zisserman
Open Vocabulary, Counting Benchmark