Open Vocabulary

Open-vocabulary research aims to let artificial intelligence systems understand and interact with the world through free-form text descriptions rather than a fixed set of predefined categories. Current efforts focus on adapting vision-language models such as CLIP, along with large language models (LLMs), to tasks including 3D scene understanding, object detection and tracking, and robotic manipulation, often using transformer-based architectures such as DETR. The work matters because it improves generalization to unseen objects and situations, with potential impact on autonomous driving, robotics, and other fields that require robust real-world interaction.
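The core idea of matching an input against free-form text labels, instead of a fixed class list, can be illustrated with a minimal sketch. It assumes that some vision-language model (CLIP-style) has already mapped an image and each candidate text label into a shared embedding space; the 3-dimensional vectors, the label strings, the function names, and the temperature value below are all hypothetical and chosen only for illustration, not taken from any of the papers listed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_open_vocab(image_emb, label_embs, temperature=0.07):
    """Score an image embedding against arbitrary text-label embeddings.

    Returns the best-matching label and a softmax distribution over all
    labels. The temperature is an illustrative assumption.
    """
    labels = list(label_embs)
    logits = [cosine(image_emb, label_embs[name]) / temperature for name in labels]
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical toy embeddings standing in for real text-encoder outputs.
TEXT_EMBS = {
    "a photo of a mug": [0.9, 0.1, 0.0],
    "a photo of a drawer": [0.1, 0.9, 0.1],
    "a photo of a robot arm": [0.0, 0.2, 0.9],
}
# Hypothetical image embedding standing in for a real image-encoder output.
IMAGE_EMB = [0.8, 0.2, 0.1]

best_label, probs = classify_open_vocab(IMAGE_EMB, TEXT_EMBS)
print(best_label)
```

Because the label set is just a dictionary of text embeddings, a new category can be added at inference time by embedding one more description, with no retraining of the classifier; that property is what distinguishes the open-vocabulary setting from a fixed-category one.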

Papers

July 1, 2024

June 21, 2024

Open-vocabulary Pick and Place via Patch-level Semantic Maps
Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex
Open Vocabulary Language Conditioned Action Generation Place Behavior Synthesis Equivariant Operation

June 14, 2024

Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting
Ce Hao, Kelvin Lin, Siyuan Luo, Harold Soh
Vision Language Model Generative Modeling Open Vocabulary Diffusion Policy Language Conditioned Image Inpainting Language Guided Manipulation

June 12, 2024

CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho
Open Vocabulary Librispeech Speech Recognition Keyword Enrollment Open Vocabulary Keyword Spotting

June 11, 2024

Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
Sergey Linok, Tatiana Zemskova, Svetlana Ladanova, Roman Titkov, Dmitry Yudin, Maxim Monastyrny, Aleksei Valenkov
Vision Language Model Open Vocabulary Object Centric 3D Scene Graph Spatial Graph 3D Object Retrieval Conditional Query

June 7, 2024

OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma, Shiliang Zhang, Longhui Wei, Qi Tian
Multi Modal Open Vocabulary Textual Description Modal Clue

June 4, 2024

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang
Open Vocabulary Point Cloud Understanding 3D Awareness Cross Scene Open Vocabulary 3D

May 30, 2024

OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation
Gonca Yilmaz, Songyou Peng, Marc Pollefeys, Francis Engelmann, Hermann Blum
Domain Adaptation Vision Language Model Open Vocabulary Open Vocabulary Segmentation Open Domain Generalization

May 29, 2024

Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models
Tianrun Chen, Chunan Yu, Jing Li, Jianqi Zhang, Lanyun Zhu, Deyi Ji, Yong Zhang, Ying Zang, Zejian Li, Lingyun Sun
Complex Reasoning Large Vision Language Model 3D Content Open Vocabulary 3D Segmentation Part Segmentation 3D Instance Segmentation 3D Reasoning Zero Shot 3D

May 28, 2024

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang, Bin Chen, Bin Kang, Yulin Li, YiChi Chen, Weizhi Xian, Huifeng Chang
Open World Open Vocabulary Open Vocabulary Detection Deformable DETR 2 Dimensional Supervision Query DeNoising

May 24, 2024

Open-Vocabulary SAM3D: Towards Training-free Open-Vocabulary 3D Scene Understanding
Hanchen Tai, Qingdong He, Jiangning Zhang, Yijie Qian, Zhenyu Zhang, Xiaobin Hu, Xiangtai Li, Yabiao Wang, Yong Liu
Zero Shot Vision Language Model 3D Scene Open Vocabulary Real World 3D

May 16, 2024

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Mingxuan Liu, Tyler L. Hayes, Elisa Ricci, Gabriela Csurka, Riccardo Volpi
Open Vocabulary Open Vocabulary Object Detection Rule Learning Semantic Hierarchy

May 14, 2024

Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
Sunyuan Qiang, Xianfei Li, Yanyan Liang, Wenlong Liao, Tao He, Pai Peng
Pre Trained Vision Language Model Open Vocabulary Open Vocabulary Object Detection

April 30, 2024

One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
Open Vocabulary Action Label Vocabulary Temporal Action Detection

April 18, 2024

Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
Oliver Lemke, Zuria Bauer, René Zurbrügg, Marc Pollefeys, Francis Engelmann, Hermann Blum
Point Cloud New Framework Robotics Domain Robotic Manipulation Open Vocabulary 3D Instance Segmentation SpOT Robot Robot Interaction Grasp Prediction

April 16, 2024

Vocabulary-free Image Classification and Semantic Segmentation
Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
Semantic Segmentation Large Vision Language Model Pre Trained Vision Language Model Open Vocabulary Segmentation Framework Segmentation Free

April 12, 2024

Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation
Yanhao Zheng, Kai Liu
Open Vocabulary Training Free Open Vocabulary Object Detection Object Classification

April 5, 2024

Open vocabulary keyword spotting through transfer learning from speech synthesis
Kesavaraj V, Anil Kumar Vuppala
Transfer Learning Speech Synthesis Formality Transfer Open Vocabulary Text Representation Text Encoder Heterogeneous Modality

April 1, 2024

Open-Vocabulary Federated Learning with Multimodal Prototyping
Huimin Zeng, Zhenrui Yue, Dong Wang
Pre Trained Vision Language Model Open Vocabulary Label Space Modal Prototype Open Source Federated Learning Framework