Open Vocabulary
Open-vocabulary research aims to enable artificial intelligence systems to recognize and interact with objects and scenes described in free-form text, rather than being limited to a fixed set of predefined categories. Current efforts focus on adapting vision-language models such as CLIP and large language models (LLMs) to tasks including 3D scene understanding, object detection and tracking, and robotic manipulation, often using transformer-based architectures such as DETR. This work matters because it improves generalization to objects and situations not seen during training, with potential impact on autonomous driving, robotics, and other fields that require robust real-world interaction.
Papers
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping
Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, Cheng Guangliang, Yang Chenguang
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer, Kyra Ahrens, Stefan Wermter
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
Yu Wang, Xiangbo Su, Qiang Chen, Xinyu Zhang, Teng Xi, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang
Open Vocabulary Multi-Label Video Classification
Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi
OVExp: Open Vocabulary Exploration for Object-Oriented Navigation
Meng Wei, Tai Wang, Yilun Chen, Hanqing Wang, Jiangmiao Pang, Xihui Liu