Open Vocabulary

Open vocabulary research aims to enable artificial intelligence systems to understand and interact with the world using free-form text descriptions, going beyond predefined categories. Current efforts focus on adapting large language and vision-language models (like CLIP and LLMs) to various tasks, including 3D scene understanding, object detection and tracking, and robotic manipulation, often employing architectures such as DETR and transformers. This work is significant because it pushes the boundaries of AI's ability to generalize to unseen objects and situations, with potential impact on autonomous driving, robotics, and other fields requiring robust real-world interaction.

Papers