Open Vocabulary
Open-vocabulary research aims to enable artificial intelligence systems to understand and interact with the world using free-form text descriptions rather than a fixed set of predefined categories. Current efforts focus on adapting vision-language models (such as CLIP) and large language models (LLMs) to tasks including 3D scene understanding, object detection and tracking, and robotic manipulation, often building on transformer-based architectures such as DETR. This work is significant because it improves AI's ability to generalize to unseen objects and situations, with potential impact on autonomous driving, robotics, and other fields requiring robust real-world interaction.
Papers
Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Takayuki Nishimura, Katsuyuki Kuyo, Motonari Kambara, Komei Sugiura