Open Vocabulary Scene

Open vocabulary scene understanding aims to let computers interpret and interact with 3D scenes through natural language descriptions, rather than being limited to a fixed set of predefined object categories. Current research focuses on efficient and accurate models that embed language features, often CLIP embeddings, into 3D scene representations such as neural implicit representations or 3D Gaussians. Common training techniques include knowledge distillation from 2D vision-language models and contrastive learning. The field matters for robotics, augmented reality, and other applications that need flexible and robust scene interpretation, particularly in dynamic and unstructured environments. Building robust benchmarks and better evaluation metrics is also an active area of work.
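The core idea of attaching language features to 3D elements can be illustrated with a minimal sketch. It assumes that each 3D point (or Gaussian) already stores a feature vector living in the same embedding space as a text encoder such as CLIP. The function names, the threshold, and the toy 3-dimensional vectors are hypothetical stand-ins for illustration, not real CLIP embeddings or any specific paper's method. The sketch shows two pieces: a cosine-distance distillation loss (one common choice; individual methods differ) that pulls features rendered from the 3D representation toward 2D teacher features, and open-vocabulary querying by cosine similarity against text embeddings.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (L2 norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def distillation_loss(rendered: Sequence[Vector], teacher: Sequence[Vector]) -> float:
    """Mean (1 - cosine) between features rendered from the 3D representation
    and 2D teacher features at the same pixels (hypothetical setup: the teacher
    features would come from a 2D vision-language model such as CLIP)."""
    if not rendered or len(rendered) != len(teacher):
        raise ValueError("need equal, non-empty lists of feature vectors")
    return sum(1.0 - cosine(r, t) for r, t in zip(rendered, teacher)) / len(rendered)


def query(
    point_features: Sequence[Vector],
    text_embeddings: Dict[str, Vector],
    threshold: float,
) -> List[Optional[str]]:
    """Label each point with its best-matching text query, or None when no
    query's cosine similarity exceeds the (hypothetical) threshold."""
    labels: List[Optional[str]] = []
    for feature in point_features:
        best_label: Optional[str] = None
        best_score = threshold
        for label, text_vec in text_embeddings.items():
            score = cosine(feature, text_vec)
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real (much higher-dimensional) embeddings.
    points = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.3, 0.3, 0.3]]
    texts = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
    print(query(points, texts, threshold=0.8))
    print(distillation_loss([[1.0, 0.0]], [[0.0, 1.0]]))
```

With these toy inputs the first two points match "chair" and "table", while the third point, equally similar to both queries, falls below the threshold and stays unlabeled. Because the text queries are just embeddings, new categories can be queried at inference time without retraining, which is what distinguishes this setting from a fixed-label classifier.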

Papers