Open Vocabulary Occupancy

Open-vocabulary occupancy estimation aims to build 3D representations of an environment that capture both where objects are and what they are, with semantic labels that are not limited to a fixed category set, and without relying on extensive, manually labeled 3D training data. Current research leverages pre-trained vision-language models and self-supervised learning, often using differentiable volume rendering or knowledge distillation from 2D segmentation models to obtain open-vocabulary capabilities. Reducing the dependence on costly 3D annotations improves the scalability and practicality of 3D scene understanding, and promises more robust and adaptable perception systems for applications such as autonomous driving and robotics.
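
As a rough illustration of what "open-vocabulary" means at query time, the sketch below labels occupied voxels by comparing per-voxel features against text embeddings of free-form class prompts. It is a minimal toy under stated assumptions, not the method of any particular paper: the function `label_voxels`, the threshold `occ_threshold`, and the hand-written 3-dimensional embeddings are all hypothetical, and a real system would obtain both feature sets from a vision-language model.

```python
import numpy as np

def label_voxels(voxel_feats, occupancy, text_embeds, class_names,
                 occ_threshold=0.5, eps=1e-8):
    """Assign a free-form text label to each occupied voxel (toy sketch).

    voxel_feats: (N, D) per-voxel features, ASSUMED to share an embedding
                 space with the text embeddings (e.g. via 2D-to-3D distillation).
    occupancy:   (N,) predicted occupancy probabilities in [0, 1].
    text_embeds: (C, D) one embedding per class prompt.
    Returns a list of N labels; voxels below the threshold get None.
    """
    # L2-normalise so the dot product equals cosine similarity.
    v = voxel_feats / (np.linalg.norm(voxel_feats, axis=1, keepdims=True) + eps)
    t = text_embeds / (np.linalg.norm(text_embeds, axis=1, keepdims=True) + eps)
    similarity = v @ t.T                      # shape (N, C)
    best = similarity.argmax(axis=1)          # best-matching prompt per voxel
    return [
        class_names[idx] if occ >= occ_threshold else None
        for idx, occ in zip(best, occupancy)
    ]

# Hypothetical prompts and hand-made embeddings, for illustration only.
class_names = ["car", "tree", "traffic cone"]
text_embeds = np.eye(3)
voxel_feats = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.95],
    [0.0, 1.0, 0.1],
    [0.5, 0.5, 0.5],
])
occupancy = np.array([0.9, 0.8, 0.7, 0.1])

labels = label_voxels(voxel_feats, occupancy, text_embeds, class_names)
print(labels)
```

Because the class list is only an input to the query, new categories can be added by embedding additional prompts, without retraining on 3D labels for those categories.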

Papers