Open Vocabulary Scene Understanding

Open vocabulary scene understanding aims to let computers recognize and label objects in 3D scenes from free-form natural language descriptions rather than from a fixed set of predefined categories. Current research focuses on improving the accuracy and efficiency of this understanding, particularly through neural implicit representations, 3D Gaussian splatting, and vision-language models that build on pre-trained 2D image encoders. These advances matter for robotics, augmented reality, and other applications that need robust, flexible scene interpretation and richer interaction with complex environments.
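
As a rough illustration of the core idea, the sketch below labels 3D points by comparing a per-point feature vector against text embeddings of arbitrary label strings and picking the most similar one. It assumes the point features have already been lifted from a pre-trained 2D vision-language encoder into the 3D scene and that the text embeddings live in the same space. The function names (cosine, label_points) and the tiny 3-dimensional vectors are hypothetical stand-ins chosen for readability, not the method of any particular paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def label_points(point_feats, text_feats, labels):
    """Give each point the label whose text embedding is most similar."""
    result = []
    for feat in point_feats:
        scores = [cosine(feat, t) for t in text_feats]
        result.append(labels[scores.index(max(scores))])
    return result

# Hypothetical toy embeddings; real ones would come from a
# vision-language model and have hundreds of dimensions.
labels = ["chair", "table", "potted plant"]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_feats = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.8], [0.2, 0.7, 0.1]]

predicted = label_points(point_feats, text_feats, labels)
print(predicted)  # ['chair', 'potted plant', 'table']
```

Because the label set is just a list of strings embedded at query time, new categories can be added without retraining, which is what distinguishes the open vocabulary setting from closed-set segmentation.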

Papers