3D Scene Perception

3D scene perception aims to enable computers to understand and interact with three-dimensional environments in a way that mirrors human spatial reasoning. Current research emphasizes large language models (LLMs) and multimodal approaches, often built on transformers and memory-based mechanisms, to process data sources such as RGB-D video and LiDAR point clouds, either individually or in combination. These advances are driving progress in autonomous driving, robotics, and other applications that require robust scene understanding, chiefly by improving the accuracy and efficiency of tasks such as 3D object detection and segmentation. Training-free paradigms and efficient multimodal architectures further improve the practicality and scalability of these systems.

Papers