3D Scene Perception
3D scene perception aims to enable computers to understand and interact with three-dimensional environments, mirroring human spatial reasoning. Current research emphasizes large language models (LLMs) and multimodal approaches, often incorporating transformers and memory-based mechanisms to process diverse data sources such as RGB-D videos and LiDAR point clouds, both individually and in combination. These advances are driving progress in autonomous driving, robotics, and other applications that require robust scene understanding, particularly through improved accuracy and efficiency in tasks such as 3D object detection and segmentation. Training-free paradigms and efficient multimodal architectures further improve the practicality and scalability of these systems.
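To make the memory-based idea concrete, here is a minimal, purely illustrative sketch of online feature fusion: per-voxel features are cached across streaming frames and blended with new observations via an exponential moving average. The `MemoryAdapter` class, its `momentum` parameter, and the voxel-keyed dictionary are all assumptions for illustration, not the architecture of any paper listed below.

```python
import numpy as np

class MemoryAdapter:
    """Toy online-fusion memory (hypothetical sketch): caches per-voxel
    features across frames and blends new observations with stored ones."""

    def __init__(self, momentum: float = 0.5):
        self.momentum = momentum  # weight given to the stored feature
        self.memory = {}          # voxel coordinate -> running feature

    def update(self, coords, feats):
        """Fuse one frame.

        coords: (N, 3) int array of voxel indices for this frame.
        feats:  (N, D) float array of per-point features.
        Returns the memory-fused features for the frame.
        """
        fused = np.empty_like(feats)
        for i, key in enumerate(map(tuple, coords)):
            if key in self.memory:
                # Blend old and new: the moving average keeps the scene
                # representation stable as frames stream in.
                self.memory[key] = (self.momentum * self.memory[key]
                                    + (1 - self.momentum) * feats[i])
            else:
                self.memory[key] = feats[i].copy()
            fused[i] = self.memory[key]
        return fused

# Two frames observing the same voxel: the second result is the average
# of the stored and incoming features (momentum = 0.5).
adapter = MemoryAdapter(momentum=0.5)
coords = np.array([[0, 0, 0]])
adapter.update(coords, np.array([[1.0]]))
fused = adapter.update(coords, np.array([[3.0]]))  # → [[2.0]]
```

A real system would replace the Python dictionary with a sparse voxel hash on GPU and learn the fusion weights, but the caching-and-blending pattern is the core of online scene memory.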
Papers
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu
PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models
Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang