3D Scene Perception

3D scene perception aims to enable computers to understand and interact with three-dimensional environments in a way that mirrors human spatial reasoning. Current research relies heavily on large language models (LLMs) and multimodal approaches, often using transformers and memory-based mechanisms to process data sources such as RGB-D video and LiDAR point clouds, individually or in combination. These advances support autonomous driving, robotics, and other applications that require robust scene understanding, chiefly by improving the accuracy and efficiency of tasks such as 3D object detection and segmentation. Training-free paradigms and efficient multimodal architectures make these systems more practical and scalable.

Papers

December 17, 2024

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Houqiang Li, Yanyong Zhang
3D Object Detection 3D Object Radar Camera Fusion 3d Scene Perception Radar Camera Depth

March 18, 2024

Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang
Zero Shot Agent Smith 3D Scene 3D Understanding 3d Scene Perception

March 11, 2024

Memory-based Adapters for Online 3D Scene Perception

PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models

January 8, 2024

DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
Wencheng Han, Dongqian Guo, Cheng-Zhong Xu, Jianbing Shen
Autonomous Driving Self Driving Car Autonomous Driving System Decision Logic Perception Aware Planning 3d Scene Perception

December 21, 2023

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang, Jiaming Liu, Ray Zhang, Mingjie Pan, Zoey Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Yandong Guo, Shanghang Zhang
Question Answering Mid Range LiDAR Full Potential 3D Content 3d Scene Perception

August 15, 2023

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation
Haiyang Wang, Hao Tang, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, Liwei Wang
3D Object Detection Bird's Eye View Multi Modal Transformer 3D Perception Task Modality Agnostic Transformer Encoder 3d Scene Perception

July 25, 2023

Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception
Chuanyu Luo, Nuo Cheng, Sikun Ma, Jun Xiang, Xiaohan Li, Shengguang Lei, Pu Li
Point Cloud Deep Learning Model 3D Perception PointNet Model 3d Scene Perception

July 11, 2022

Learning Spatial and Temporal Variations for 4D Point Cloud Segmentation
Shi Hanyu, Wei Jiacheng, Wang Hao, Liu Fayao, Lin Guosheng
Point Cloud Segmentation Point Cloud Data Temporal Variation Spatial Learning Temporal Interpolation 3d Scene Perception 3D Recognition