3D Scene Understanding
3D scene understanding aims to enable computers to perceive and interpret three-dimensional environments, with applications in robotics, autonomous driving, and virtual reality. Current research focuses on robust, efficient models that leverage neural radiance fields, large language models (LLMs), and transformer architectures for tasks such as semantic segmentation, instance segmentation, and object pose estimation. These advances are driven by the demand for more accurate and reliable scene representations and must contend with challenges such as data scarcity, class imbalance, and generalization across diverse scenes. The resulting improvements in 3D scene understanding enable more sophisticated interactions between humans and machines in complex environments.
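As a concrete, deliberately simplified illustration of the transformer-based segmentation models mentioned above, the PyTorch sketch below assigns a semantic class to every point in a cloud via global self-attention. The PointCloudSegmenter class, its feature dimensions, and the 20-class setup are illustrative assumptions, not taken from any of the papers listed here; production systems typically add local attention, positional encodings, and hierarchical downsampling.

# Minimal, hypothetical sketch: per-point semantic segmentation of a point
# cloud with a small transformer encoder. All sizes are illustrative.
import torch
import torch.nn as nn

class PointCloudSegmenter(nn.Module):
    def __init__(self, num_classes: int, dim: int = 64, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(3, dim)           # lift xyz coordinates to features
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)  # per-point class logits

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) -> logits: (batch, num_points, num_classes)
        features = self.encoder(self.embed(points))
        return self.head(features)

model = PointCloudSegmenter(num_classes=20)
cloud = torch.randn(1, 1024, 3)        # one synthetic scene with 1024 points
labels = model(cloud).argmax(dim=-1)   # predicted semantic class per point
print(labels.shape)                    # torch.Size([1, 1024])

Because every point attends to every other point, this toy model scales quadratically with cloud size; the local and hierarchical attention schemes used in practice exist largely to avoid that cost.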
Papers
NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding
Hongjia Zhai, Gan Huang, Qirui Hu, Guanglin Li, Hujun Bao, Guofeng Zhang
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji
Grounded 3D-LLM with Referent Tokens
Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Runsen Xu, Ruiyuan Lyu, Dahua Lin, Jiangmiao Pang
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu