Scene Understanding
Scene understanding in computer vision aims to enable machines to interpret and reason about visual scenes, mirroring human perception. Current research focuses heavily on integrating multiple data modalities (e.g., audio, depth, video) and on leveraging advanced architectures such as transformers and neural radiance fields to achieve robust object detection, segmentation, and scene graph generation, often within specific application domains such as autonomous driving and robotics. These advances are crucial for developing more intelligent and reliable systems across fields, from autonomous vehicles navigating complex environments to robots interacting in human-centered spaces. Benchmark datasets and standardized evaluation metrics are also being actively developed to facilitate progress and enable reliable comparisons between approaches.
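To make the notion of scene graph generation mentioned above concrete, here is a minimal sketch of the output such systems produce: detected objects become nodes and pairwise (subject, predicate, object) relations become directed edges. All class and field names here are illustrative, not taken from any of the papers listed below.

```python
# Minimal scene-graph data structure (illustrative sketch only).
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    label: str    # detected category, e.g. "person", "car"
    bbox: tuple   # (x1, y1, x2, y2) in image coordinates

@dataclass
class SceneGraph:
    objects: list = field(default_factory=list)
    # Each relation is a (subject_index, predicate, object_index) triple.
    relations: list = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> int:
        self.objects.append(obj)
        return len(self.objects) - 1

    def add_relation(self, subj_idx: int, predicate: str, obj_idx: int) -> None:
        self.relations.append((subj_idx, predicate, obj_idx))

    def describe(self) -> list:
        # Render each triple as a human-readable phrase.
        return [f"{self.objects[s].label} {p} {self.objects[o].label}"
                for s, p, o in self.relations]

g = SceneGraph()
person = g.add_object(SceneObject("person", (10, 20, 50, 120)))
car = g.add_object(SceneObject("car", (60, 40, 200, 110)))
g.add_relation(person, "next to", car)
print(g.describe())  # ['person next to car']
```

In practice, the detection and relation-prediction steps are learned (e.g., by transformer-based models); this sketch only shows the resulting representation that downstream reasoning consumes.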
Papers
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang
Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding
Yifan Tang, Cong Tai, Fangxing Chen, Wanting Zhang, Tao Zhang, Xueping Liu, Yongjin Liu, Long Zeng
Towards Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations
Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
Xiaoxuan Yu, Hao Wang, Weiming Li, Qiang Wang, Soonyong Cho, Younghun Sung