Paper ID: 2311.17084

DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion

Jiawei Yao, Jusheng Zhang

The task of 3D semantic scene completion with monocular cameras is gaining increasing attention in the field of autonomous driving. Its objective is to predict the occupancy status of each voxel in the 3D scene from partial image inputs. Despite the existence of numerous methods, many of them overlook the issue of accurate alignment between spatial and depth information. To address this, we propose DepthSSC, an advanced method for semantic scene completion solely based on monocular cameras. DepthSSC combines the ST-GF (Spatial Transformation Graph Fusion) module with geometric-aware voxelization, enabling dynamic adjustment of voxel resolution and considering the geometric complexity of 3D space to ensure precise alignment between spatial and depth information. This approach successfully mitigates spatial misalignment and distortion issues observed in prior methods. Through evaluation on the SemanticKITTI dataset, DepthSSC not only demonstrates its effectiveness in capturing intricate 3D structural details but also achieves state-of-the-art performance. We believe DepthSSC provides a fresh perspective on monocular camera-based 3D semantic scene completion research and anticipate it will inspire further related studies.

Submitted: Nov 28, 2023