3D Supervision

3D supervision in computer vision concerns the training signals used to learn three-dimensional reconstructions of scenes and objects from limited input, such as single images or videos. Because ground-truth 3D annotations are expensive and laborious to collect, current research focuses on self-supervised and weakly supervised methods that avoid them. These methods employ architectures such as neural radiance fields (NeRFs), transformers, and occupancy networks, and train on 2D information (e.g., images, silhouettes, depth maps) or indirect 3D cues (e.g., synthetic data, multi-view consistency). Reducing the dependence on 3D labels makes 3D scene understanding more efficient and scalable, with applications in robotics, autonomous driving, and augmented/virtual reality.
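
To make the idea of training from 2D cues concrete, here is a minimal sketch of a silhouette loss on a voxel occupancy grid. It is an illustration only, not the method of any particular paper: it assumes NumPy, orthographic cameras aligned with the grid axes, and a toy cube as the scene. The names `project_silhouette` and `silhouette_loss` are hypothetical helpers introduced for this example.

```python
import numpy as np

def project_silhouette(occupancy, axis):
    """Orthographic soft silhouette (assumed camera model).

    A pixel is covered if any voxel along its ray is occupied:
    1 - prod(1 - p) over the voxels on the ray.
    """
    return 1.0 - np.prod(1.0 - occupancy, axis=axis)

def silhouette_loss(occupancy, target_masks):
    """Mean squared error between projected and observed masks, averaged over views.

    target_masks maps a projection axis to its observed 2D mask.
    """
    losses = [
        np.mean((project_silhouette(occupancy, axis) - mask) ** 2)
        for axis, mask in target_masks.items()
    ]
    return float(np.mean(losses))

# Toy scene: a 4x4x4 cube centred in an 8x8x8 grid.
true_shape = np.zeros((8, 8, 8))
true_shape[2:6, 2:6, 2:6] = 1.0

# "Observed" 2D masks along each axis. Only these masks enter the loss;
# the 3D shape is used here solely to synthesise them.
target_masks = {axis: project_silhouette(true_shape, axis) for axis in range(3)}

# Compare a perfect 3D estimate with an empty one.
empty_guess = np.zeros((8, 8, 8))
loss_true = silhouette_loss(true_shape, target_masks)
loss_empty = silhouette_loss(empty_guess, target_masks)
print(loss_true, loss_empty)
```

The loss is lower for the estimate whose projections match the masks, which is the signal a weakly supervised method would follow. In practice the occupancy grid would be predicted by a network and the loss minimised with a differentiable framework; silhouettes alone also leave concavities ambiguous, which is one reason multi-view consistency and depth cues are used alongside them.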

Papers