Self-Supervised Monocular Depth Estimation

Self-supervised monocular depth estimation aims to infer per-pixel depth from a single 2D image without relying on labeled depth data, a long-standing challenge in computer vision. Current research focuses on improving accuracy and robustness through hybrid architectures that combine convolutional neural networks (CNNs) with transformers to capture both local and global image features, and through additional prior information such as pseudo-labels, optical flow, and geometric constraints. These advances matter for applications that need real-time depth perception, including robotics, autonomous driving, and augmented/virtual reality, where labeled data is scarce and computational efficiency is critical.
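To make the idea of a geometric constraint concrete, the sketch below shows one common way the training signal can be built without depth labels: a predicted depth map is used to warp a neighbouring frame into the current view, and the intensity mismatch is penalised. This is an illustrative assumption rather than the method of any particular paper listed here. The function names, the pinhole intrinsics `(fx, fy, cx, cy)`, the known relative pose, and the nearest-neighbour sampling are all hypothetical simplifications; real systems typically predict the pose as well and use differentiable bilinear sampling.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a 3D point in the camera frame."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def transform(point, rotation, translation):
    """Apply a rigid motion: 3x3 rotation (nested lists) plus a 3-vector translation."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Project a 3D point back to pixel coordinates with a pinhole model."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def photometric_l1(target, source, depth, intrinsics, rotation, translation):
    """Mean absolute intensity difference between the target image and the
    source image sampled at the locations implied by the predicted depth.
    Pixels that land outside the source image or behind the camera are skipped."""
    fx, fy, cx, cy = intrinsics
    height, width = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(height):
        for u in range(width):
            point = backproject(u, v, depth[v][u], fx, fy, cx, cy)
            moved = transform(point, rotation, translation)
            if moved[2] <= 0:
                continue  # point ends up behind the source camera
            us, vs = project(moved, fx, fy, cx, cy)
            ui, vi = round(us), round(vs)  # nearest-neighbour sampling
            if 0 <= ui < width and 0 <= vi < height:
                total += abs(target[v][u] - source[vi][ui])
                count += 1
    return total / count if count else 0.0


# Toy 1x4 scene: the source view sees the target content shifted by one pixel.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
intrinsics = (2.0, 2.0, 0.0, 0.0)   # hypothetical fx, fy, cx, cy
target = [[1, 2, 3, 4]]
source = [[0, 1, 2, 3]]
translation = [1.0, 0.0, 0.0]       # assumed known camera motion

loss_good = photometric_l1(target, source, [[2.0] * 4], intrinsics, identity, translation)
loss_bad = photometric_l1(target, source, [[1.0] * 4], intrinsics, identity, translation)
print(loss_good, loss_bad)
```

In this toy setup the depth value that is consistent with the one-pixel shift yields a lower loss than the inconsistent one, which is the signal a network could be trained to minimise in place of ground-truth depth.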

Papers