Video Depth Estimation

Video depth estimation aims to accurately and consistently infer the distance of objects from a camera in video sequences, a crucial task for applications like augmented reality and 3D scene reconstruction. Current research focuses on improving temporal consistency across video frames, often leveraging techniques like diffusion models, memory networks, and future frame prediction to enhance accuracy and efficiency. These advancements are achieved through various model architectures, including transformer networks and adaptations of single-image depth estimation models, leading to improved performance on diverse datasets and enabling more robust downstream applications.

Papers