Consistent Video Depth

Consistent video depth estimation aims to generate accurate and temporally stable depth maps from video sequences, overcoming the flickering and inconsistencies inherent in frame-by-frame approaches. Current research focuses on leveraging temporal information through various methods, including iterative refinement techniques, multi-view optimization strategies, and the integration of diffusion models or pre-trained vision-language models like CLIP. These advancements improve the accuracy and efficiency of depth estimation, impacting applications such as augmented reality, 3D scene reconstruction, and video object segmentation by providing more reliable and robust depth information.

Papers