Paper ID: 2504.10001 • Published Apr 14, 2025
GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting
Junlin Hao, Peiheng Wang, Haoyang Wang, Xinggong Zhang, Zongming Guo
Peking University
Abstract
Single-image 3D scene reconstruction is inherently ill-posed and offers only weak input constraints. Recent advances have explored two promising directions: multiview generative models, which are trained on 3D-consistent datasets but struggle with out-of-distribution generalization, and 3D scene inpainting and completion frameworks, which rely exclusively on depth data or 3D smoothness priors and therefore suffer from cross-view inconsistency and poor error handling, degrading both output quality and runtime. Building on these approaches, we present GaussVideoDreamer, which bridges image, video, and 3D generation and integrates their strengths through two key innovations: (1) a progressive video inpainting strategy that harnesses temporal coherence for improved multiview consistency and faster convergence, and (2) a 3D Gaussian Splatting consistency mask that guides the video diffusion with 3D-consistent multiview evidence. Our pipeline combines three core components: a geometry-aware initialization protocol, Inconsistency-Aware Gaussian Splatting, and progressive video inpainting. Experiments show that our approach achieves 32% higher LLaVA-IQA scores and at least a 2x speedup over existing methods while maintaining robust performance across diverse scenes.
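To make the consistency-mask idea concrete, here is a minimal sketch of how such a mask could be computed. It assumes the mask is derived from the Gaussian render's accumulated opacity (coverage) together with photometric agreement against the current reference frame; the function name consistency_mask, the thresholds, and this exact criterion are illustrative assumptions, not the paper's specification.

import numpy as np

def consistency_mask(alpha, rendered_rgb, reference_rgb,
                     alpha_thresh=0.9, err_thresh=0.1):
    # A pixel is treated as 3D-consistent evidence when the Gaussian
    # render both covers it (high accumulated opacity) and agrees
    # photometrically with the reference frame. Everything else would
    # be handed to the video diffusion model as an inpainting region.
    covered = alpha > alpha_thresh
    photometric_err = np.abs(rendered_rgb - reference_rgb).mean(axis=-1)
    agrees = photometric_err < err_thresh
    return covered & agrees

# Toy usage: a 4x4 frame whose top-left corner is barely covered.
alpha = np.ones((4, 4))
alpha[0, 0] = 0.2
rendered = np.zeros((4, 4, 3))
reference = np.zeros((4, 4, 3))
mask = consistency_mask(alpha, rendered, reference)
print(mask)  # False at (0, 0): that region goes to the inpainter

Under this reading, the mask plays the role the abstract describes: regions flagged as consistent constrain the video diffusion model as fixed evidence, while the remaining regions are progressively inpainted and fed back into the Gaussian Splatting optimization.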