Video Prediction

Video prediction aims to generate the future frames of a video sequence from its preceding frames, a task that requires modeling complex dynamics and uncertainty. Current research emphasizes incorporating procedural knowledge and physical constraints into data-driven models. Common architectures include transformers, diffusion models, and state-space models, combined with techniques for handling long-term dependencies and multiple modalities (e.g., integrating text or tactile data). The field matters for robotics, autonomous driving, and other applications that require predictive modeling of dynamic visual scenes, and it drives advances in computer vision and in artificial intelligence more broadly.

Papers