Video Prediction
Video prediction aims to generate the future frames of a video sequence from its preceding frames, which requires modeling complex dynamics and uncertainty about what happens next. Current research emphasizes incorporating procedural knowledge and physical constraints into data-driven models. Common architectures include transformers, diffusion models, and state-space models, combined with techniques for handling long-term dependencies and multiple modalities (e.g., integrating text or tactile data). The field matters for robotics, autonomous driving, and other applications that require predictive modeling of dynamic visual scenes, and it drives advances in both computer vision and artificial intelligence.
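To make the task concrete, here is a minimal sketch of the input/output contract: given preceding frames, produce the next frame and score it against the true one. It uses a simple linear-extrapolation baseline, not a learned model and not the method of any paper listed here. The function names (`predict_next_frame`, `mean_squared_error`) and the toy 2x2 frames are illustrative assumptions.

```python
def predict_next_frame(frames):
    """Baseline predictor (an assumption for illustration): linearly
    extrapolate each pixel from the last two frames, next = 2*last - prev."""
    if len(frames) < 2:
        raise ValueError("need at least two preceding frames")
    prev, last = frames[-2], frames[-1]
    return [
        [2 * b - a for a, b in zip(row_prev, row_last)]
        for row_prev, row_last in zip(prev, last)
    ]


def mean_squared_error(pred, target):
    """Average squared per-pixel difference between two frames."""
    diffs = [
        (p - t) ** 2
        for row_p, row_t in zip(pred, target)
        for p, t in zip(row_p, row_t)
    ]
    return sum(diffs) / len(diffs)


# Toy grayscale sequence: one pixel brightens by 0.1 per frame.
frames = [
    [[0.0, 0.0], [0.0, 0.1]],
    [[0.0, 0.0], [0.0, 0.2]],
]
true_next = [[0.0, 0.0], [0.0, 0.3]]

pred = predict_next_frame(frames)
error = mean_squared_error(pred, true_next)
```

A baseline like this only captures constant per-pixel change. It cannot represent the uncertainty or long-term dependencies described above, which is the gap learned models are meant to close.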
Papers
Video Prediction Models as Rewards for Reinforcement Learning
Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel
Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
Vaishnavi Himakunthala, Andy Ouyang, Daniel Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Yang Wang