Video Generation
Video generation research focuses on creating realistic and controllable videos from inputs such as text, images, or other videos. Current efforts center on improving model architectures, such as diffusion models and diffusion transformers, to enhance video quality, temporal consistency, and controllability, often incorporating techniques like vector quantization for efficiency. The field matters for multimedia applications, including content creation, simulation, and autonomous driving, because it provides tools for generating high-quality, diverse, and easily manipulated video data. Ongoing research also addresses the limitations of existing evaluation metrics so that assessments align better with human perception.
Papers
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani, Xian Liu, Wang Yifan, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell
Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields
Rüveyda Yilmaz, Dennis Eschweiler, Johannes Stegmaier
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei
A Survey on Long Video Generation: Challenges, Methods, and Prospects
Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang
Enabling Visual Composition and Animation in Unsupervised Video Generation
Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
Deshun Yang, Luhui Hu, Yu Tian, Zihao Li, Chris Kelly, Bang Yang, Cindy Yang, Yuexian Zou
BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie