Video Synthesis
Video synthesis aims to generate realistic, temporally coherent videos from inputs such as text descriptions, images, or existing videos; current research emphasizes improving controllability, temporal consistency, and resolution. Diffusion models, often coupled with techniques such as optical flow analysis and latent-space manipulation, are the prominent architectures driving these advances, alongside approaches that leverage large language models for higher-level control and compositional methods for complex scenes. The field matters for film production, animation, video editing, and other multimedia applications, and for its contributions to understanding and modeling complex temporal dynamics in visual data.
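To make the diffusion-based approach mentioned above concrete, below is a minimal, hypothetical sketch of latent video sampling: a DDPM-style reverse loop over a stack of per-frame latents, with a crude neighbor-blending term standing in for optical-flow-guided temporal consistency. The names (`ToyDenoiser`, `sample_video_latents`) and the consistency heuristic are illustrative assumptions, not the method of any paper listed here; a real pipeline would use a spatio-temporal U-Net or transformer conditioned on text embeddings, then decode the latents with a VAE.

```python
import torch
import torch.nn as nn

# Hypothetical denoiser standing in for a full spatio-temporal U-Net.
class ToyDenoiser(nn.Module):
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x, t):
        # A real model would condition on the timestep t and a text embedding.
        return self.net(x)

@torch.no_grad()
def sample_video_latents(model, frames=8, channels=4, size=32,
                         steps=50, consistency=0.1):
    """DDPM-style reverse loop over per-frame latents, with a simple
    temporal-consistency nudge pulling each frame toward its neighbors."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(frames, channels, size, size)  # one latent per frame
    for t in reversed(range(steps)):
        eps = model(x, t)  # predicted noise for every frame
        # Standard DDPM posterior mean.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        # Crude temporal consistency: blend each frame toward the average
        # of its neighbors (a stand-in for optical-flow-guided warping).
        neighbors = (torch.roll(mean, 1, dims=0) + torch.roll(mean, -1, dims=0)) / 2
        mean = (1 - consistency) * mean + consistency * neighbors
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # decode with a VAE decoder in a real pipeline

latents = sample_video_latents(ToyDenoiser())
print(latents.shape)  # torch.Size([8, 4, 32, 32])
```

The neighbor-blend is only a toy regularizer; published systems typically enforce consistency with learned temporal attention layers or explicit flow-based warping between frames.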
Papers
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Jingyao Zheng, Lik-Hang Lee, Tae-Ho Kim, Choong Seon Hong, Chaoning Zhang