Long Video Generation
Long video generation aims to create high-quality, temporally consistent videos far longer than the short clips that standard text-to-video models produce. Current research focuses on improving efficiency and coherence through techniques such as splitting generation into sub-problems (e.g., structure control followed by refinement), designing attention mechanisms that handle long-range temporal dependencies without quadratic cost, and distributing computation across multiple GPUs. These advances are significant because they enable longer, more realistic videos for applications ranging from filmmaking and animation to virtual and augmented reality.
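A common ingredient in the attention-based approaches surveyed here is restricting each frame's attention to a causal, local temporal window, so that memory and compute grow linearly with video length rather than quadratically. The sketch below is a minimal, illustrative PyTorch implementation of that pattern; the function name, tensor layout, and `window` parameter are assumptions for illustration and do not reproduce the exact mechanism of any paper listed.

```python
import math
import torch

def windowed_causal_temporal_attention(q, k, v, window: int = 16):
    """Causal temporal attention with a sliding window (illustrative sketch).

    q, k, v: (batch, heads, frames, dim) features at one spatial location.
    Each frame attends only to itself and the previous `window - 1` frames,
    so the number of attended keys per query is bounded by `window`.
    """
    b, h, t, d = q.shape
    scores = torch.einsum("bhid,bhjd->bhij", q, k) / math.sqrt(d)
    i = torch.arange(t).unsqueeze(1)  # query frame index, shape (t, 1)
    j = torch.arange(t).unsqueeze(0)  # key frame index, shape (1, t)
    # Keep only keys in the range (i - window, i]: causal and local.
    mask = (j <= i) & (j > i - window)
    scores = scores.masked_fill(~mask, float("-inf"))
    attn = scores.softmax(dim=-1)  # row i = i's diagonal is always valid
    return torch.einsum("bhij,bhjd->bhid", attn, v)
```

For brevity this sketch materializes the full frames-by-frames score matrix and masks it; a production kernel would compute only the banded scores so that cost is actually linear in video length.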
Papers
Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation
Faraz Waseem, Muhammad Shahzad
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yue
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai
MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion
Xunnong Xu, Mengying Cao
CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
Yuelei Wang, Jian Zhang, Pengtao Jiang, Hao Zhang, Jinwei Chen, Bo Li
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Xin Yan, Yuxuan Cai, Qiuyue Wang, Yuan Zhou, Wenhao Huang, Huan Yang