Text-to-Video Diffusion Models
Text-to-video diffusion models aim to generate realistic, temporally coherent videos from textual descriptions. Current research focuses on improving the quality and controllability of generated videos, exploring techniques such as attention mechanisms, 3D variational autoencoders, and hybrid priors to enhance temporal consistency, motion realism, and semantic alignment. These advances have significant implications for fields including animation, video editing, and video understanding tasks such as object segmentation, because they enable efficient and flexible video content creation and manipulation.
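As a concrete illustration of one of these techniques, the sketch below shows a temporal self-attention block of the kind commonly inserted into text-to-image diffusion backbones to extend them to video: spatial layers process each frame independently, while this block lets features at the same spatial location attend across frames, which is one way temporal consistency is encouraged. This is a minimal, hypothetical PyTorch example for orientation only, not the implementation used in any of the papers listed here; the class name, shapes, and hyperparameters are assumptions.

```python
# Minimal sketch (assumed design, not from any specific paper): temporal
# self-attention over the frame axis of video features in a diffusion UNet.
import torch
import torch.nn as nn


class TemporalSelfAttention(nn.Module):
    """Attend across frames at each spatial location to encourage temporal consistency."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        # Treat every spatial position as an independent token sequence over frames.
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        normed = self.norm(seq)
        out, _ = self.attn(normed, normed, normed)
        # Residual connection, then restore the original (b, f, c, h, w) layout.
        out = (seq + out).reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)
        return out


if __name__ == "__main__":
    feats = torch.randn(2, 16, 64, 8, 8)  # 2 clips, 16 frames, 64-channel 8x8 feature maps
    print(TemporalSelfAttention(64)(feats).shape)  # torch.Size([2, 16, 64, 8, 8])
```

In practice such a block would be interleaved with the existing spatial attention and cross-attention layers of a pretrained text-to-image UNet; the papers below differ in exactly how motion and subject control are injected around this kind of backbone.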
Papers
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
Changgu Chen, Junwei Shu, Lianggangxu Chen, Gaoqi He, Changbo Wang, Yang Li
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Zhao Wang, Aoxue Li, Lingting Zhu, Yong Guo, Qi Dou, Zhenguo Li
Customizing Motion in Text-to-Video Diffusion Models
Joanna Materzynska, Josef Sivic, Eli Shechtman, Antonio Torralba, Richard Zhang, Bryan Russell
MTVG: Multi-text Video Generation with Text-to-Video Models
Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Hyeokmin Kwon, Sangpil Kim