Video Generation
Video generation research focuses on creating realistic and controllable videos from inputs such as text, images, or other videos. Current efforts center on improving model architectures, particularly diffusion models and diffusion transformers, to enhance video quality, temporal consistency, and controllability, often incorporating techniques such as vector quantization for efficiency. The field is central to multimedia applications including content creation, simulation, and autonomous driving, as it provides tools for generating high-quality, diverse, and easily manipulated video data. Ongoing research also addresses the limitations of existing evaluation metrics, aiming to align automated assessments more closely with human perception.
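As a rough illustration of the diffusion approach these papers build on, the sketch below runs DDPM-style ancestral sampling over a clip-shaped tensor. The `TinyDenoiser` module, the noise schedule, and all shapes are hypothetical stand-ins for illustration only, not the method of any paper listed here.

```python
# Minimal sketch of DDPM-style ancestral sampling for a video clip,
# under assumed toy hyperparameters. TinyDenoiser is a hypothetical
# stand-in; real systems use large spatio-temporal networks.
import torch

T = 50                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class TinyDenoiser(torch.nn.Module):
    """Stand-in noise predictor; a real model would be a 3D U-Net or a
    diffusion transformer conditioned on text or image inputs."""
    def __init__(self, channels=3):
        super().__init__()
        # A 3D convolution mixes information across frames as well as space,
        # one simple way to encourage temporal consistency.
        self.net = torch.nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, t):
        # Predicts the noise eps_theta(x_t, t); t is ignored in this toy model.
        return self.net(x)

@torch.no_grad()
def sample_video(model, frames=8, size=32, channels=3):
    # Start from pure Gaussian noise over the whole clip: (B, C, F, H, W).
    x = torch.randn(1, channels, frames, size, size)
    for t in reversed(range(T)):
        eps = model(x, t)
        # DDPM posterior mean: subtract the predicted noise, then rescale.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                              # a clip tensor (untrained model, so noise-like)

video = sample_video(TinyDenoiser())
print(video.shape)                        # torch.Size([1, 3, 8, 32, 32])
```

In practice the stand-in denoiser is replaced by a large spatio-temporal network, and sampling often runs in a compressed latent space (for example one produced by a VQ or VAE encoder), which is where the efficiency gains from vector quantization mentioned above come in.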
Papers
DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation
Susung Hong, Junyoung Seo, Heeseong Shin, Sunghwan Hong, Seungryong Kim
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, Liang Lin