Video Generation
Video generation research focuses on creating realistic and controllable videos from inputs such as text, images, or other videos. Current efforts center on improving model architectures, notably diffusion models and diffusion transformers, to enhance video quality, temporal consistency, and controllability, often incorporating techniques such as vector quantization for efficiency; a simplified sketch of diffusion-based sampling appears below. This work underpins multimedia applications including content creation, simulation, and autonomous driving by providing high-quality, diverse, and easily controlled video data. Ongoing research also addresses the limitations of existing evaluation metrics (e.g., FVD) so that automated assessments align better with human perception.
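To make the diffusion-based approach concrete, the following is a minimal sketch of DDPM-style ancestral sampling over a video latent tensor. The denoiser is a placeholder standing in for a trained video diffusion model (for example, a diffusion transformer that attends across frames); the shapes, step count, and noise schedule are illustrative assumptions and are not taken from any of the papers listed below.

    # Minimal sketch of reverse-diffusion sampling for a video latent (assumed setup).
    import numpy as np

    T = 50                                   # number of diffusion steps (assumed)
    betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def denoise(x_t, t):
        # Placeholder epsilon-prediction network. A real model would condition on
        # text or image inputs and attend across frames for temporal consistency;
        # here it returns zeros so the loop runs end to end.
        return np.zeros_like(x_t)

    # Latent video tensor: (frames, channels, height, width), assumed sizes.
    x = np.random.randn(16, 4, 32, 32)

    # Reverse diffusion: iteratively denoise from pure noise toward a clean latent.
    for t in reversed(range(T)):
        eps = denoise(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * np.random.randn(*x.shape)
        else:
            x = mean

    print("sampled latent shape:", x.shape)  # in practice, decode with a VAE/VQ decoder

In a vector-quantized pipeline, the sampled latent would be decoded by a learned decoder into pixel-space frames; text- or image-conditioned models pass the conditioning signal into the denoiser at every step.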
Papers
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, Ping Luo
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
Ailing Zeng, Yuhang Yang, Weidong Chen, Wei Liu
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Ge Ya (Olga) Luo, Gian Favero, Zhi Hao Luo, Alexia Jolicoeur-Martineau, Christopher Pal
COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation
Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation
Mingzhen Sun, Weining Wang, Yanyuan Qiao, Jiahui Sun, Zihan Qin, Longteng Guo, Xinxin Zhu, Jing Liu
DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
Steven Hogue, Chenxu Zhang, Hamza Daruger, Yapeng Tian, Xiaohu Guo
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei