Video Generation
Video generation research focuses on creating realistic and controllable videos from inputs such as text, images, or other videos. Current efforts center on improving model architectures, notably diffusion models and diffusion transformers, to enhance video quality, temporal consistency, and controllability, often incorporating techniques such as vector quantization for efficiency. The field underpins multimedia applications including content creation, simulation, and autonomous driving by providing tools to generate high-quality, diverse, and easily manipulated video data. Ongoing research also addresses the limitations of existing evaluation metrics, aiming to better align automated assessments with human perception.
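To make the vector-quantization technique mentioned above concrete, the sketch below implements a minimal nearest-neighbor quantizer with a straight-through gradient estimator, in the spirit of the discrete video tokenizers used by models such as Phenaki. This is a simplified illustration, not code from any of the listed papers; the function name, tensor shapes, and codebook size are all assumptions made for the example.

```python
import torch

def vector_quantize(z, codebook):
    """Snap continuous latents to their nearest codebook entries.

    z:        (batch, n_tokens, dim) continuous encoder outputs
    codebook: (n_codes, dim) learned embedding table
    Returns the quantized latents and their integer token ids.
    """
    # Pairwise Euclidean distances between every latent and every code.
    dists = torch.cdist(z, codebook.unsqueeze(0).expand(z.size(0), -1, -1))
    ids = dists.argmin(dim=-1)      # (batch, n_tokens) discrete token ids
    z_q = codebook[ids]             # nearest code for each latent
    # Straight-through estimator: forward pass uses z_q, but gradients
    # flow to z as if quantization were the identity function.
    z_q = z + (z_q - z).detach()
    return z_q, ids

# Toy usage: 2 clips, 16 spatiotemporal tokens each, 64-dim latents, 512 codes.
z = torch.randn(2, 16, 64, requires_grad=True)
codebook = torch.randn(512, 64)
z_q, ids = vector_quantize(z, codebook)
print(z_q.shape, ids.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 16])
```

The straight-through trick is what allows the non-differentiable argmin lookup to sit inside an end-to-end trained encoder-decoder, letting a transformer model videos as sequences of discrete tokens.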
Papers
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan
Temporally Consistent Transformers for Video Generation
Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans