Video Generative Models

Video generative models aim to create realistic and diverse videos from various inputs, such as text descriptions or single images, with an emphasis on improving efficiency, controllability, and multi-modal alignment. Current research centers on diffusion models, variational autoencoders (VAEs), and transformer-based architectures, often incorporating techniques such as latent-space compression and inter-frame motion consistency to improve generation speed and quality. These advances have implications for video editing, content creation, and robotics (through action-conditional generation), and can even help combat the spread of misinformation through improved detection and tracing of synthetic videos.
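To make the latent-space idea concrete, here is a minimal NumPy sketch (an illustration only, not any specific paper's method): a toy "encoder" compresses video frames by average pooling to stand in for a learned VAE, and the forward diffusion step mixes the latents with Gaussian noise, which is what a real denoising network would then learn to reverse.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frames, factor=4):
    """Toy stand-in for a VAE encoder: spatially average-pool each frame
    by `factor` to mimic latent-space compression (real VAEs are learned)."""
    t, h, w = frames.shape
    return frames.reshape(t, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def add_noise(latents, alpha_bar):
    """Forward diffusion: blend clean latents with Gaussian noise.
    alpha_bar is the cumulative noise-schedule coefficient at some timestep."""
    noise = rng.standard_normal(latents.shape)
    noisy = np.sqrt(alpha_bar) * latents + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise

# 8 frames of 16x16 "video", compressed to 8 latents of 4x4 --
# diffusion then runs in this much smaller space, which is the speed win.
video = rng.standard_normal((8, 16, 16))
latents = encode(video)                       # shape (8, 4, 4)
noisy, eps = add_noise(latents, alpha_bar=0.5)

# An oracle denoiser that knows the true noise exactly inverts the step;
# a trained model approximates eps from (noisy, timestep, conditioning).
recovered = (noisy - np.sqrt(0.5) * eps) / np.sqrt(0.5)
print(latents.shape, np.allclose(recovered, latents))
```

The pooling factor of 4 here shrinks each frame 16x, which is why latent diffusion is so much cheaper than operating on raw pixels; production models use learned encoders and also compress along the time axis.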

Papers