Human Video Generation

Human video generation is the task of synthesizing realistic, controllable videos of people with machine learning models. Current research relies heavily on diffusion models conditioned on signals such as body pose, audio, and text, used alone or in combination, to improve realism and control over the generated video. The field has potential applications in film, gaming, virtual reality, and robotics, particularly in enabling more efficient data collection and better generalization for robot manipulation tasks. The ongoing challenge is to balance photorealism, consistent identity preservation, and the ability to generate long, coherent video sequences with diverse and complex motions.

Papers