Video Portrait

Video portrait generation aims to create realistic talking-head videos from limited input, such as a single image or video, often driven by audio. Current research focuses on improving realism and control, using techniques such as neural radiance fields (NeRFs), diffusion models, and 3D morphable models (3DMMs) to generate high-fidelity video with accurate lip sync, expressive facial movements, and controllable emotions. These advances matter for applications in digital avatars, animation, and video editing, where they support personalized content creation and manipulation. The field is also exploring efficient architectures that balance quality with speed, with the goal of enabling real-time applications.

Papers