Talking Head

Talking head generation aims to synthesize realistic video of a person speaking, driven by audio or text input and often starting from a single image. Current research focuses on improving realism through 3D model-based approaches such as neural radiance fields (NeRFs) and 3D morphable models (3DMMs), efficient neural architectures (transformers and diffusion models), and techniques for precise lip synchronization and control of emotional expression. These improvements matter for applications ranging from video conferencing and virtual assistants to animation and special effects, and they drive progress in both computer vision and computer graphics.

Papers