Talking Face Generation

Talking face generation aims to synthesize realistic video of a person speaking, with lip movements synchronized to a given audio recording, often from as little as a single image of the person's face. Current research focuses on improving lip synchronization accuracy, generating natural head movements and facial expressions (including emotions), and enhancing the realism and visual quality of the generated videos. Commonly used approaches include diffusion models, GANs, and transformer-based architectures. Applications include virtual assistants, video conferencing, and healthcare, for example personalized avatars for Alzheimer's patients.

Papers