Visual Dubbing
Visual dubbing aims to realistically synchronize lip movements in video with new audio, creating dubbed versions of films or other visual media. Current research focuses on data-efficient models, often employing two-stage architectures that separate lip synchronization from face rendering, and on techniques such as neural rendering priors, diffusion models, and attention mechanisms that improve both visual quality and the preservation of speaker identity. These advances are significant for improving media accessibility and could streamline the dubbing process for applications such as film production and video game localization.
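The two-stage design mentioned above (lip synchronization separated from face rendering) can be illustrated with a minimal sketch. All class and function names here are hypothetical, and the model internals are placeholders: stage one maps per-frame audio features to mouth-landmark trajectories, and stage two renders output frames conditioned on those landmarks plus a reference identity frame.

```python
import numpy as np


class LipSyncStage:
    """Stage 1 (hypothetical): map audio features to 2-D mouth-landmark trajectories."""

    def __init__(self, n_landmarks: int = 20, feat_dim: int = 13, seed: int = 0):
        self.n_landmarks = n_landmarks
        # Placeholder linear projection standing in for a learned audio-to-motion model.
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((feat_dim, n_landmarks * 2))

    def predict(self, audio_features: np.ndarray) -> np.ndarray:
        # audio_features: (n_frames, feat_dim) -> landmarks: (n_frames, n_landmarks, 2)
        n_frames = audio_features.shape[0]
        return (audio_features @ self.W).reshape(n_frames, self.n_landmarks, 2)


class FaceRenderStage:
    """Stage 2 (hypothetical): render frames from landmarks plus an identity frame."""

    def render(self, landmarks: np.ndarray, identity_frame: np.ndarray) -> np.ndarray:
        # Placeholder: replicate the identity frame per output frame. A real renderer
        # would warp or inpaint the mouth region according to the landmarks, e.g. with
        # a neural rendering prior or a diffusion model.
        n_frames = landmarks.shape[0]
        h, w, c = identity_frame.shape
        return np.broadcast_to(identity_frame, (n_frames, h, w, c)).copy()


def dub(audio_features: np.ndarray, identity_frame: np.ndarray) -> np.ndarray:
    """Run the two stages in sequence: audio -> lip motion -> rendered frames."""
    landmarks = LipSyncStage(feat_dim=audio_features.shape[1]).predict(audio_features)
    return FaceRenderStage().render(landmarks, identity_frame)
```

The separation keeps the audio-to-motion model small and data-efficient, while the renderer can be trained or adapted independently to preserve the target speaker's identity.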