Automatic Dubbing

Automatic dubbing aims to translate the speech in a video into another language while keeping the new audio synchronized with the visuals. This is a complex task that requires precise timing and preservation of the original speaking style. Current research focuses on neural machine translation models, typically built on transformer architectures, that produce translations whose spoken duration closely matches that of the source audio. Some approaches add reinforcement learning or give the model auxiliary timing information. The field matters because it helps bridge language barriers in media and makes global content more accessible, and it drives advances in both machine translation and speech synthesis.
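As a rough illustration of what "durations closely matching the source audio" can mean in practice, here is a minimal sketch that reranks an n-best list of candidate translations. Each candidate is scored by its translation-model log-probability minus a penalty on how far its estimated spoken duration deviates from the source segment's duration. This is only one simple technique (n-best reranking with a length heuristic), not the method of any particular paper. The speaking-rate constant `CHARS_PER_SECOND`, the penalty `weight`, and all class and function names are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    log_prob: float  # score from the MT system (assumed to be given)


# Hypothetical speaking rate used to turn text length into seconds.
CHARS_PER_SECOND = 15.0


def estimated_duration(text: str, chars_per_second: float = CHARS_PER_SECOND) -> float:
    """Crude duration estimate in seconds: non-space characters / speaking rate."""
    n_chars = len(text.replace(" ", ""))
    return n_chars / chars_per_second


def rerank(candidates: list[Candidate], source_duration: float,
           weight: float = 2.0) -> list[Candidate]:
    """Sort candidates best-first by log-probability minus a penalty
    proportional to the relative duration mismatch with the source."""
    def score(c: Candidate) -> float:
        mismatch = abs(estimated_duration(c.text) / source_duration - 1.0)
        return c.log_prob - weight * mismatch

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    source_duration = 2.0  # seconds of source speech in this segment
    nbest = [
        Candidate("Ich habe dir doch gesagt, dass wir heute Abend nicht kommen können", -1.2),
        Candidate("Ich sagte, wir kommen heute nicht", -1.5),
    ]
    for c in rerank(nbest, source_duration):
        print(f"{c.text!r}  est={estimated_duration(c.text):.2f}s")
```

Here the shorter candidate wins despite its lower model score, because its estimated duration (about 1.87 s) is much closer to the 2-second source segment than the longer one (about 3.67 s). Real systems would estimate duration from phonemes or a speech synthesizer rather than character counts, and the trade-off `weight` would need tuning.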

Papers