Silent Video

Silent video research aims to automatically generate realistic, synchronized audio for videos that lack sound, with applications ranging from silent film restoration to assistive technologies. Current approaches use deep learning architectures such as sequence-to-sequence models, transformers, and diffusion models, often combined with text-to-audio components or pre-trained lip-reading networks to improve audio quality and alignment with the visual content. The work is relevant to media production and to accessibility for individuals with speech impairments, and it advances audio-visual synthesis and understanding in AI research more broadly.

Papers