Silent Video

Silent video research focuses on automatically generating realistic, synchronized audio for videos that lack sound, with applications ranging from silent film restoration to assistive technologies. Current approaches use deep learning architectures such as sequence-to-sequence models, transformers, and diffusion models, often combined with text-to-audio components or pre-trained lip-reading networks to improve audio quality and alignment with the visual content. The work is relevant to media production, to accessibility for people with speech impairments, and to broader research on audio-visual synthesis and understanding.

Papers

July 1, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen
Audio Effect Video to Audio Generation Foley Sound Human Likeness Silent Video

April 25, 2024

Synthesizing Audio from Silent Video using Sequence to Sequence Modeling
Hugo Garrido-Lestache Belinchon, Helina Mulugeta, Adam Haile
Audio Visual Audio Driven Sequence of Sequence Video Generation Model Silent Video

March 2, 2024

Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay Namboodiri
Lip Movement Lip to Speech Synthesis Silent Video Lip to Speech

February 22, 2024

Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
Generative Pre Training Discrete Diffusion Efficient Policy Learning Robot Representation Silent Video

January 9, 2024

SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie, Shengye Yu, Qile He, Mengtian Li
Vision Language Model Audio Representation Sound Design Audio Effect Silent Video Audio Visual Generation

October 12, 2023

Learning to Act from Actionless Videos through Dense Correspondences
Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
Learning Abstract Dense Correspondence Robot Policy Task Agnostic Representation Face Act Robot Goal Silent Video Policy Reproducibility Efficient Video

August 29, 2023

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim, Jaehun Kim, Joon Son Chung
Sound Design Self Supervised Speech Representation High Quality Speech Lip to Speech Synthesis Silent Video Lip to Speech

August 23, 2023

An Initial Exploration: Learning to Generate Realistic Audio for Silent Video
Matthew Martel, Jackson Wagner
Learning Abstract Audio Generation Audio Effect Silent Video

June 5, 2023

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya
Speech Generation Natural Sounding Speech High Quality Speech Lip Reading Silent Video Lip to Speech

April 17, 2023

Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens
Source Video Audio Driven Conditional Generation Audio Effect Return Conditioned Supervised Learning Visual Analogy Silent Video