Visual Acoustic Matching
Visual acoustic matching (VAM) focuses on modifying an audio clip so that it sounds as if it were recorded in a visually specified environment, with the goal of improving audio realism and intelligibility. Recent research emphasizes self-supervised learning approaches that use generative adversarial networks (GANs) and transformers to overcome the scarcity of paired training data and to leverage readily available unpaired image-audio datasets. This work is driven by the need for more realistic and immersive audio in applications such as virtual and augmented reality, and advancements in VAM are contributing to improved audio processing techniques across related fields.
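To make the task setup concrete, below is a minimal sketch of a VAM-style model in PyTorch: a spectrogram of the source audio is re-synthesized while cross-attending to an embedding of the target-environment image, so the output picks up the room's acoustic signature. The module names, layer sizes, and the spectrogram-in/spectrogram-out formulation are illustrative assumptions, not any specific published architecture.

```python
# Minimal VAM interface sketch (assumed design, not a published model):
# audio frames attend to a single visual token describing the target room.
import torch
import torch.nn as nn


class VisualAcousticMatcher(nn.Module):
    def __init__(self, n_mels: int = 80, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # Visual encoder: compress the environment image into one token.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Audio encoder: project each spectrogram frame to the model width.
        self.audio_proj = nn.Linear(n_mels, d_model)
        # Cross-attention: audio queries, visual keys/values, so the output
        # acquires the room's acoustics (e.g., reverberation).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Decoder: map fused features back to a spectrogram.
        self.out_proj = nn.Linear(d_model, n_mels)

    def forward(self, audio_spec: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # audio_spec: (batch, time, n_mels); image: (batch, 3, H, W)
        vis = self.visual_encoder(image).unsqueeze(1)   # (batch, 1, d_model)
        aud = self.audio_proj(audio_spec)               # (batch, time, d_model)
        fused, _ = self.cross_attn(aud, vis, vis)
        # Residual path keeps the speech content; attention adds room acoustics.
        return self.out_proj(aud + fused)               # (batch, time, n_mels)


if __name__ == "__main__":
    model = VisualAcousticMatcher()
    spec = torch.randn(2, 100, 80)       # source-audio mel spectrograms
    img = torch.randn(2, 3, 224, 224)    # images of the target environments
    matched = model(spec, img)
    print(matched.shape)                 # torch.Size([2, 100, 80])
```

In practice such a model would be trained with reconstruction or adversarial losses; with unpaired data, a self-supervised objective (e.g., de-reverberating audio and re-matching it to its own environment) stands in for paired supervision.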