Sound Source Separation

Sound source separation aims to isolate individual sounds from a mixture; audio-visual approaches also use visual information, such as video of the sound-producing objects, to improve accuracy. Recent research focuses on robust models, including generative diffusion models and models built on predictive coding or attention mechanisms, that can handle diverse sound categories and unseen instruments; these models often incorporate visual cues through various feature fusion strategies. The field matters for audio processing and multimedia technology, with applications ranging from video conferencing enhancement to assistive listening devices and more realistic virtual environments. A key trend is a move away from reliance on pre-trained object detectors toward self-supervised learning, which aims to improve generalization and reduce annotation requirements.
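To make "incorporating visual cues through feature fusion" concrete, here is a minimal sketch of one common pattern, assuming mask-based separation (an assumption; the overview does not name a separation method). Audio and visual feature vectors are fused by simple concatenation, and a linear layer with a sigmoid then predicts a soft mask per frequency bin, which is applied to one frame of the mixture's magnitude spectrum. The function name `separate_frame`, the feature sizes, and all weights and values are hypothetical toy choices. A real model would learn the weights and operate on full spectrograms.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def separate_frame(
    mixture_mag: List[float],
    audio_feat: List[float],
    visual_feat: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> Tuple[List[float], List[float]]:
    """Estimate one source's magnitudes for a single frame (hypothetical toy model).

    Fusion is plain concatenation of audio and visual features; a linear
    layer plus sigmoid gives one soft mask value per frequency bin.
    """
    fused = audio_feat + visual_feat  # feature fusion by concatenation
    if len(weights) != len(mixture_mag) or len(bias) != len(mixture_mag):
        raise ValueError("need one weight row and one bias per frequency bin")
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    # Soft mask in (0, 1) for each frequency bin.
    mask = [
        sigmoid(sum(w * f for w, f in zip(row, fused)) + b)
        for row, b in zip(weights, bias)
    ]
    # Masking the mixture yields the estimate of the target source.
    estimate = [m * x for m, x in zip(mask, mixture_mag)]
    return estimate, mask


# Toy usage: 3 frequency bins, 2-dim audio and 2-dim visual features (made-up values).
mixture = [2.0, 1.0, 4.0]
audio = [0.3, -0.1]
weights = [
    [0.0, 0.0, 3.0, 0.0],   # bin 0 is boosted when the first visual feature is active
    [0.0, 0.0, 0.0, 0.0],   # bin 1 ignores all features
    [0.0, 0.0, -3.0, 0.0],  # bin 2 is suppressed when the first visual feature is active
]
bias = [0.0, 0.0, 0.0]

est_on, mask_on = separate_frame(mixture, audio, [1.0, 0.0], weights, bias)
est_off, mask_off = separate_frame(mixture, audio, [0.0, 0.0], weights, bias)
print("visual cue on: ", [round(m, 3) for m in mask_on])
print("visual cue off:", mask_off)
```

With the visual feature switched off, every mask value is 0.5; switching it on pushes bin 0 toward being kept and bin 2 toward being removed. This shows, in miniature, how a visual signal can steer which parts of the mixture are attributed to a source. Concatenation is only the simplest fusion choice; attention-based or multiplicative conditioning are common alternatives.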

Papers