Audio-Visual Speech Separation
Audio-visual speech separation (AVSS) aims to isolate individual voices from a mixture by combining audio and visual information, improving on audio-only methods, particularly in noisy or multi-speaker environments. Current research focuses on models that remain robust when visual cues are missing or noisy, employing techniques such as attention mechanisms, diffusion models, and efficient architectures (e.g., transformer-based networks) to achieve accurate yet computationally efficient separation. These advances benefit applications such as speech recognition, meeting transcription, and assistive technologies by making speech processing more robust and accurate in real-world conditions.
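The sketch below illustrates one common way such systems fuse modalities: audio frames of the mixture attend to visual (lip-region) features of the target speaker via cross-attention, and a transformer encoder predicts a time-frequency mask. This is a minimal, hypothetical PyTorch example, not any specific paper's architecture; the feature dimensions, module names, and frame rates are assumptions for illustration.

```python
# Minimal sketch of attention-based audio-visual fusion for speech separation.
# Assumes precomputed mixture spectrogram frames and lip-region embeddings;
# all dimensions and names here are illustrative, not from a specific model.
import torch
import torch.nn as nn


class AVFusionSeparator(nn.Module):
    def __init__(self, audio_dim=257, visual_dim=512, hidden_dim=256, num_heads=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        # Cross-attention: audio frames (queries) attend to visual frames (keys/values).
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                hidden_dim, num_heads, dim_feedforward=512, batch_first=True
            ),
            num_layers=2,
        )
        # Predict a time-frequency mask for the target speaker.
        self.mask_head = nn.Sequential(nn.Linear(hidden_dim, audio_dim), nn.Sigmoid())

    def forward(self, mixture_spec, visual_feats):
        # mixture_spec: (batch, T_audio, audio_dim) magnitude spectrogram of the mixture
        # visual_feats: (batch, T_video, visual_dim) lip embeddings of the target speaker
        a = self.audio_proj(mixture_spec)
        v = self.visual_proj(visual_feats)
        fused, _ = self.cross_attn(query=a, key=v, value=v)
        fused = self.encoder(a + fused)
        mask = self.mask_head(fused)
        # Applying the mask to the mixture approximates the target speaker's speech.
        return mask * mixture_spec


if __name__ == "__main__":
    model = AVFusionSeparator()
    mix = torch.randn(2, 100, 257).abs()  # 100 audio frames (e.g., 10 ms hop)
    lips = torch.randn(2, 25, 512)        # 25 video frames (e.g., 25 fps)
    est = model(mix, lips)
    print(est.shape)  # torch.Size([2, 100, 257])
```

In practice the masked spectrogram would be inverted back to a waveform (e.g., via inverse STFT with the mixture phase), and the model trained with a reconstruction or scale-invariant SNR loss; those steps are omitted here for brevity.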