Audio-Visual Cues
Audio-visual cue research focuses on integrating auditory and visual information, leveraging the complementary strengths of each modality to overcome the limitations of either one alone. Current work emphasizes models, often based on transformer architectures, that fuse audio and visual data for tasks such as scene understanding, object segmentation, and speaker identification. This line of research has significant implications for fields including extended reality, video analysis, and mental health assessment, enabling systems that are more robust and accurate than unimodal approaches.
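To make the fusion idea concrete, here is a minimal, hedged sketch of one common transformer-style mechanism: cross-modal attention, in which visual tokens attend to audio tokens and the resulting audio context is concatenated with the visual features. All names, dimensions, and the single-head, projection-free simplification are illustrative assumptions, not a specific model from the papers surveyed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, audio):
    """Visual tokens attend to audio tokens.

    Simplified single-head attention with no learned projections:
    each visual token gathers a weighted sum of audio tokens.
    """
    d = visual.shape[-1]
    scores = visual @ audio.T / np.sqrt(d)   # (Tv, Ta) similarity matrix
    weights = softmax(scores, axis=-1)       # attention over audio per visual token
    return weights @ audio                   # audio context aligned to visual tokens

# Hypothetical token sequences: 5 visual tokens and 8 audio tokens, dim 16.
rng = np.random.default_rng(0)
visual = rng.standard_normal((5, 16))
audio = rng.standard_normal((8, 16))

# Late fusion: concatenate visual features with their audio-derived context.
fused = np.concatenate([visual, cross_modal_attention(visual, audio)], axis=-1)
print(fused.shape)
```

In a real system the two streams would first be encoded (e.g., by a visual backbone and an audio spectrogram encoder) and the attention would use learned query/key/value projections; the sketch keeps only the core alignment step.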