Audio-Visual Scene
Audio-visual scene understanding aims to analyze and interpret scenes by integrating information from both audio and visual data streams, enabling computers to perceive environments more completely than either modality allows on its own. Current research focuses on developing robust models, often built on transformer architectures or graph convolutional networks, that fuse audio and visual features for tasks such as scene classification, segmentation, and question answering. Progress in this area supports applications such as content verification, robot navigation, and assistive technologies by giving machines a richer understanding of their surroundings.
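To make "fusing audio and visual features" concrete, here is a minimal sketch of one common transformer-style approach, cross-modal attention, in which each visual token attends over a set of audio tokens. It is a simplification under stated assumptions: there are no learned query/key/value projections, the features are toy vectors, and the helper names (`cross_attend`, `fuse`, etc.) are hypothetical rather than taken from any particular paper or library.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries: List[Vector], keys: List[Vector],
                 values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query (here a visual token)
    receives a weighted average of the values (here audio tokens).
    Assumption: no learned projections, so raw features act as Q, K, V."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a list of token vectors into a single vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def fuse(visual: List[Vector], audio: List[Vector]) -> Vector:
    """Hypothetical fusion head: audio-conditioned visual features,
    pooled and concatenated with pooled audio features."""
    attended = cross_attend(visual, audio, audio)
    # Residual connection keeps the original visual information.
    visual_ctx = [[v + a for v, a in zip(vt, at)]
                  for vt, at in zip(visual, attended)]
    return mean_pool(visual_ctx) + mean_pool(audio)


visual = [[1.0, 0.0], [0.0, 1.0]]  # two toy visual tokens
audio = [[1.0, 0.0], [1.0, 0.0]]   # two toy audio tokens
print(fuse(visual, audio))         # [1.5, 0.5, 1.0, 0.0]
```

The fused vector could then feed a classifier for scene classification. In a real model the query, key, and value inputs would pass through learned linear layers and multiple attention heads; they are omitted here so the data flow between the two modalities stays visible.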