Audio-Visual Question Answering
Audio-Visual Question Answering (AVQA) is the task of answering natural-language questions about videos by jointly reasoning over visual and auditory information. Current research focuses on improving the accuracy and robustness of AVQA models, particularly by addressing challenges such as missing modalities, dataset biases, and efficient processing of long sequences, often using advanced architectures such as transformer-based models, hyperbolic state spaces, and multimodal large language models. These advances are significant for multimodal understanding in AI and have potential applications in video indexing, content summarization, and assistive technologies for visually or hearing-impaired users.
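To make the fusion idea concrete, below is a minimal sketch of a transformer-based AVQA model. It is a generic illustration, not any specific paper's architecture: all class names, dimensions, and the answer-vocabulary classifier are illustrative assumptions. Pre-extracted audio and visual features and a pooled question embedding are projected into a shared space, tagged with learned modality embeddings, fused by a transformer encoder, and classified into a fixed answer set.

```python
# Minimal AVQA sketch (assumed architecture, not a specific paper's method):
# fuse audio, visual, and question tokens with a transformer encoder,
# then classify the question token's output over a fixed answer vocabulary.
import torch
import torch.nn as nn

class SimpleAVQA(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, question_dim=300,
                 d_model=256, num_answers=42):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.question_proj = nn.Linear(question_dim, d_model)
        # Learned embeddings mark which modality each token came from.
        self.modality_emb = nn.Embedding(3, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_answers)

    def forward(self, audio, visual, question):
        # audio: (B, Ta, audio_dim), visual: (B, Tv, visual_dim),
        # question: (B, question_dim), e.g. a pooled sentence embedding.
        a = self.audio_proj(audio) + self.modality_emb.weight[0]
        v = self.visual_proj(visual) + self.modality_emb.weight[1]
        q = self.question_proj(question).unsqueeze(1) + self.modality_emb.weight[2]
        tokens = torch.cat([q, a, v], dim=1)  # question token first
        fused = self.encoder(tokens)
        # Use the question token's output as the joint representation.
        return self.classifier(fused[:, 0])

# Toy usage with random tensors standing in for real feature extractors
# (segment-level audio features and frame-level visual features).
model = SimpleAVQA()
logits = model(torch.randn(2, 10, 128),   # 10 audio segments
               torch.randn(2, 10, 512),   # 10 video frames
               torch.randn(2, 300))       # question embedding
print(logits.shape)  # torch.Size([2, 42])
```

Treating the question as an extra token and reading the answer off its fused representation is one common design choice; cross-attention from the question to each modality, or a multimodal LLM consuming all three streams, are alternatives the surveyed work also explores.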