VQA System
Visual Question Answering (VQA) systems answer natural-language questions about images or videos by combining computer vision with natural language processing. Current research focuses on three areas: making VQA models robust and consistent across diverse input types (images, charts, videos, multi-page documents), mitigating biases in training data, and improving answer accuracy, particularly for complex questions that require reasoning or external knowledge. These advances matter for applications ranging from medical image analysis and document understanding to robotics and augmented reality, where visual information must be interpreted accurately and reliably.