VQA System
Visual Question Answering (VQA) systems aim to answer natural-language questions about images or videos by combining computer vision with natural language processing. Current research focuses on making VQA models more robust and consistent across diverse input types (images, charts, videos, multi-page documents), mitigating biases in training data, and improving answer accuracy, particularly for complex questions that require reasoning or external knowledge. These advances matter for applications ranging from medical image analysis and document understanding to robotics and augmented reality, where accurate and reliable interpretation of visual information is essential.