Visual Question Answering
Visual Question Answering (VQA) aims to build systems that accurately answer natural language questions about the content of images or videos. Current research focuses on improving model robustness and accuracy, particularly for complex questions that require spatial reasoning, multi-modal fusion (combining visual and textual information), and handling of diverse question types; recent architectures often incorporate large language models (LLMs) and vision transformers (ViTs). The field's significance lies in applications ranging from assisting visually impaired individuals to medical diagnosis and autonomous driving, making it a driver of broader advances in multimodal learning and reasoning.
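To make the fusion-based pipeline concrete, here is a minimal sketch of VQA inference using the Hugging Face transformers library with ViLT, a publicly released vision-transformer model that fuses image patches and question tokens in a single encoder. The image URL and question are illustrative placeholders, not taken from any of the papers above.

```python
from transformers import ViltProcessor, ViltForQuestionAnswering
from PIL import Image
import requests

# Illustrative inputs: a COCO validation image and a free-form question.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are in the picture?"

# The processor tokenizes the question and converts the image into
# patch embeddings; ViLT then fuses both modalities in one transformer.
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

inputs = processor(image, question, return_tensors="pt")
outputs = model(**inputs)

# This checkpoint treats VQA as classification over a fixed answer vocabulary.
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```

This checkpoint frames VQA as classification over a closed answer set; the LLM-based systems mentioned above instead generate free-form answers, trading the simplicity of a fixed vocabulary for open-ended responses.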