Visual Question Answering
Visual Question Answering (VQA) aims to develop systems that accurately answer natural language questions about the content of images or videos. Current research focuses on improving model robustness and accuracy, particularly for complex questions that require spatial reasoning, multi-modal fusion (combining visual and textual information), and handling of diverse question types, often employing large language models (LLMs) and vision transformers (ViTs) within various architectures. The field's significance lies in applications ranging from assisting visually impaired individuals to supporting medical diagnosis and autonomous driving, and progress here advances multimodal learning and reasoning more broadly.
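As a concrete illustration of the multi-modal fusion described above, the sketch below queries a pretrained vision-and-language transformer through the Hugging Face transformers library. The ViLT checkpoint dandelin/vilt-b32-finetuned-vqa and the sample COCO image URL are illustrative choices, not references to any specific work surveyed here.

```python
# Minimal VQA sketch using a pretrained vision-language transformer (ViLT).
# Assumes: pip install transformers torch pillow requests
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# Load a ViLT model fine-tuned for VQA. It treats answering as classification
# over a fixed vocabulary of common answers.
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# Fetch an example image and pose a natural language question about it.
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
question = "How many cats are there?"

# The processor fuses both modalities: image patches and question tokens are
# packed into a single input sequence for the transformer.
inputs = processor(image, question, return_tensors="pt")
logits = model(**inputs).logits

# The highest-scoring class index maps to a candidate answer string.
predicted = model.config.id2label[logits.argmax(-1).item()]
print(f"Q: {question}\nA: {predicted}")
```

Classification-style models such as ViLT select from a closed answer vocabulary; the LLM-based systems mentioned above instead generate free-form answers, trading that closed vocabulary for open-ended reasoning.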