Visual Question Answering
Visual Question Answering (VQA) aims to enable computers to answer natural-language questions about images, a task that requires integrating visual and linguistic understanding. Current research emphasizes model robustness and reliability, addressing inconsistent responses, hallucinations, and the handling of unanswerable questions, often using multimodal large language models (MLLMs) such as BLIP-2 and LLaVA. The field is important for advancing AI's ability to interact with the world in a more human-like way, with applications ranging from assistive technologies for visually impaired people to medical image analysis and the automated evaluation of data visualizations.