Knowledge-Based VQA
Knowledge-based Visual Question Answering (KB-VQA) aims to build systems that answer questions about images by drawing on external knowledge sources, going beyond what can be recognized in the image alone. Current research focuses on tighter integration of visual and textual information, often through retrieval-augmented generation (RAG) frameworks, which retrieve relevant knowledge before generating an answer, and through hierarchical transformer architectures that handle complex, multi-page documents and incorporate diverse knowledge types. These advances matter for building more robust and reliable VQA systems, with applications in fields such as medical image analysis and document understanding, where they can improve human-computer interaction and access to information.
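To make the retrieval-augmented pattern concrete, the following is a minimal sketch, not any particular paper's method. It assumes the image has already been turned into a short text description by some vision model (hard-coded here), uses a hypothetical three-sentence knowledge base, and swaps in plain bag-of-words cosine similarity where a real system would use learned multimodal embeddings. All names (`KNOWLEDGE_BASE`, `retrieve`, `build_prompt`) are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base; a real KB-VQA system would query an
# external source such as an encyclopedia or a document collection.
KNOWLEDGE_BASE = [
    "The Eiffel Tower was completed in 1889 and stands in Paris, France.",
    "The Golden Gate Bridge opened in 1937 in San Francisco, California.",
    "The Colosseum in Rome was completed around 80 AD.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(query_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, visual_description, passages):
    """Assemble the text that would be handed to an answer generator."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Image: {visual_description}\n"
        f"Knowledge:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Assumption: this string stands in for the output of a vision model.
visual_description = "a tall iron lattice tower in Paris"
question = "When was this tower completed?"

# Retrieval uses both the question and the visual description, since the
# question alone ("this tower") does not identify the entity.
top_passages = retrieve(f"{question} {visual_description}", KNOWLEDGE_BASE, k=1)
prompt = build_prompt(question, visual_description, top_passages)
print(prompt)
```

The sketch stops at prompt construction because the generation step depends on whichever language model is used. The point it illustrates is that the answer (a completion date) appears in neither the image nor the question, so it has to come from the retrieved passage.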