Knowledge-Based VQA

Knowledge-based Visual Question Answering (KB-VQA) aims to build systems that answer questions about images by drawing on external knowledge sources, going beyond what can be recognized in the image alone. Current research focuses on integrating visual and textual information more effectively, often using retrieval-augmented generation (RAG) frameworks and hierarchical transformer architectures to handle complex, multi-page documents and to incorporate diverse knowledge types. These advances are crucial for building more robust and reliable VQA systems, with applications in fields such as medical image analysis and document understanding, and more broadly for improving human-computer interaction and information access.
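As a rough illustration of the retrieval-augmented idea mentioned above, the following toy sketch retrieves the most relevant fact from a small knowledge base and assembles a prompt for a downstream answer generator. Everything in it is an assumption made for illustration: the function names (`tokenize`, `retrieve`, `build_prompt`), the three-entry `KNOWLEDGE_BASE`, and the use of a text caption as a stand-in for visual features. Real KB-VQA systems typically use learned image and text encoders rather than the bag-of-words cosine similarity used here.

```python
import math
import re
from collections import Counter

# Toy knowledge base (illustrative data, not taken from any KB-VQA benchmark).
KNOWLEDGE_BASE = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Golden retrievers are a dog breed originally bred in Scotland.",
    "The Golden Gate Bridge opened in 1937 in San Francisco.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=2):
    """Return the k facts most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        knowledge_base,
        key=lambda fact: cosine(q, Counter(tokenize(fact))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, image_caption, knowledge_base, k=2):
    """Combine the image description, retrieved facts, and the question.

    The caption stands in for visual features (an assumption of this sketch);
    the resulting string would be passed to a text generator.
    """
    facts = retrieve(f"{image_caption} {question}", knowledge_base, k)
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"Image: {image_caption}\n"
        f"Knowledge:\n{context}\n"
        f"Question: {question}\n"
        f"Answer:"
    )


prompt = build_prompt(
    "When was this tower completed?",
    "a photo of the Eiffel Tower",
    KNOWLEDGE_BASE,
    k=1,
)
print(prompt)
```

The point of the sketch is the division of labor: the retriever supplies knowledge that is not visible in the image, and the generator only has to reason over the assembled context.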

Papers