OK-VQA

OK-VQA, or outside-knowledge visual question answering, studies systems that answer questions about images when the answer cannot be found in the image alone and must be drawn from external knowledge. Current research emphasizes efficient methods for retrieving and incorporating this knowledge, including dense passage retrieval and prompting large language models (LLMs) with text derived from the image. These methods aim to improve the accuracy and interpretability of VQA systems, bridging the gap between image understanding and reasoning that requires external information. Progress in the field matters for applications that need robust visual understanding and knowledge integration, such as search engines and intelligent assistants.
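
The two approaches named above, retrieving passages and prompting an LLM with image-derived text, can be sketched roughly as follows. This is a toy illustration and not the method of any paper listed here: `embed` is a bag-of-words stand-in for a learned dense encoder, the caption is assumed to come from a separate captioning model, and all function names, passages, and the prompt layout are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned dense encoder: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def score(query_vec, passage_vec):
    """Inner product of two sparse vectors (cosine similarity, since both are normalized)."""
    return sum(v * passage_vec.get(w, 0.0) for w, v in query_vec.items())


def retrieve(query, passages, k=2):
    """Return the k passages whose embeddings score highest against the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: score(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(caption, question, passages):
    """Assemble a text-only prompt from the image caption and retrieved knowledge."""
    knowledge = "\n".join(f"- {p}" for p in passages)
    return (
        f"Context: {caption}\n"
        f"Knowledge:\n{knowledge}\n"
        f"Question: {question}\n"
        f"Answer:"
    )


# Hypothetical knowledge base and inputs, for illustration only.
passages = [
    "Bananas are rich in potassium.",
    "The Eiffel Tower is located in Paris.",
    "Fire hydrants supply water to firefighters.",
]
caption = "a bunch of bananas on a table"  # assumed output of a captioning model
question = "which mineral is this fruit rich in"

top = retrieve(caption + " " + question, passages, k=1)
prompt = build_prompt(caption, question, top)
print(prompt)
```

In a real system the word-count vectors would be replaced by embeddings from a trained retriever, and the resulting prompt would be sent to an LLM to generate the answer.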

Papers

April 22, 2024

Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu
Question Answering Pre Trained Vision Language Model Visual Language Model Dense Passage Retrieval Knowledge Selection VQA Benchmark Ok Vqa

November 20, 2023

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions
Ziyue Wang, Chi Chen, Peng Li, Yang Liu
Yes No Question Vision Language Task 3d Vqa Visual Gap Ok Vqa

October 20, 2023

A Simple Baseline for Knowledge-Based Visual Question Answering
Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos
Knowledge Based Visual Question Answering VQA Dataset Efficient in Context Learning Ok Vqa

June 28, 2023

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
Alireza Salemi, Mahta Rafiee, Hamed Zamani
Multi Modal Dense Retriever Knowledge Based Visual Question Answering Zero Shot Retrieval Knowledge Based Visual Question VQA Task Ok Vqa

May 24, 2023

Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering
Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth
Visual Question Answering Concept Bottleneck Model Reasoning Question Human Understandable Explanation Ok Vqa Bottleneck Model

April 13, 2023

PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han
Visual Question Answering Document Understanding PDF Document VQA Datasets Ok Vqa

February 15, 2022

Delving Deeper into Cross-lingual Visual Question Answering
Chen Liu, Jonas Pfeiffer, Anna Korhonen, Ivan Vulić, Iryna Gurevych
Visual Question Answering 3d Vqa Multi Modal Transformer Deep Depth Ok Vqa

January 14, 2022

A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, Prem Natarajan
Word List Knowledge Based Visual Question Answering Image to Text Web Screenshots Generative Question Answering VQA Task Textual Knowledge Ok Vqa
