CLEVR-X Dataset

CLEVR-X is a large-scale visual reasoning dataset that extends visual question answering (VQA) with natural language explanations for each image-question pair. Current research focuses on generating these explanations with natural language generation models, either fine-tuned on the dataset or used through prompting of large language models, and on analyzing the quality of the generated explanations across question and answer types. The dataset is significant because it supports the development of more explainable and robust VQA systems and, more broadly, a better understanding of visual reasoning in computer vision and artificial intelligence. The availability of structured explanations also allows more rigorous evaluation and comparison of different models.

Papers

April 16, 2024