Remote Sensing Visual Question Answering

Remote sensing visual question answering (RSVQA) aims to enable natural language interaction with satellite imagery, automatically extracting information and providing textual answers to user queries. Current research focuses on improving model accuracy and robustness through techniques such as attention mechanisms guided by image segmentation, large vision-language models trained specifically on remote sensing data, and methods that mitigate the language biases inherent in datasets and models. These advances are significant for making the analysis of Earth observation data more accessible and efficient, with applications ranging from urban planning to environmental monitoring.
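To make the segmentation-guided attention idea concrete, here is a minimal, illustrative sketch: region features are scored against an encoded question, and a segmentation mask biases the attention toward regions belonging to the segment relevant to the query. All shapes, names, and the log-mask biasing scheme are assumptions for illustration, not the method of any particular paper.

```python
import numpy as np

def seg_guided_attention(region_feats, seg_mask, question_feat):
    """Fuse image region features with a question vector,
    biasing attention toward segmented regions.

    region_feats:  (R, D) features for R image regions (illustrative)
    seg_mask:      (R,) fraction of each region covered by the
                   segment of interest, in [0, 1]
    question_feat: (D,) encoded question vector
    """
    scores = region_feats @ question_feat        # question-region similarity
    scores = scores + np.log(seg_mask + 1e-6)    # bias toward masked regions
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights = weights / weights.sum()
    fused = weights @ region_feats               # attended image feature (D,)
    return weights, fused

rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))            # 4 toy regions, 8-dim features
mask = np.array([0.9, 0.1, 0.0, 0.0])            # segment mostly in region 0
question = rng.standard_normal(8)                # toy question embedding
weights, fused = seg_guided_attention(regions, mask, question)
print(weights.round(3), fused.shape)
```

In a full RSVQA model the fused vector would feed an answer classifier or decoder; here the point is only that the segmentation mask reweights which image regions the question attends to.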

Papers