Paper ID: 2407.09548

Towards Temporal Change Explanations from Bi-Temporal Satellite Images

Ryo Tsujimoto, Hiroki Ouchi, Hidetaka Kamigaito, Taro Watanabe

Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring. However, manual dataset construction for this task is costly, making human-AI collaboration a promising direction. In this paper, we investigate the ability of Large-scale Vision-Language Models (LVLMs) to explain temporal changes between satellite images. While LVLMs are known to generate good image captions, they accept only a single image as input. To handle a pair of satellite images as input, we propose three prompting methods. Through human evaluation, we demonstrate the effectiveness of our step-by-step reasoning-based prompting.
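
The abstract does not detail the three prompting methods. As a rough illustration of the general idea only, the sketch below shows one plausible way to present a pair of satellite images to an LVLM together with a step-by-step reasoning instruction. It assumes an OpenAI-compatible multimodal chat API; the model name, helper functions, and prompt wording are illustrative assumptions, not the authors' actual method.

```python
# Illustrative sketch (not the paper's code): prompting an LVLM with a pair of
# satellite images and a step-by-step reasoning instruction.

import base64
from pathlib import Path


def encode_image(path: str) -> str:
    """Base64-encode an image file so it can be embedded in an API request."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def build_change_prompt(before_path: str, after_path: str) -> list[dict]:
    """Build a chat-style message that presents both images and asks the model
    to reason step by step about the temporal change between them."""
    instruction = (
        "You are given two satellite images of the same area taken at different times. "
        "Step 1: Describe the earlier image. "
        "Step 2: Describe the later image. "
        "Step 3: Explain what has changed between them."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(before_path)}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(after_path)}"}},
            ],
        }
    ]


# Hypothetical usage with an OpenAI-compatible client (model name is an assumption):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o", messages=build_change_prompt("before.png", "after.png"))
# print(response.choices[0].message.content)
```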

Submitted: Jun 27, 2024