Paper ID: 2112.02829
SyntEO: Synthetic Data Set Generation for Earth Observation and Deep Learning -- Demonstrated for Offshore Wind Farm Detection
Thorsten Hoeser, Claudia Kuenzer
With the emergence of deep learning in recent years, new opportunities have arisen in Earth observation research. Nevertheless, they have also brought new challenges. The data-hungry training processes of deep learning models demand large, resource-intensive, annotated data sets, and these models have partly replaced knowledge-driven approaches, so that model behaviour and the final prediction process have become a black box. The proposed SyntEO approach enables Earth observation researchers to automatically generate large, deep-learning-ready data sets by merging existing and procedural data. SyntEO does this by incorporating expert knowledge into the data generation process in a highly structured manner, using an ontology to control the automatic generation of images and labels. In this way, fully controllable experiment environments are set up, which provide insights into model training on the synthetic data sets. Thus, SyntEO makes the learning process approachable, which is an important cornerstone of explainable machine learning. We demonstrate the SyntEO approach by detecting offshore wind farms in Sentinel-1 images of two of the world's largest offshore wind energy production sites. The largest generated data set has 90,000 training examples. A basic convolutional neural network for object detection, trained only on this synthetic data, confidently detects offshore wind farms while minimising false detections in challenging environments. In addition, four sequential data sets are generated, demonstrating how the SyntEO approach can precisely define the data set structure and influence the training process. SyntEO is thus a hybrid approach that creates an interface between expert knowledge and data-driven image analysis.
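The abstract does not describe SyntEO's generation pipeline in detail. As a hedged illustration of the general idea only (rules encoding expert knowledge drive procedural generation, so every image comes with a label that is correct by construction), here is a toy sketch. It assumes a single-band, SAR-like patch in which turbines appear as bright point targets on a dark, noisy sea and a wind farm is a regular grid of such points. `SceneRules` and `generate_example` are hypothetical names, and the flat rule set is a simple stand-in for an ontology, not SyntEO's actual schema or implementation.

```python
import random
from dataclasses import dataclass


# Hypothetical "expert knowledge" rules; a flat stand-in for an ontology,
# not SyntEO's actual schema.
@dataclass
class SceneRules:
    size: int = 64               # patch edge length in pixels
    sea_mean: float = 0.1        # mean backscatter of open sea (dark in SAR)
    sea_noise: float = 0.05      # std dev of Gaussian noise (crude speckle proxy)
    turbine_value: float = 1.0   # backscatter of a bright turbine point target
    grid_rows: tuple = (2, 4)    # inclusive range of turbine rows per farm
    grid_cols: tuple = (2, 4)    # inclusive range of turbine columns per farm
    spacing: tuple = (5, 8)      # inclusive range of pixel spacing between turbines


def generate_example(rules, rng):
    """Return (image, bbox); bbox = (row_min, col_min, row_max, col_max) of the farm."""
    n = rules.size
    # Background: dark sea with non-negative noise.
    image = [[max(0.0, rng.gauss(rules.sea_mean, rules.sea_noise)) for _ in range(n)]
             for _ in range(n)]
    # Sample the farm layout from the rule ranges.
    rows = rng.randint(*rules.grid_rows)
    cols = rng.randint(*rules.grid_cols)
    step = rng.randint(*rules.spacing)
    height, width = (rows - 1) * step, (cols - 1) * step
    # Place the grid so that it lies fully inside the patch.
    r0 = rng.randint(0, n - 1 - height)
    c0 = rng.randint(0, n - 1 - width)
    for i in range(rows):
        for j in range(cols):
            image[r0 + i * step][c0 + j * step] = rules.turbine_value
    # The label is derived from the same parameters that drew the image.
    bbox = (r0, c0, r0 + height, c0 + width)
    return image, bbox


if __name__ == "__main__":
    rng = random.Random(42)
    rules = SceneRules()
    dataset = [generate_example(rules, rng) for _ in range(100)]
    print(len(dataset), dataset[0][1])
```

Because the label is computed from the same sampled parameters that render the image, changing a rule (for example, the spacing range) changes both consistently, which is the kind of controllability the abstract refers to.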
Submitted: Dec 6, 2021