Paper ID: 2202.01897

AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics

Sebastian Hoffmann, Christian Lessig

Representation learning has proven to be a powerful methodology in a wide variety of machine learning applications. For atmospheric dynamics, however, it has so far not been considered, arguably due to the lack of large-scale, labeled datasets that could be used for training. In this work, we show that the difficulty is benign and introduce a self-supervised learning task that defines a categorical loss for a wide variety of unlabeled atmospheric datasets. Specifically, we train a neural network on the simple yet intricate task of predicting the temporal distance between atmospheric fields from distinct but nearby times. We demonstrate that training with this task on ERA5 reanalysis leads to internal representations capturing intrinsic aspects of atmospheric dynamics. We do so by introducing a data-driven distance metric for atmospheric states. When employed as a loss function in other machine learning applications, this Atmodist distance leads to improved results compared to the classical $\ell_2$-loss. For example, for downscaling one obtains higher resolution fields that match the true statistics more closely than previous approaches and for the interpolation of missing or occluded data the AtmoDist distance leads to results that contain more realistic fine scale features. Since it is derived from observational data, AtmoDist also provides a novel perspective on atmospheric predictability.

Submitted: Feb 2, 2022