Paper ID: 2207.07689

Strict baselines for Covid-19 forecasting and ML perspective for USA and Russia

Alexander G. Sboev, Nikolay A. Kudryshov, Ivan A. Moloshnikov, Saveliy V. Zavertyaev, Aleksandr V. Naumov, Roman B. Rybka

The course of the Covid-19 pandemic now allows researchers to gather datasets accumulated over two years and to use them for predictive analysis. This in turn makes it possible to assess the potential of more complex predictive models, including neural networks, at different forecast horizons. In this paper, we present the results of a consistent comparative study of different types of methods for predicting the dynamics of the spread of Covid-19, based on regional data for two countries: the United States and Russia. We used well-known statistical methods (e.g., Exponential Smoothing), a "tomorrow-as-today" approach, and a set of classic machine learning models trained on data from individual regions. Alongside these, we considered a neural network model based on Long Short-Term Memory (LSTM) layers, whose training samples aggregate data from all regions of both countries. Performance was evaluated using cross-validation with the Mean Absolute Percentage Error (MAPE) metric. For complicated periods characterized by a large increase in the number of confirmed daily cases, the best results are achieved by the LSTM model trained on all regions of both countries, with an average MAPE of 18%, 30%, and 37% for Russia and 31%, 41%, and 50% for the United States at forecast horizons of 14, 28, and 42 days, respectively.
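As a rough illustration of the evaluation described in the abstract, the sketch below computes MAPE for a "tomorrow-as-today" forecast. It is not the authors' code. The function names `mape` and `tomorrow_as_today` are hypothetical, the baseline is assumed to repeat the last observed value over the whole horizon, and days with zero actual cases are assumed to be skipped to avoid division by zero; the paper may handle both points differently.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent.

    Assumption: days where the actual value is zero are skipped,
    since the percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(terms) / len(terms)


def tomorrow_as_today(history, horizon):
    """Naive baseline: repeat the last observed value for every future day.

    Assumption: this is one plausible reading of "tomorrow-as-today".
    """
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon


# Made-up daily case counts, purely for illustration.
history = [100, 120, 150]
actual_future = [150, 300, 200]

forecast = tomorrow_as_today(history, horizon=len(actual_future))
score = mape(actual_future, forecast)
print(f"Baseline MAPE: {score:.1f}%")
```

In the setting of the abstract, the same score would be computed per region and per horizon (14, 28, and 42 days) and then averaged, which lets the naive baseline be compared directly with the statistical, machine learning, and LSTM models.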

Submitted: Jul 15, 2022