Paper ID: 2201.06147

Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks

Jaime Pérez, Patricia Arroba, José M. Moya

The Cloud paradigm is at a critical point in which the existing energy-efficiency techniques are reaching a plateau, while the computing resources demand at Data Center facilities continues to increase exponentially. The main challenge in achieving a global energy efficiency strategy based on Artificial Intelligence is that we need massive amounts of data to feed the algorithms. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. For this purpose, we will implement a powerful generative algorithm: Generative Adversarial Networks (GANs). Specifically, our work combines the disciplines of GAN-based data augmentation and scenario forecasting, filling the gap in the generation of synthetic data in DCs. Furthermore, we propose a methodology to increase the variability and heterogeneity of the generated data by introducing on-demand anomalies without additional effort or expert knowledge. We also suggest the use of Kullback-Leibler Divergence and Mean Squared Error as new metrics in the validation of synthetic time series generation, as they provide a better overall comparison of multivariate data distributions. We validate our approach using real data collected in an operating Data Center, successfully generating synthetic data helpful for prediction and optimization models. Our research will help optimize the energy consumed in Data Centers, although the proposed methodology can be employed in any similar time-series-like problem.

Submitted: Jan 12, 2022