Paper ID: 2205.13963

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis

This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.

Submitted: May 27, 2022