Cray XC40 Supercomputer
The Cray XC40, a high-performance computing system, is the subject of ongoing research into its performance, efficiency, and reliability. Current work focuses on three areas: digital twin frameworks for predictive modeling and resource management, artificial intelligence (AI) for workflow optimization and error mitigation, and characterization of interconnect performance to maximize GPU utilization. These efforts aim to improve the energy efficiency and overall effectiveness of large-scale scientific computing, supporting advances in fields such as materials science, biophysics, and AI model training.
Papers
Employing Artificial Intelligence to Steer Exascale Workflows with Colmena
Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler