High Performance Computing System

High-performance computing (HPC) systems are crucial for tackling computationally intensive scientific problems and, increasingly, for powering large-scale AI applications. Current research focuses on optimizing HPC system design for AI workloads, including efficient resource allocation, novel cooling strategies such as liquid cooling, and improved performance modeling for AI architectures (e.g., transformers and large language models). This work develops techniques such as machine-learning-driven auto-tuning, optimized communication frameworks, and efficient I/O management to meet the demands of both AI and scientific simulation, with the goal of faster, more energy-efficient, and more reliable computation across scientific disciplines.

Papers