Heterogeneous GPU

Heterogeneous GPU computing focuses on efficiently utilizing diverse GPU types within a single system for tasks such as large language model (LLM) training and inference. Current research emphasizes resource allocation and scheduling across heterogeneous hardware, using techniques such as max-flow algorithms for workload placement, reinforcement learning for resource partitioning, and batch sizes adapted to each device's speed in stochastic gradient descent. This work reduces the cost and improves the performance of computationally intensive applications, particularly in AI and high-performance computing, by making a wider range of available hardware usable.
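
As a minimal illustration of the batch-size adaptation idea mentioned above (a generic sketch, not taken from any specific paper), the snippet below splits a global SGD batch across heterogeneous GPUs in proportion to each device's measured throughput, so faster GPUs process more samples and all devices finish a step at roughly the same time. The GPU names and throughput numbers are hypothetical.

```python
def partition_batch(global_batch_size, throughputs):
    """Return per-GPU batch sizes roughly proportional to throughput.

    throughputs: dict mapping GPU name -> samples/second
                 (assumed to be measured offline or profiled at startup).
    """
    total = sum(throughputs.values())
    shares = {gpu: int(global_batch_size * tp / total)
              for gpu, tp in throughputs.items()}
    # Hand any rounding remainder to the fastest device.
    remainder = global_batch_size - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += remainder
    return shares


# Example: one newer GPU and two older ones sharing a global batch of 1024.
# Prints per-GPU batch sizes that sum to 1024, weighted by throughput.
print(partition_batch(1024, {"gpu_fast": 2000.0,
                             "gpu_slow_1": 900.0,
                             "gpu_slow_2": 900.0}))
```

In practice, schedulers refine this basic proportional split with constraints such as per-device memory limits and communication cost, which is where the max-flow and reinforcement-learning formulations referenced above come in.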

Papers