Single GPU
Single-GPU computing remains a crucial area of research, focused on optimizing the performance and energy efficiency of machine learning workloads such as large language model (LLM) inference, image generation, and other computationally intensive algorithms. Current research emphasizes efficient memory management, novel attention mechanisms (such as linear attention), and optimized kernel designs that maximize throughput and minimize latency, often targeting specific model architectures like Transformers and diffusion models. These advances matter because they enable cost-effective deployment of powerful AI models on readily available hardware, broadening access to advanced computational capabilities and accelerating progress across scientific and industrial applications.
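As a concrete illustration of the linear attention idea mentioned above, the sketch below shows a minimal kernelized attention in PyTorch, assuming the elu(x) + 1 feature map; it is illustrative only and not taken from any of the listed papers.

```python
# Minimal sketch of linear (kernelized) attention, assuming the elu(x) + 1
# feature map; cost is O(n * d^2) instead of the O(n^2 * d) of softmax attention.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq_len, dim)
    q = F.elu(q) + 1  # positive feature map phi(q)
    k = F.elu(k) + 1  # positive feature map phi(k)
    # Aggregate K^T V once: (batch, dim, dim)
    kv = torch.einsum("bnd,bne->bde", k, v)
    # Per-query normalizer: phi(q) . sum_n phi(k_n)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    # Output: phi(q) (K^T V), rescaled per query
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

if __name__ == "__main__":
    q = torch.randn(2, 128, 64)
    k = torch.randn(2, 128, 64)
    v = torch.randn(2, 128, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 128, 64])
```

Because the key-value summary `kv` has a fixed size independent of sequence length, memory and compute grow linearly with the sequence, which is the property that makes such mechanisms attractive for single-GPU LLM inference.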
Papers
Differentiable Time-Frequency Scattering on GPU
John Muradeli, Cyrus Vahidi, Changhong Wang, Han Han, Vincent Lostanlen, Mathieu Lagrange, George Fazekas
Characterizing and Understanding Distributed GNN Training on GPUs
Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs
Yuke Wang, Boyuan Feng, Zheng Wang, Guyue Huang, Yufei Ding
Machine Learning Subsystem for Autonomous Collision Avoidance on a small UAS with Embedded GPU
Nicholas Polosky, Tyler Gwin, Sean Furman, Parth Barhanpurkar, Jithin Jagannath