Single GPU
Single-GPU computing research focuses on improving the performance and energy efficiency of machine learning workloads that run on one device, including large language model (LLM) inference, image generation, and other computationally intensive algorithms. Current work emphasizes efficient memory management, alternative attention mechanisms (such as linear attention), and optimized kernel designs that raise throughput and reduce latency, often targeting specific model architectures such as Transformers and diffusion models. These advances matter because they allow powerful AI models to be deployed cost-effectively on readily available hardware, which broadens access to advanced computation across scientific and industrial applications.
Papers
Architectural Implications of Embedding Dimension during GCN on CPU and GPU
Matthew Adiletta, David Brooks, Gu-Yeon Wei
Real-Time High-Quality Stereo Matching System on a GPU
Qiong Chang, Tsutomu Maruyama
Efficient stereo matching on embedded GPUs with zero-means cross correlation
Qiong Chang, Aolong Zha, Weimin Wang, Xin Liu, Masaki Onishi, Lei Lei, Meng Joo Er, Tsutomu Maruyama
Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU
Hanqiu Chen, Yahya Alhinai, Yihan Jiang, Eunjee Na, Cong Hao
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Alon Albalak, Akshat Shrivastava, Chinnadhurai Sankar, Adithya Sagar, Mike Ross