Kernel Performance

Kernel performance optimization is crucial for accelerating deep learning model training and inference, particularly for large language models (LLMs) and graph neural networks (GNNs). Current research focuses on developing efficient kernels using techniques such as kernel fusion, weight quantization (including lookup-table methods), and adaptive kernel selection based on data characteristics or subgraph density. These advancements aim to reduce computational bottlenecks stemming from memory access and communication overhead, yielding faster and more energy-efficient deep learning applications across a range of domains.
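To make the lookup-table quantization idea concrete, the sketch below (a hypothetical illustration in plain NumPy, not drawn from any particular paper) quantizes a weight tensor to 4-bit indices into a 16-entry codebook; dequantization is then a single table lookup, which is what lets a kernel trade expensive memory traffic for a small on-chip table:

```python
import numpy as np

def build_codebook(weights: np.ndarray, bits: int = 4) -> np.ndarray:
    """Uniform codebook spanning the weight range (2**bits entries)."""
    return np.linspace(weights.min(), weights.max(), 2 ** bits)

def quantize(weights: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each weight to the index of its nearest codebook entry."""
    # Broadcast |w - c| over every codebook entry c, then take the argmin.
    return np.abs(weights[..., None] - codebook).argmin(axis=-1).astype(np.uint8)

def dequantize(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights with a single table lookup."""
    return codebook[indices]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
cb = build_codebook(w)
idx = quantize(w, cb)
w_hat = dequantize(idx, cb)
# Each 4-bit index replaces a 32-bit float; the reconstruction error is
# bounded by half the spacing between adjacent codebook entries.
```

In a real kernel the codebook would live in fast on-chip memory (e.g. shared memory or registers), so the lookup amortizes the cost of loading compressed weights from DRAM; the uniform codebook here is only the simplest choice, and published methods typically learn non-uniform tables.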

Papers