Kernel Performance
Kernel performance optimization is crucial for accelerating deep learning training and inference, particularly for large language models (LLMs) and graph neural networks (GNNs). Current research focuses on developing efficient kernels through techniques such as kernel fusion, weight quantization (including lookup-table methods), and adaptive kernel selection based on data characteristics or subgraph density. These advances aim to reduce bottlenecks caused by memory access and communication overhead, yielding faster and more energy-efficient deep learning applications across domains.
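To make the kernel-fusion idea concrete, the CUDA sketch below merges a bias add and a ReLU into a single kernel, so each element makes one round trip through global memory instead of two separate kernel launches each reading and writing the full tensor. This is a minimal sketch under assumed names (fused_bias_relu, n, bias_len), not the implementation of any particular paper.

// Kernel fusion sketch: two element-wise ops (bias add, ReLU) done in one pass
// over global memory rather than in two separately launched kernels.
#include <cuda_runtime.h>

__global__ void fused_bias_relu(const float* __restrict__ x,
                                const float* __restrict__ bias,
                                float* __restrict__ y,
                                int n, int bias_len) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Both operations applied while the value is in registers.
        float v = x[i] + bias[i % bias_len];
        y[i] = v > 0.0f ? v : 0.0f;
    }
}

// Launch example (x, bias, y are device pointers already allocated):
// int threads = 256;
// int blocks  = (n + threads - 1) / threads;
// fused_bias_relu<<<blocks, threads>>>(x, bias, y, n, bias_len);

Lookup-table weight quantization can be sketched in the same spirit: compressed 4-bit codes stay in global memory and index a small dequantization table cached in shared memory. The packing scheme (two codes per byte) and all names here are assumptions chosen for illustration.

// LUT dequantization sketch: each 4-bit code selects a float from a 16-entry
// table held in shared memory, keeping the weight matrix compressed in DRAM.
// Requires blockDim.x >= 16 so the table load below covers all entries.
#include <cuda_runtime.h>
#include <stdint.h>

__global__ void lut_dequantize(const uint8_t* __restrict__ packed, // two 4-bit codes per byte
                               const float* __restrict__ lut,      // 16-entry lookup table
                               float* __restrict__ out,
                               int n_weights) {
    __shared__ float s_lut[16];
    if (threadIdx.x < 16) s_lut[threadIdx.x] = lut[threadIdx.x];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;  // index over packed bytes
    if (2 * i < n_weights) {
        uint8_t byte = packed[i];
        out[2 * i] = s_lut[byte & 0x0F];            // low nibble
        if (2 * i + 1 < n_weights)
            out[2 * i + 1] = s_lut[byte >> 4];      // high nibble
    }
}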