Performance Bottlenecks
Performance bottlenecks in computational tasks ranging from large language model training to distributed machine learning limit efficiency and scalability. Current research focuses on identifying and mitigating these bottlenecks at every layer of the stack: hardware (GPUs, TPUs, CPUs), software (optimizers, data pipelines), and algorithmic design (e.g., parallelization strategies, quantization techniques). Understanding and addressing these limitations is crucial for advancing machine learning, accelerating scientific discovery, and enabling the development of more efficient and powerful applications.
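As a minimal illustration of the identification step, a training loop can be split into stages and each stage timed independently to see whether the data pipeline or the compute dominates. The stage functions below (`load_batch`, `train_step`) are hypothetical stand-ins for real workloads, not code from any of the surveyed papers:

```python
import time

def load_batch(n):
    # Hypothetical data-loading stage (stand-in for real I/O or preprocessing)
    return [i * 0.5 for i in range(n)]

def train_step(batch):
    # Hypothetical compute stage (stand-in for a real forward/backward pass)
    return sum(x * x for x in batch)

def profile_stage(fn, *args):
    """Time a single pipeline stage; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

batch, load_s = profile_stage(load_batch, 100_000)
loss, step_s = profile_stage(train_step, batch)
bottleneck = "data pipeline" if load_s > step_s else "compute"
print(f"load: {load_s:.4f}s  step: {step_s:.4f}s  bottleneck: {bottleneck}")
```

In practice the same per-stage timing idea underlies heavier tools (e.g. profilers and hardware counters); the sketch only shows the reasoning, not a production measurement setup.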