Gradient Computation
Gradient computation is a fundamental process in training machine learning models, particularly large language models and deep neural networks, aiming to efficiently and accurately determine the direction of parameter updates to minimize loss functions. Current research focuses on improving the efficiency of gradient computation, particularly for large models, through techniques like approximating gradients in near-linear time, employing sampling methods to reduce energy consumption, and developing novel optimization algorithms that minimize communication overhead in distributed settings. These advancements are crucial for enabling the training and deployment of increasingly complex models, impacting various fields from natural language processing to computer vision and beyond.
Papers
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon