Compressed Stochastic Gradient Descent

Compressed Stochastic Gradient Descent (SGD) aims to accelerate distributed machine learning by reducing the communication overhead of transmitting large gradients during training. Current research focuses on adaptive compression techniques, such as quantization, sparsification, and low-rank approximation, often combined with error feedback mechanisms and adaptive step-size methods to preserve accuracy despite the lossy compression. These advances are crucial for scaling the training of large models across distributed systems, cutting both the time and the energy cost of training.
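
To make the idea concrete, below is a minimal sketch of one common variant: top-k gradient sparsification with error feedback, where each worker transmits only the largest-magnitude gradient entries and locally accumulates the discarded residual for the next step. The names `topk_compress` and `ErrorFeedbackSGD`, and all parameter values, are illustrative assumptions rather than the method of any specific paper.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries of grad; zero out the rest."""
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k magnitudes
    compressed = np.zeros_like(flat)
    compressed[idx] = flat[idx]
    return compressed.reshape(grad.shape)

class ErrorFeedbackSGD:
    """SGD step with top-k sparsification and error feedback (illustrative sketch).

    The residual that compression discards is stored locally and added back to
    the next gradient, so information is delayed rather than lost.
    """
    def __init__(self, lr=0.1, k=10):
        self.lr = lr
        self.k = k
        self.residual = None   # per-worker error memory

    def step(self, params, grad):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual           # add back past compression error
        compressed = topk_compress(corrected, self.k)
        self.residual = corrected - compressed     # remember what was dropped
        # In a distributed setting, `compressed` (values plus indices) is what
        # would be communicated and aggregated across workers.
        return params - self.lr * compressed

# Toy usage: minimize ||x||^2 with heavily compressed gradients.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
opt = ErrorFeedbackSGD(lr=0.1, k=20)
for _ in range(200):
    grad = 2 * x                                   # gradient of ||x||^2
    x = opt.step(x, grad)
print("final loss:", float(np.sum(x ** 2)))
```

Even though only 2% of the gradient entries are communicated per step in this toy run, the error-feedback memory lets the iterates keep making progress; dropping the residual term is what typically causes plain sparsified SGD to stall or bias the solution.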

Papers