Paper ID: 2202.08402

Federated Stochastic Gradient Descent Begets Self-Induced Momentum

Howard H. Yang, Zuozhu Liu, Yaru Fu, Tony Q. S. Quek, H. Vincent Poor

Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm, and also forge a link between staleness analysis and federated computing systems, which can be useful for system designers.
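The momentum observation in the abstract concerns the global aggregation step: when some clients report gradients computed at stale copies of the global model, the aggregated update mixes in past iterates, much as a momentum buffer accumulates history. The following Python sketch is a rough illustration of that setting, not the paper's construction; the toy least-squares objective, the client count, and all variable names are assumptions made for the example.

```python
import numpy as np

# Toy setup (illustrative only): a least-squares objective split across K clients.
rng = np.random.default_rng(0)
K, d = 10, 5
A = [rng.normal(size=(20, d)) for _ in range(K)]
b = [rng.normal(size=20) for _ in range(K)]

def local_grad(k, w):
    """Gradient of client k's local least-squares loss at model w."""
    return A[k].T @ (A[k] @ w - b[k]) / len(b[k])

eta = 0.01                              # global learning rate
w = np.zeros(d)                         # global model
stale = [w.copy() for _ in range(K)]    # model copy each client last received

for t in range(100):
    # Each round, communication resources reach only a subset of clients;
    # the rest still hold, and evaluate gradients at, a stale global model.
    active = rng.choice(K, size=K // 2, replace=False)
    for k in active:
        stale[k] = w.copy()
    # The server aggregates gradients evaluated at possibly stale models,
    # so the update depends on past iterates as well as the current one.
    g = np.mean([local_grad(k, stale[k]) for k in range(K)], axis=0)
    w = w - eta * g
```

Because the stale gradients are evaluated at earlier global iterates, the averaged update carries information about the trajectory's history; the abstract's claim is that this effect can be interpreted as a self-induced momentum-like term in the aggregation.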

Submitted: Feb 17, 2022