Paper ID: 2210.11562

Local SGD in Overparameterized Linear Regression

Mike Nguyen, Charly Kirst, Nicole Mücke

We consider distributed learning using constant-stepsize SGD (DSGD) over several devices, each sending a final model update to a central server. In a final step, the local estimates are aggregated. In the setting of overparameterized linear regression, we prove general upper bounds with matching lower bounds and derive learning rates for specific data-generating distributions. We show that the excess risk is of the order of the variance, provided the number of local nodes does not grow too quickly with the global sample size. We further compare the sample complexity of DSGD with that of distributed ridge regression (DRR) and show that the excess SGD risk is smaller than the excess RR risk, where both sample complexities are of the same order.
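The scheme described in the abstract can be illustrated with a minimal sketch: each node runs constant-stepsize SGD on its own share of the data, and the server averages the resulting local estimates once. The function names (`local_sgd`, `dsgd`, `local_ridge`, `drr`), the zero initialization, the single pass over the local data, the use of the last iterate, the even data split, and the `lam * n` scaling of the ridge penalty are all assumptions made for illustration; the paper may define the estimators differently (for example, with iterate averaging).

```python
import numpy as np


def local_sgd(X, y, stepsize):
    """One pass of constant-stepsize SGD on the squared loss, started at zero.

    Returns the last iterate (an assumption of this sketch).
    """
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        # Gradient of 0.5 * (x_i @ w - y_i)**2 with respect to w.
        w -= stepsize * (x_i @ w - y_i) * x_i
    return w


def dsgd(X, y, num_nodes, stepsize):
    """Split the data over num_nodes, run SGD locally, average once."""
    X_parts = np.array_split(X, num_nodes)
    y_parts = np.array_split(y, num_nodes)
    local = [local_sgd(Xm, ym, stepsize) for Xm, ym in zip(X_parts, y_parts)]
    return np.mean(local, axis=0)


def local_ridge(X, y, lam):
    """Ridge estimator on one node, with penalty lam * n (assumed scaling)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)


def drr(X, y, num_nodes, lam):
    """Distributed ridge regression: average the local ridge estimators."""
    X_parts = np.array_split(X, num_nodes)
    y_parts = np.array_split(y, num_nodes)
    local = [local_ridge(Xm, ym, lam) for Xm, ym in zip(X_parts, y_parts)]
    return np.mean(local, axis=0)
```

In the overparameterized regime the dimension `X.shape[1]` exceeds the number of samples on each node, which both estimators handle as written: SGD needs no matrix inverse, and the ridge system is invertible for any `lam > 0`. Comparing `dsgd` and `drr` on synthetic data is one way to explore the comparison the abstract mentions, though this sketch makes no claim about reproducing the paper's rates.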

Submitted: Oct 20, 2022

Topics

Gradient Descent
Sample Complexity
Ridge Regression
Parameterized Model
Excess Risk
Local SGD
