Paper ID: 2211.04655

Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments

Michael R. Metel

Motivated by neural network training in low-precision arithmetic environments, this work studies the convergence of variants of SGD using adaptive step sizes with computational error. Considering a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point is proven as well as the non-asymptotic convergence to an approximate stationary point. It is assumed that only an approximation of the loss function's stochastic gradient can be computed in addition to error in computing the SGD step itself. Different variants of SGD are tested empirically, where improved test set accuracy is observed compared to SGD for two image recognition tasks.

Submitted: Nov 9, 2022