Paper ID: 2206.03299
Generalization Error Bounds for Deep Neural Networks Trained by SGD
Mingze Wang, Chao Ma
Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining a dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms. The bounds depend explicitly on the loss along the training trajectory and apply to a wide range of network architectures, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). Compared with other algorithm-dependent generalization estimates, such as uniform-stability-based bounds, our bounds do not require $L$-smoothness of the nonconvex loss function and apply directly to SGD rather than to stochastic gradient Langevin dynamics (SGLD). Numerical results show that our bounds are non-vacuous and robust to changes of the optimizer and network hyperparameters.
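For context, a norm-based Rademacher complexity argument typically yields bounds of the following generic form (a standard textbook statement for a $[0,1]$-bounded loss class $\mathcal{F}$, shown only to indicate the structure of such bounds, not the paper's specific theorem; the symbols $L$, $\hat{L}_n$, $\mathfrak{R}_n$ are generic placeholders): with probability at least $1-\delta$ over an i.i.d. sample of size $n$,
$$ L(f) \;\le\; \hat{L}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}} \qquad \text{for all } f \in \mathcal{F}, $$
where $\hat{L}_n$ is the empirical loss and $\mathfrak{R}_n(\mathcal{F})$ is the Rademacher complexity of the class, which the paper controls through parameter norms tracked dynamically along the SGD trajectory.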
Submitted: Jun 7, 2022