Gradient Norm
The gradient norm, the magnitude of the gradient vector used by optimization algorithms, is a central concept in deep learning research. Current efforts focus on understanding its role in the convergence of optimization algorithms, in model robustness, and in efficient training. Research investigates how the gradient norm behaves under different optimizers (e.g., Adam, SGD, RMSProp) and how it relates to generalization and adversarial robustness, often in the context of specific architectures such as vision transformers. Understanding and controlling the gradient norm is crucial for improving the efficiency, stability, and reliability of deep learning models across diverse applications, from image classification to federated learning.
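As a concrete illustration (not drawn from any of the papers listed below), the following minimal sketch shows one common way to compute the global gradient norm of a model's parameters after backpropagation and to clip it before the optimizer step. It assumes PyTorch; the toy linear model, data, and hyperparameters are placeholders for illustration only.

```python
# Minimal sketch (assumes PyTorch): compute and clip the global gradient norm.
import torch

def global_grad_norm(parameters, p=2.0):
    """Return the p-norm of all parameter gradients concatenated into one vector."""
    grads = [param.grad.detach().flatten() for param in parameters if param.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    return torch.linalg.vector_norm(torch.cat(grads), ord=p)

# Toy model and data (hypothetical, for illustration only).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# One training step: backprop, inspect the gradient norm, clip, then update.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

norm_before = global_grad_norm(model.parameters())
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # standard clipping utility
optimizer.step()

print(f"gradient norm before clipping: {norm_before:.4f}")
```

Monitoring this quantity per step is a common diagnostic: a norm that explodes or vanishes often signals instability in the optimizer or architecture, which is why clipping to a fixed maximum norm is a widely used stabilization technique.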
Papers
Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization
Ali Kavis, Stratis Skoulakis, Kimon Antonakopoulos, Leello Tadesse Dadi, Volkan Cevher
Proximal Subgradient Norm Minimization of ISTA and FISTA
Bowen Li, Bin Shi, Ya-xiang Yuan
A Convergence Theory for Federated Average: Beyond Smoothness
Xiaoxiao Li, Zhao Song, Runzhou Tao, Guangyi Zhang