Gradient Norm
Gradient norm, the magnitude of the gradient vector in optimization algorithms, is a central concept in deep learning research, with current efforts focusing on understanding its role in algorithm convergence, model robustness, and efficient training. Research investigates the impact of gradient norm on various optimization algorithms (e.g., Adam, SGD, RMSProp) and its relationship to model generalization and adversarial robustness, often within the context of specific architectures like vision transformers. Understanding and controlling gradient norm is crucial for improving the efficiency, stability, and reliability of deep learning models across diverse applications, from image classification to federated learning.
Papers
On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds
Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe
Large Deviations and Improved Mean-squared Error Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry
Aleksandar Armacki, Shuhua Yu, Dragana Bajovic, Dusan Jakovetic, Soummya Kar