Gradient Flow Dynamics
Gradient flow dynamics describe the continuous-time evolution of neural network weights during training, with the aim of understanding how optimization algorithms such as gradient descent lead to successful learning. Current research characterizes this behavior in a range of architectures, including multi-head attention models, two-homogeneous networks, and ReLU networks, often examining convergence properties and implicit biases toward specific solutions. These studies shed light on fundamental aspects of neural network training, such as the emergence of task allocation across heads in multi-head attention and the role of saddle points in the optimization landscape, and ultimately contribute to the development of more efficient and robust training methods.
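The continuous-time object behind these analyses is the gradient flow ODE dθ/dt = -∇L(θ(t)), of which gradient descent is a small-step discretization. The sketch below is a minimal illustration, not taken from any of the papers: it approximates the flow with forward-Euler steps on a hypothetical one-hidden-layer ReLU network fit to synthetic teacher data (all names, sizes, and data here are illustrative assumptions), and tracks the loss along the trajectory.

```python
import numpy as np

# Minimal sketch (hypothetical setup): gradient flow dtheta/dt = -grad L(theta),
# approximated by forward-Euler steps with a small step size dt, on a tiny
# one-hidden-layer ReLU network fit to synthetic data.

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))                     # synthetic inputs
y = np.maximum(X @ np.array([1.0, -1.0]), 0.0)   # teacher: a single ReLU unit

W = rng.normal(scale=0.1, size=(2, 8))           # hidden-layer weights
a = rng.normal(scale=0.1, size=8)                # output weights

def loss_and_grads(W, a):
    """Squared loss and its exact gradients, computed by hand."""
    H = np.maximum(X @ W, 0.0)       # hidden ReLU activations, shape (32, 8)
    pred = H @ a
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    dpred = err / len(y)             # dL/dpred
    da = H.T @ dpred                 # gradient w.r.t. output weights
    dH = np.outer(dpred, a) * (H > 0)  # backprop through the ReLU gate
    dW = X.T @ dH                    # gradient w.r.t. hidden weights
    return loss, dW, da

dt = 1e-2  # Euler step size; gradient flow is the dt -> 0 limit
for step in range(20001):
    loss, dW, da = loss_and_grads(W, a)
    W -= dt * dW
    a -= dt * da
    if step % 5000 == 0:
        print(f"t = {step * dt:8.1f}  loss = {loss:.6f}")
```

Shrinking dt (while scaling the number of steps accordingly) makes the discrete trajectory track the continuous flow more closely, which is the regime in which convergence and implicit-bias results for gradient flow are typically stated.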