Gradient Descent
Gradient descent is an iterative optimization algorithm that finds a minimum of a function by repeatedly taking steps proportional to the negative of its gradient. Current research focuses on improving its efficiency and robustness, particularly in high-dimensional spaces and for non-convex objectives, exploring variants such as stochastic gradient descent, proximal methods, and natural gradient descent, often in the context of deep learning models and other complex architectures. These advances are crucial for training increasingly large machine learning models and improving their performance in applications ranging from image recognition to scientific simulation. A key line of investigation concerns understanding and mitigating issues such as vanishing and exploding gradients, overfitting, and the effect of data characteristics on convergence.
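To make the update rule concrete, below is a minimal sketch of gradient descent in Python with NumPy. The quadratic objective, the step size of 0.1, and the iteration count are illustrative choices for this example only; they are not drawn from any of the papers listed below.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, n_steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * grad(x)  # step proportional to the negative gradient
    return x

# Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2.
# Its gradient is (2(x - 3), 2(y + 1)), so the minimum is at (3, -1).
grad_f = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # converges toward [3, -1]
```

Stochastic gradient descent follows the same loop but replaces `grad(x)` with a gradient estimate computed on a random subset of the data at each step.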
Papers
Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond
Matan Schliserman, Tomer Koren

Benign Underfitting of Stochastic Gradient Descent
Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization
Idan Amir, Roi Livni, Nathan Srebro

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber