Hessian Matrix
The Hessian matrix, the matrix of second-order partial derivatives of a scalar-valued function, is central to optimization algorithms in machine learning and related fields, where it is used to accelerate convergence and improve the quality of solutions. Because forming and inverting the full Hessian is prohibitively expensive for large models, current research focuses on efficient approximation techniques, such as Kronecker product approximations and diagonal approximations, within algorithms like Shampoo, Adam, and various quasi-Newton methods. This work matters because efficient use of curvature information enables faster training of complex models, improves optimization in challenging settings such as saddle point problems and bilevel optimization, and supports better uncertainty quantification in Bayesian approaches.
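To make the contrast between using the full Hessian and a cheap diagonal approximation concrete, here is a minimal, self-contained sketch (not taken from any of the listed papers; the function names `rosenbrock`, `grad_fd`, and `hessian_fd` are illustrative). It builds a finite-difference Hessian of a small test function and compares a full Newton step against a diagonal (Jacobi-style) preconditioned step of the kind that diagonal-approximation methods rely on.

```python
import numpy as np

def rosenbrock(x):
    # Classic non-convex test function with an ill-conditioned Hessian.
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def grad_fd(f, x, eps=1e-6):
    # Central-difference approximation of the gradient.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def hessian_fd(f, x, eps=1e-5):
    # Finite-difference Hessian: differentiate the gradient once more.
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros_like(x); e[j] = eps
        H[:, j] = (grad_fd(f, x + e) - grad_fd(f, x - e)) / (2 * eps)
    return 0.5 * (H + H.T)  # symmetrize to reduce numerical noise

x = np.array([-1.2, 1.0])
for _ in range(20):
    g = grad_fd(rosenbrock, x)
    H = hessian_fd(rosenbrock, x)
    # Full Newton step (solves a 2x2 linear system) vs. a cheap diagonal step.
    newton_step = np.linalg.solve(H + 1e-8 * np.eye(x.size), g)
    diag_step = g / (np.abs(np.diag(H)) + 1e-8)
    x = x - newton_step  # swap in diag_step to compare convergence behaviour

print(x, rosenbrock(x))
```

Methods like Shampoo and quasi-Newton algorithms sit between these two extremes, replacing the exact solve with structured (e.g., Kronecker-factored or low-rank) approximations that scale to millions of parameters.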
Papers
Adaptive multiple optimal learning factors for neural network training
Jeshwanth Challagundla
Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization
Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari
ODE-based Learning to Optimize
Zhonglin Xie, Wotao Yin, Zaiwen Wen