Small Initialization
Small initialization refers to starting neural network training from weights of small magnitude, with the aim of improving training efficiency, convergence behavior, and generalization. Current research studies this effect across architectures such as transformers and multilayer perceptrons, analyzing how gradient descent and alternating minimization algorithms behave under different initialization scales. These studies matter because they reveal the implicit biases of training algorithms and can lead to more efficient and robust training methods for a range of machine learning tasks.
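As a concrete illustration, the sketch below is a hypothetical PyTorch snippet (not taken from any of the listed papers) that initializes a multilayer perceptron with weights drawn at scale α·1/√fan_in; setting the factor α ≪ 1 gives the small-initialization regime, while α = 1 recovers a standard scheme.

```python
import math
import torch
from torch import nn

def small_init_(module: nn.Module, alpha: float = 1e-3) -> None:
    """Initialize Linear layers at a small scale.

    Weights are drawn from N(0, (alpha / sqrt(fan_in))^2); alpha = 1
    recovers a standard 1/sqrt(fan_in) scheme, while alpha << 1 gives
    the small-initialization regime discussed above.
    """
    if isinstance(module, nn.Linear):
        fan_in = module.weight.shape[1]  # weight has shape [out_features, in_features]
        nn.init.normal_(module.weight, mean=0.0, std=alpha / math.sqrt(fan_in))
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# A small multilayer perceptron; the architecture and sizes are illustrative.
mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
mlp.apply(lambda m: small_init_(m, alpha=1e-3))

# Sanity check: the total weight norm should be far smaller than under
# a standard initialization (alpha = 1).
print(sum(p.norm().item() for p in mlp.parameters()))
```

The only knob here is α: sweeping it toward zero is one simple way to probe how initialization scale changes the training dynamics that these papers analyze.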