Small Initialization

Small initialization in neural networks refers to starting training with weights of small magnitude, with the aim of improving training efficiency, convergence, and generalization. Current research studies this regime across architectures such as transformers and multilayer perceptrons, analyzing the training dynamics and convergence behavior of gradient descent and alternating minimization under different initialization scales. These studies matter because they shed light on the implicit biases of training algorithms and can inform more efficient and robust training methods across diverse machine learning tasks.
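
As a concrete illustration (not drawn from any particular paper listed below), the sketch shows one common way to place a network in the small-initialization regime: drawing weights from a Gaussian whose standard deviation is well below the usual 1/sqrt(fan_in) scale. The helper name `small_init_` and the scale value are illustrative assumptions, not a reference implementation.

```python
import torch.nn as nn

def small_init_(module: nn.Module, scale: float = 1e-3) -> None:
    """Re-initialize all linear layers with small-magnitude weights.

    `scale` is the standard deviation of a zero-mean Gaussian draw;
    choosing it far below the typical 1/sqrt(fan_in) magnitude starts
    training near the origin, the regime studied in this line of work.
    """
    for layer in module.modules():
        if isinstance(layer, nn.Linear):
            nn.init.normal_(layer.weight, mean=0.0, std=scale)
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

# Example: a two-layer MLP started near the origin before standard training.
mlp = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
small_init_(mlp, scale=1e-3)
```

Under such an initialization, the early phase of gradient descent operates close to the origin, which is where the implicit-bias and convergence analyses referenced above typically begin.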

Papers