Weight Averaging

Weight averaging (WA) is a technique for improving the performance and generalization of deep learning models by combining the weights of multiple models trained independently or in parallel. Current research applies WA in diverse settings, including domain generalization, continual learning, and adversarial training, often within ensemble methods or as part of optimization procedures such as stochastic weight averaging (SWA) and its variants. WA is effective because it mitigates overfitting, enhances robustness, and can accelerate training, yielding gains in accuracy and efficiency across applications, particularly in computer vision and natural language processing.
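
As a concrete illustration, the sketch below (in PyTorch, an assumed framework choice; the helper name `average_state_dicts` and the toy model are hypothetical, not from any specific paper) performs the basic weight-averaging step: given several models with identical architecture, the combined model uses the element-wise mean of each parameter tensor.

```python
# Minimal sketch of uniform weight averaging in PyTorch.
# The helper name and toy model are illustrative assumptions.
import torch
import torch.nn as nn


def average_state_dicts(state_dicts):
    """Element-wise mean of parameter tensors from models that share
    the same architecture (the basic weight-averaging step)."""
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        # Cast back so integer buffers (e.g. BatchNorm's
        # num_batches_tracked) keep their original dtype.
        avg[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return avg


# Toy demonstration: average three independently trained (here, merely
# independently initialized) copies of the same architecture.
def make_model():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

models = [make_model() for _ in range(3)]
averaged = make_model()
averaged.load_state_dict(average_state_dicts([m.state_dict() for m in models]))
```

SWA applies the same averaging idea to weight snapshots collected along a single training trajectory rather than across independent runs; PyTorch ships this variant as `torch.optim.swa_utils.AveragedModel`.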

Papers