Online Merging
Online merging combines multiple trained neural network models into a single, more capable model, aiming to reduce resource consumption, improve generalization, and streamline model development. Current research emphasizes scaling merging techniques to larger models (e.g., transformers with billions of parameters) and explores a range of merging methods, including parameter averaging, least squares optimization, and specialized techniques like Foldable SuperNets. This area is significant because efficient model merging can improve the performance and cost-effectiveness of machine learning systems across diverse applications, from image processing and natural language processing to autonomous driving and program repair.
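The simplest merging method mentioned above, parameter averaging, can be sketched in a few lines. This is a minimal illustration only, not the method of any specific paper listed here: it assumes the models share an architecture (identical parameter names and shapes), and the function name and toy state dicts are hypothetical.

```python
import numpy as np

def merge_by_averaging(state_dicts, weights=None):
    """Merge models by (weighted) elementwise averaging of their parameters.

    state_dicts: list of dicts mapping parameter name -> array; all dicts
    must share the same keys and shapes (same architecture).
    """
    if weights is None:
        # Uniform average by default.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Two toy "models" with matching parameter names and shapes (illustrative).
model_a = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
model_b = {"layer.weight": np.array([[3.0, 2.0], [1.0, 0.0]])}
merged = merge_by_averaging([model_a, model_b])
# merged["layer.weight"] is the elementwise mean: [[2.0, 2.0], [2.0, 2.0]]
```

In practice the same idea is applied to framework state dicts (e.g., PyTorch `state_dict()` tensors), and more elaborate methods replace the uniform weights with learned or task-dependent coefficients.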
Papers
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation
Thomas Gauthier-Caron, Shamane Siriwardhana, Elliot Stein, Malikeh Ehghaghi, Charles Goddard, Mark McQuade, Jacob Solawetz, Maxime Labonne
SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture
Jiayi Han, Liang Du, Hongwei Du, Xiangguo Zhou, Yiwen Wu, Weibo Zheng, Donghong Han