Model Merging
Model merging combines multiple pre-trained or fine-tuned neural networks, often large language models (LLMs) or transformers, into a single, more capable model without retraining on the original datasets. Current research focuses on improving merging techniques: resolving parameter conflicts between source models, efficiently handling diverse model architectures and scales, and exploring methods such as weight averaging, task arithmetic, and parameter competition balancing. The approach offers significant practical advantages, including reduced storage and computational costs, improved generalization, and the ability to integrate expertise from multiple sources, benefiting both the efficiency of model development and the performance of downstream applications.
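Two of the methods named above can be sketched concisely. The following is a minimal illustration, not any specific paper's implementation: it assumes each model is represented as a dict mapping parameter names to flat lists of weights. Weight averaging takes a (weighted) mean of corresponding parameters; task arithmetic adds scaled "task vectors" (the difference between a fine-tuned model and its shared base) onto the base model.

```python
def weight_average(models, weights=None):
    """Merge models by (weighted) elementwise averaging of parameters.

    models: list of dicts {param_name: list_of_floats}, all with the
    same parameter names and shapes (an assumption of this sketch).
    """
    n = len(models)
    if weights is None:
        weights = [1.0 / n] * n  # uniform average by default
    merged = {}
    for name in models[0]:
        dim = len(models[0][name])
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(dim)
        ]
    return merged


def task_arithmetic(base, finetuned, scale=1.0):
    """Merge by adding scaled task vectors (finetuned - base) to the base.

    Each fine-tuned model contributes a task vector; summing them and
    scaling controls how strongly each task's specialization is applied.
    """
    merged = {}
    for name in base:
        merged[name] = [
            base[name][i]
            + scale * sum(m[name][i] - base[name][i] for m in finetuned)
            for i in range(len(base[name]))
        ]
    return merged


# Toy usage with two 2-parameter "models" sharing one base:
base = {"w": [1.0, 1.0]}
m1 = {"w": [3.0, 1.0]}
m2 = {"w": [1.0, 3.0]}
avg = weight_average([m1, m2])                    # [2.0, 2.0]
ta = task_arithmetic(base, [m1, m2], scale=0.5)   # [2.0, 2.0]
```

Note that neither method needs the original training data: both operate purely on parameter values, which is what makes merging cheap relative to multi-task retraining.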
Papers
Unconstrained Model Merging for Enhanced LLM Reasoning
Yiming Zhang, Baoyi He, Shengyu Zhang, Yuhao Fu, Qi Zhou, Zhijie Sang, Zijin Hong, Kejing Yang, Wenjun Wang, Jianbo Yuan, Guangning Han, Linyi Li, Chunlin Ji, Fei Wu, Hongxia Yang
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang, Anke Tang, Didi Zhu, Zhengyu Chen, Li Shen, Fei Wu
SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
Arham Khan, Todd Nief, Nathaniel Hudson, Mansi Sakarvadia, Daniel Grzenda, Aswathy Ajith, Jordan Pettyjohn, Kyle Chard, Ian Foster
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse
Ekansh Sharma, Daniel M. Roy, Gintare Karolina Dziugaite
Exploring Model Kinship for Merging Large Language Models
Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen
Tracking Universal Features Through Fine-Tuning and Model Merging
Niels Horn, Desmond Elliott