Model Merging

Model merging combines multiple pre-trained or fine-tuned neural networks, often large language models (LLMs) or other transformers, into a single, more capable model without retraining on the original datasets. Current research focuses on improving merging techniques, in particular resolving parameter conflicts and handling diverse model architectures and scales efficiently, through methods such as weight averaging, task arithmetic, and parameter competition balancing. Merging offers significant practical advantages: reduced storage and computational costs, improved generalization, and the ability to integrate expertise from multiple sources, benefiting both the efficiency of model development and the performance of downstream applications.
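
To make one of these methods concrete, below is a minimal sketch of task arithmetic over PyTorch state dicts. Each fine-tuned model contributes a "task vector", its parameter delta from a shared base checkpoint, and the merged model adds the scaled sum of these vectors back onto the base. The scaling coefficient, checkpoint names, and the assumption that all models share one architecture are illustrative, not prescribed by any particular paper.

```python
import torch

def task_arithmetic_merge(base_state, finetuned_states, scaling=0.5):
    """Merge fine-tuned checkpoints into a base model via task arithmetic.

    Assumes every state dict comes from the same architecture, so all
    parameter names and shapes match across checkpoints.
    """
    merged = {}
    for name, base_param in base_state.items():
        # Task vector: delta between each fine-tuned model and the base.
        task_vectors = [ft[name] - base_param for ft in finetuned_states]
        # Add the scaled sum of task vectors back onto the base weights.
        merged[name] = base_param + scaling * sum(task_vectors)
    return merged

# Hypothetical usage: combine a math-tuned and a code-tuned checkpoint.
base = torch.load("base_model.pt")
experts = [torch.load("math_expert.pt"), torch.load("code_expert.pt")]
merged_state = task_arithmetic_merge(base, experts, scaling=0.4)
```

Setting `scaling` to the reciprocal of the number of models and dropping the base subtraction recovers simple weight averaging; more elaborate schemes such as parameter competition balancing instead weight or mask each delta to reduce conflicts between task vectors.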

Papers