Model Merging
Model merging combines multiple pre-trained or fine-tuned neural networks, often large language models (LLMs) or other transformers, into a single, more capable model without retraining on the original datasets. Current research focuses on improving merging techniques, in particular resolving parameter conflicts and handling diverse model architectures and scales efficiently, through methods such as weight averaging, task arithmetic, and parameter competition balancing. The approach offers significant practical advantages: reduced storage and compute costs, improved generalization, and the ability to integrate expertise from multiple sources, benefiting both the efficiency of model development and the performance of downstream applications.
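To make the two most common strategies concrete, here is a minimal sketch of weight averaging and task arithmetic over PyTorch state dicts. It assumes all models share the same architecture (identical state-dict keys); the function names and the `scale` parameter are illustrative, not from any particular library.

```python
import torch


def merge_weight_average(state_dicts, weights=None):
    """Uniform or weighted averaging of parameters across models."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        # Element-wise weighted sum of the same tensor from every model.
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged


def merge_task_arithmetic(base_sd, finetuned_sds, scale=0.5):
    """Task arithmetic: add scaled 'task vectors' (fine-tuned parameters
    minus base parameters) back onto the base model."""
    merged = {}
    for key in base_sd:
        task_vectors = [sd[key].float() - base_sd[key].float() for sd in finetuned_sds]
        merged[key] = base_sd[key].float() + scale * sum(task_vectors)
    return merged


# Usage: merged = merge_task_arithmetic(base.state_dict(),
#                                       [ft_a.state_dict(), ft_b.state_dict()])
#        model.load_state_dict(merged)
```

The `scale` hyperparameter trades off how strongly each task's specialization is injected; when task vectors disagree in sign for the same parameter, naive summation can cancel useful updates, which is exactly the parameter-conflict problem that methods like parameter competition balancing aim to address.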