Model Merging
Model merging combines multiple pre-trained or fine-tuned neural networks, often large language models (LLMs) or other transformers, into a single, more capable model without retraining on the original datasets. Current research focuses on improving merging techniques: resolving parameter conflicts between the source models, handling diverse model architectures and scales efficiently, and exploring methods such as weight averaging, task arithmetic, and parameter competition balancing. Merging offers significant practical advantages, including reduced storage and computational costs, improved generalization, and the ability to integrate expertise from multiple sources, benefiting both the efficiency of model development and the performance of downstream applications.
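To make the weight-averaging and task-arithmetic ideas concrete, here is a minimal sketch operating on PyTorch state dicts. It assumes all checkpoints share an identical architecture; the toy tensors and the `scaling` factor are illustrative assumptions, not taken from any particular paper.

```python
# Minimal sketch of two common merging strategies on PyTorch state dicts.
import torch


def weight_average(state_dicts, weights=None):
    """Merge models by (weighted) elementwise averaging of parameters.

    All models must share the same architecture, so their state dicts
    have identical keys and tensor shapes.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged


def task_arithmetic(base, finetuned_list, scaling=0.5):
    """Merge via task vectors: add scaled fine-tuning deltas to the base.

    Each task vector is (finetuned - base); summing several of them and
    adding the result back to the base composes the tasks' abilities.
    """
    merged = {k: v.clone() for k, v in base.items()}
    for ft in finetuned_list:
        for key in merged:
            merged[key] += scaling * (ft[key] - base[key])
    return merged


if __name__ == "__main__":
    # Toy demonstration with two tiny hypothetical checkpoints.
    base = {"w": torch.zeros(3)}
    ft_a = {"w": torch.tensor([1.0, 0.0, 0.0])}  # "expert" on task A
    ft_b = {"w": torch.tensor([0.0, 1.0, 0.0])}  # "expert" on task B

    print(weight_average([ft_a, ft_b]))         # {'w': tensor([0.5, 0.5, 0.])}
    print(task_arithmetic(base, [ft_a, ft_b]))  # {'w': tensor([0.5, 0.5, 0.])}
```

Weight averaging treats all source models symmetrically, while task arithmetic anchors the merge at a shared base model, which is why it needs the base checkpoint as an extra input; methods like parameter competition balancing extend this by weighting each delta according to how strongly the source models disagree on a parameter.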