Model Merging
Model merging combines multiple pre-trained or fine-tuned neural networks, most often large language models (LLMs) or other transformers, into a single, more capable model without retraining on the original datasets. Current research focuses on improving merging techniques, in particular resolving parameter conflicts between models and handling diverse architectures and scales efficiently, through methods such as weight averaging, task arithmetic, and parameter competition balancing. Merging offers significant practical advantages: reduced storage and computational costs, improved generalization, and the ability to integrate expertise from multiple sources, benefiting both the efficiency of model development and the performance of downstream applications.
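As a rough illustration of two of the methods named above, the sketch below applies uniform weight averaging and task arithmetic to PyTorch state dicts of architecturally identical models. The function names and the `alpha` scaling coefficient are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch of two common merging operations on PyTorch state dicts.
# Assumes all models share the same architecture (identical keys and shapes).
import torch

def weight_average(state_dicts):
    """Uniform weight averaging: element-wise mean of all model parameters."""
    return {
        key: torch.mean(torch.stack([sd[key].float() for sd in state_dicts]), dim=0)
        for key in state_dicts[0]
    }

def task_arithmetic(base, finetuned_models, alpha=0.3):
    """Task arithmetic: add scaled task vectors (fine-tuned minus base) to the base weights.

    `alpha` is an illustrative merging coefficient; in practice it is tuned per task.
    """
    merged = {k: v.clone().float() for k, v in base.items()}
    for ft in finetuned_models:
        for k in merged:
            merged[k] += alpha * (ft[k].float() - base[k].float())
    return merged
```

In both cases the merged dictionary can be loaded back into a model of the same architecture via `model.load_state_dict(merged)`; the two functions differ only in whether the contributions are averaged uniformly or expressed as additive task vectors relative to a shared base model.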