Continual Pre-Training
Continual pre-training (CPT) adapts large language models (LLMs) to new domains or tasks by incrementally training them on additional data rather than retraining from scratch. Current research focuses on optimizing CPT strategies, including learning-rate scheduling, data-mixing ratios, and mitigating catastrophic forgetting, often using Llama-family models and exploring techniques such as model merging and parameter-efficient updates. The approach matters because it avoids much of the computational cost of training LLMs from scratch, enabling more frequent updates and adaptation to evolving data and tasks across domains such as finance, medicine, and astronomy.
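As a rough illustration of two of these levers, the sketch below (not drawn from either paper listed here) combines learning-rate re-warmup followed by decay with a fixed replay ratio that mixes new-domain data against general data to limit forgetting. The tiny linear model, synthetic tensors, and the particular schedule and mixing ratio are all placeholder assumptions, not a prescription from the literature.

```python
# Minimal CPT sketch: LR re-warmup + domain mixing with replay.
# The nn.Linear model and random tensors stand in for an LLM and real corpora.
import math
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

torch.manual_seed(0)

# Stand-in "pretrained" model; in practice this would be a loaded LLM checkpoint.
model = nn.Linear(128, 128)
optimizer = AdamW(model.parameters(), lr=2e-5)  # peak LR typically below pre-training's

total_steps, warmup_steps = 1_000, 100

def rewarmup_cosine(step: int) -> float:
    """Linear re-warmup to the peak LR, then cosine decay to 10% of peak."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=rewarmup_cosine)

# Synthetic "corpora": new-domain data plus replayed general data to limit forgetting.
new_domain = torch.randn(512, 128)
general_replay = torch.randn(512, 128)
replay_ratio = 0.3  # fraction of each batch drawn from the original distribution (assumed value)

def sample_batch(batch_size: int = 32) -> torch.Tensor:
    n_replay = int(batch_size * replay_ratio)
    idx_new = torch.randint(len(new_domain), (batch_size - n_replay,))
    idx_old = torch.randint(len(general_replay), (n_replay,))
    return torch.cat([new_domain[idx_new], general_replay[idx_old]])

for step in range(total_steps):
    batch = sample_batch()
    loss = nn.functional.mse_loss(model(batch), batch)  # placeholder objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```

The same structure carries over to a real setup: swap the placeholder model for a loaded checkpoint, the synthetic tensors for tokenized domain and general corpora, and the placeholder loss for the language-modeling objective, while keeping the re-warmup schedule and mixing ratio as tunable knobs.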
Papers
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
Shamane Siriwardhana, Mark McQuade, Thomas Gauthier, Lucas Atkins, Fernando Fernandes Neto, Luke Meyers, Anneketh Vij, Tyler Odenthal, Charles Goddard, Mary MacCarthy, Jacob Solawetz
Efficient Continual Pre-training by Mitigating the Stability Gap
Yiduo Guo, Jie Fu, Huishuai Zhang, Dongyan Zhao, Yikang Shen