Continual Pre-Training
Continual pre-training (CPT) aims to efficiently adapt large language models (LLMs) to new domains or tasks by incrementally training them on additional data, rather than retraining from scratch. Current research focuses on optimizing CPT strategies, including learning rate scheduling, data mixing ratios, and mitigating catastrophic forgetting, often employing models such as LLaMA and Llama 2 and exploring techniques such as model merging and parameter-efficient updates. This approach is significant because it avoids the substantial computational cost of training LLMs from scratch, enabling more frequent updates and adaptation to evolving data and tasks across domains such as finance, medicine, and astronomy.
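As a rough illustration of what a CPT loop can look like in practice, the sketch below combines two of the strategies mentioned above: learning-rate re-warming with cosine decay, and replay-based data mixing to mitigate catastrophic forgetting. It is a minimal, self-contained PyTorch example; the tiny model, the random-batch data loaders, and all hyperparameters (peak learning rate, warmup length, replay ratio) are hypothetical placeholders standing in for a real pretrained checkpoint and real corpora, not a reproduction of any specific paper's setup.

```python
# Minimal continual pre-training (CPT) sketch: re-warmed cosine LR schedule plus
# replay-based data mixing. All names and numbers here are illustrative placeholders.
import math
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN = 1000, 64, 32

class TinyCausalLM(nn.Module):
    """Stand-in for a pretrained LLM checkpoint (e.g., a Llama-family model)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        return self.head(self.embed(tokens))

def new_domain_batch(batch_size=8):
    """Placeholder for the new-domain corpus (e.g., finance or medical text)."""
    return torch.randint(0, VOCAB, (batch_size, SEQ_LEN))

def general_domain_batch(batch_size=8):
    """Placeholder for replayed samples from the original pre-training mix."""
    return torch.randint(0, VOCAB, (batch_size, SEQ_LEN))

def rewarmed_cosine_lr(step, total_steps, peak_lr=1e-4, warmup_steps=100, min_lr=1e-5):
    """Re-warm the LR from near zero up to a (typically lower) peak, then cosine-decay it."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

model = TinyCausalLM()                       # in practice: load the pretrained checkpoint
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

TOTAL_STEPS = 1000
REPLAY_RATIO = 0.25                          # fraction of batches drawn from the original data mix

for step in range(TOTAL_STEPS):
    # Data mixing: with probability REPLAY_RATIO, replay general-domain data
    # instead of new-domain data to limit forgetting of prior capabilities.
    use_replay = torch.rand(1).item() < REPLAY_RATIO
    batch = general_domain_batch() if use_replay else new_domain_batch()

    # Learning-rate re-warming: reset the schedule for the new data rather than
    # continuing the fully decayed LR from the original pre-training run.
    lr = rewarmed_cosine_lr(step, TOTAL_STEPS)
    for group in optimizer.param_groups:
        group["lr"] = lr

    # Standard next-token prediction objective.
    logits = model(batch[:, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The replay ratio and the re-warmed peak learning rate are the two knobs this sketch exposes: a higher replay ratio trades new-domain adaptation for better retention, while a lower re-warmed peak reduces disruption to the original weights at the cost of slower domain adaptation.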