Continual Pre-Training
Continual pre-training (CPT) aims to efficiently adapt large language models (LLMs) to new domains or tasks by incrementally training them on additional data rather than retraining from scratch. Current research focuses on optimizing CPT strategies, including learning-rate scheduling, data-mixing ratios, and mitigating catastrophic forgetting, often using models such as LLaMA and LLaMA-2 and exploring techniques such as model merging and parameter-efficient updates. This approach is significant because it avoids much of the computational cost of training LLMs from scratch, enabling more frequent updates and adaptation to evolving data and tasks across domains such as finance, medicine, and astronomy.
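To make two of these strategies concrete, the sketch below shows a minimal continual pre-training loop in plain PyTorch: the learning rate is re-warmed and then decayed with a cosine schedule, and a fixed fraction of batches is replayed from the original general-purpose data to limit catastrophic forgetting. The toy model, the `sample_batch` helper, and all hyperparameter values are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch of a continual pre-training step, assuming a generic
# next-token-prediction model. The tiny model and random "data" stand in
# for a real LLM and tokenized corpora.
import math
import random
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# --- Toy stand-ins for an LLM and its data (hypothetical) -------------------
vocab_size, seq_len, d_model = 1000, 32, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),   # (batch, seq) -> (batch, seq, d_model)
    nn.Linear(d_model, vocab_size),      # -> (batch, seq, vocab)
)

def sample_batch(source: str, batch_size: int = 8) -> torch.Tensor:
    """Pretend to draw a tokenized batch from the 'general' or 'domain' corpus."""
    return torch.randint(0, vocab_size, (batch_size, seq_len))

# --- CPT hyperparameters (illustrative values only) --------------------------
total_steps = 1000
warmup_steps = 50        # re-warm the LR from ~0 back up to the peak
peak_lr = 1e-4
replay_ratio = 0.25      # fraction of batches replayed from the old general mix

optimizer = AdamW(model.parameters(), lr=peak_lr)

def lr_lambda(step: int) -> float:
    """Linear re-warmup followed by cosine decay to 10% of the peak LR."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
loss_fn = nn.CrossEntropyLoss()

for step in range(total_steps):
    # Data mixing: with probability `replay_ratio`, replay general-domain data
    # instead of the new domain data to mitigate forgetting.
    source = "general" if random.random() < replay_ratio else "domain"
    tokens = sample_batch(source)

    logits = model(tokens)
    # Next-token prediction: predict token t+1 from token t.
    loss = loss_fn(
        logits[:, :-1].reshape(-1, vocab_size),
        tokens[:, 1:].reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In practice the same structure carries over to a real LLM: the replay ratio and the re-warmup length are the main knobs studied in the CPT literature, trading off adaptation to the new domain against retention of the original capabilities.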