Continual Pre-Training

Continual pre-training (CPT) aims to adapt large language models (LLMs) to new domains or tasks efficiently by incrementally training them on additional data rather than retraining from scratch. Current research focuses on optimizing CPT strategies, including learning-rate scheduling, data-mixing ratios, and mitigation of catastrophic forgetting, often using open models such as LLaMA and Llama 2 and exploring techniques like model merging and parameter-efficient updates. The approach is significant because it avoids the substantial computational cost of training LLMs from scratch, enabling more frequent updates and adaptation to evolving data and tasks across domains such as finance, medicine, and astronomy.
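
To make two of the recurring ingredients concrete, the sketch below illustrates (i) a re-warmup-then-cosine-decay learning-rate schedule for resuming training from a converged checkpoint and (ii) a replay-style data mix that blends new-domain documents with a small fraction of the original pre-training data to limit forgetting. All names and hyperparameters (peak_lr, warmup_steps, replay_ratio, the toy corpora) are illustrative assumptions, not values from any specific paper.

```python
import math
import random

def cpt_lr(step, total_steps, warmup_steps=1000, peak_lr=3e-5, min_lr=3e-6):
    """Re-warmup then cosine-decay schedule, a common choice when continuing
    pre-training on new-domain data from an already-trained checkpoint."""
    if step < warmup_steps:
        # Linear re-warmup from min_lr up to peak_lr.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Cosine decay from peak_lr back down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def mixed_batch(domain_docs, replay_docs, batch_size=8, replay_ratio=0.25, rng=random):
    """Sample a batch that mixes new-domain documents with replayed
    general-corpus documents to mitigate catastrophic forgetting."""
    n_replay = int(round(batch_size * replay_ratio))
    batch = rng.sample(replay_docs, n_replay) + rng.sample(domain_docs, batch_size - n_replay)
    rng.shuffle(batch)
    return batch

if __name__ == "__main__":
    domain = [f"finance_doc_{i}" for i in range(100)]   # toy new-domain corpus
    general = [f"general_doc_{i}" for i in range(100)]  # toy original pre-training data
    print(mixed_batch(domain, general))
    for s in (0, 500, 1000, 5000, 10000):
        print(s, round(cpt_lr(s, total_steps=10000), 8))
```

In practice the same ideas appear as a scheduler and a weighted dataset sampler inside a full training loop; the fraction of replayed data and the re-warmup length are among the main knobs the CPT literature studies.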

Papers