Model Checkpoint
Model checkpointing, the process of saving intermediate states of a model during training, is crucial for large language models (LLMs) and other deep learning models: it enables fault tolerance, efficient hyperparameter optimization, and model reuse and merging. Current research focuses on improving checkpointing efficiency across architectures, including Mixture-of-Experts (MoE) models, through techniques such as partial checkpointing, asynchronous saving, and compression. These advances are vital for reducing the substantial computational and storage costs of training and deploying increasingly large models, benefiting both research reproducibility and practical deployment.
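To make the asynchronous-saving idea concrete, here is a minimal PyTorch sketch, not tied to any specific paper above: the training thread only pays for a fast device-to-host copy of the state, while the slow disk write happens in a background thread. The helper name `async_save_checkpoint` and the overall structure are illustrative assumptions, not a standard library API.

```python
import copy
import threading

import torch
import torch.nn as nn


def async_save_checkpoint(model, optimizer, step, path):
    """Snapshot model/optimizer state on the main thread, write it to disk in the background.

    The training loop is blocked only for the copy to CPU, not for the file I/O.
    """
    # Copy tensors to CPU so the writer thread never reads parameters that the
    # next training step may already be overwriting on the accelerator.
    snapshot = {
        "step": step,
        "model": {k: v.detach().to("cpu", copy=True) for k, v in model.state_dict().items()},
        # For brevity the optimizer state is deep-copied as-is; in practice it
        # should also be moved to CPU the same way as the model tensors.
        "optimizer": copy.deepcopy(optimizer.state_dict()),
    }
    writer = threading.Thread(target=torch.save, args=(snapshot, path), daemon=True)
    writer.start()
    return writer  # caller can join() before exiting or before the next save


if __name__ == "__main__":
    model = nn.Linear(16, 4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    handle = async_save_checkpoint(model, optimizer, step=100, path="ckpt_step100.pt")
    handle.join()  # in real training this join would overlap with subsequent steps
```

Production systems (e.g., distributed checkpointing for sharded MoE models) add more machinery, such as per-rank shards and atomic renames, but the overlap of computation with checkpoint I/O shown here is the core of the asynchronous-saving technique mentioned above.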