Model Growth
Model growth is the process of expanding a neural network's architecture during training, with the aim of improving adaptability and efficiency while mitigating the "forgetting" of previously learned information. Current research focuses on strategies for efficient, controlled growth, including sparse layer expansion, depthwise stacking, and data-driven initialization, applied most often to transformer-based models but also to other architectures such as convolutional neural networks. These advances matter because they reduce the high computational cost of training large models and improve the performance of continual and federated learning systems.
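The growth operator itself can be simple. Below is a minimal sketch, assuming PyTorch, of one such strategy, depthwise stacking: a small Transformer encoder is grown into a deeper one by duplicating its trained layers, so the larger model starts from an informed initialization rather than from scratch. The function names and hyperparameters here are illustrative, not taken from any of the papers listed below.

```python
# Minimal sketch of depthwise model growth ("stacking"): a shallow
# Transformer encoder is grown to a multiple of its depth by copying
# its trained layers, so the deeper model starts from a non-random
# initialization. Hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn

def make_encoder(num_layers: int, d_model: int = 64, nhead: int = 4) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

def grow_by_stacking(small: nn.TransformerEncoder, factor: int = 2) -> nn.TransformerEncoder:
    """Initialize a deeper encoder by repeating the small model's layer stack."""
    big = make_encoder(num_layers=len(small.layers) * factor)
    for i, big_layer in enumerate(big.layers):
        # Layer i of the big model copies weights from layer i % depth
        # of the small model, i.e. the small stack is repeated in order.
        src = small.layers[i % len(small.layers)]
        big_layer.load_state_dict(copy.deepcopy(src.state_dict()))
    return big

small = make_encoder(num_layers=2)
# ... train `small` for some number of steps here ...
big = grow_by_stacking(small, factor=2)   # 4 layers, initialized from 2
x = torch.randn(8, 16, 64)                # (batch, sequence, d_model)
print(big(x).shape)                       # torch.Size([8, 16, 64])
```

Note that stacking duplicated layers does not exactly preserve the function the small model computes, so in practice the grown model is typically trained further after the growth step.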
Papers
Generating density nowcasts for U.S. GDP growth with deep learning: Bayes by Backprop and Monte Carlo dropout
Kristóf Németh, Dániel Hadházi
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu