Model Growth

Model growth is the practice of expanding a neural network's architecture during training, with the goals of improving adaptability and training efficiency while mitigating catastrophic forgetting of previously learned information. Current research focuses on efficient, controlled growth strategies, such as sparse layer expansion and data-driven initialization of newly added parameters, applied most often to transformer-based models but also to other architectures such as convolutional neural networks. These advances matter because they reduce the high computational cost of training large models from scratch and improve continual learning and federated learning systems, leading to more efficient and adaptable AI across a range of applications.
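To make the idea concrete, the sketch below shows one well-known instance of controlled growth: function-preserving width expansion in the spirit of Net2WiderNet, where new hidden units are copies of existing ones and outgoing weights are rescaled so the grown network computes exactly the same function before further training. This is a minimal illustration assuming PyTorch; the helper name `widen_linear_pair` is hypothetical, and the snippet is not drawn from any specific paper summarized here.

```python
import torch
import torch.nn as nn

def widen_linear_pair(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
    """Function-preserving width growth (Net2WiderNet-style sketch).

    Expands fc1's output dimension (and fc2's input dimension) from
    fc1.out_features to new_width by replicating randomly chosen hidden
    units and dividing fc2's incoming weights by each unit's replication
    count, so the two-layer block computes the same function after growth.
    """
    old_width = fc1.out_features
    assert new_width > old_width, "can only grow the layer"
    # Map each new unit index to the existing unit it copies.
    mapping = torch.cat([
        torch.arange(old_width),
        torch.randint(0, old_width, (new_width - old_width,)),
    ])
    # How many copies of each original unit exist after growth.
    counts = torch.bincount(mapping, minlength=old_width).float()

    new_fc1 = nn.Linear(fc1.in_features, new_width)
    new_fc2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[mapping])
        new_fc1.bias.copy_(fc1.bias[mapping])
        # Rescale outgoing weights so replicated units' summed
        # contribution matches the original unit's contribution.
        new_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

# Sanity check: the output is numerically unchanged by growth.
torch.manual_seed(0)
fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(5, 8)
before = fc2(torch.relu(fc1(x)))
g1, g2 = widen_linear_pair(fc1, fc2, new_width=24)
after = g2(torch.relu(g1(x)))
assert torch.allclose(before, after, atol=1e-5)
```

Because growth preserves the learned function, training can resume from the wider network without losing prior progress; data-driven initialization schemes refine this further by choosing which units to replicate, or how to initialize new parameters, based on activation statistics rather than at random.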

Papers