Peak Memory

Peak memory, the maximum memory in use at any point during neural network execution, is a critical bottleneck that limits the deployment of large models on resource-constrained devices. Current research targets peak-memory reduction across architectures such as convolutional neural networks (CNNs), vision transformers (ViTs), and large language models (LLMs), using techniques including model distillation, forward-mode auto-differentiation, and training strategies such as progressive training and weight splitting. These advances enable efficient fine-tuning and adaptation of large models on edge devices and in federated learning settings, making it more practical to deploy AI in resource-limited environments. The resulting memory savings improve both training speed and the range of applications open to powerful deep learning models.
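
To make the quantity concrete, the following is a minimal sketch, assuming a PyTorch/CUDA setup, of how peak memory is commonly measured around a training step; the model, batch, and sizes here are placeholders for illustration, not from any specific paper above.

```python
import torch
import torch.nn as nn

# Placeholder model and batch for illustration only.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
batch = torch.randn(64, 1024, device="cuda")
target = torch.randint(0, 10, (64,), device="cuda")

torch.cuda.reset_peak_memory_stats()      # clear previously recorded peaks
loss = nn.functional.cross_entropy(model(batch), target)
loss.backward()                           # backward pass typically dominates peak usage
peak_bytes = torch.cuda.max_memory_allocated()  # peak allocation since the reset
print(f"peak memory: {peak_bytes / 2**20:.1f} MiB")
```

Techniques like forward-mode auto-differentiation or weight splitting aim to lower exactly this peak figure, since it, rather than total parameter size alone, determines whether a training or fine-tuning step fits on a given device.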

Papers