Activation Recomputation

Activation recomputation is a technique for reducing memory consumption when training large deep learning models: instead of storing intermediate activations during the forward pass, they are recomputed on demand during the backward pass. Current research focuses on optimizing recomputation strategies for transformer models, in particular on minimizing the computational overhead of recomputation through techniques such as overlapping it with communication and exploiting parallelism. These advances are crucial for training increasingly large models quickly and efficiently, and ultimately for the scalability and feasibility of deploying advanced AI systems.
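
The core idea can be illustrated with a minimal sketch using PyTorch's built-in checkpointing utility, which drops a block's intermediate activations in the forward pass and re-runs the block during backward to regenerate them. The module and its sizes below are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Small MLP whose middle block uses activation recomputation.

    The sizes and structure here are hypothetical; the point is only to show
    where recomputation is applied.
    """

    def __init__(self, dim: int = 1024, hidden: int = 4096):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )
        self.head = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # checkpoint() runs self.block without saving its intermediate
        # activations; during backward, the block is re-executed to
        # regenerate them, trading extra compute for lower peak memory.
        x = checkpoint(self.block, x, use_reentrant=False)
        return self.head(x)


if __name__ == "__main__":
    model = CheckpointedMLP()
    x = torch.randn(8, 1024, requires_grad=True)
    loss = model(x).sum()
    loss.backward()  # triggers recomputation of the checkpointed block
    print(x.grad.shape)
```

In practice, systems for large-scale training apply this selectively (for example, per transformer layer) and combine it with scheduling tricks so that the recomputation cost is hidden behind communication rather than added to the critical path.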

Papers