Transformer Pruning

Transformer pruning aims to reduce the computational cost and memory footprint of large transformer models without significantly sacrificing performance. Current research focuses on efficient one-shot pruning techniques that avoid computationally expensive retraining; these methods use approaches such as attention map analysis and gradient-based importance scoring to identify less important parameters for removal across architectures including Vision Transformers (ViTs) and BERT. These advances matter because they enable powerful transformer models to be deployed on resource-constrained hardware such as mobile phones and edge computing platforms, broadening their applicability across diverse fields.
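
The sketch below illustrates the general idea of one-shot, gradient-based importance pruning mentioned above. It is a minimal example, not any specific published method: the toy transformer layer, the 50% sparsity level, and the first-order Taylor score |w * dL/dw| are all illustrative assumptions.

```python
# Minimal sketch of one-shot, gradient-based importance pruning (assumptions:
# PyTorch available; score, model, and sparsity level are illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy transformer block standing in for one layer of a ViT/BERT-style model.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

# One small calibration batch (random here; real use would draw from the
# task's training or calibration data).
x = torch.randn(8, 16, 64)            # (batch, tokens, d_model)
target = torch.randn(8, 16, 64)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()                        # gradients give per-weight sensitivity

sparsity = 0.5                         # fraction of weights to remove (assumed)
with torch.no_grad():
    for name, p in layer.named_parameters():
        if p.dim() < 2 or p.grad is None:   # skip biases / LayerNorm params
            continue
        # First-order importance: |weight * gradient| approximates the change
        # in loss if the weight were zeroed out.
        score = (p * p.grad).abs()
        threshold = torch.quantile(score.flatten(), sparsity)
        mask = (score > threshold).to(p.dtype)
        p.mul_(mask)                   # one-shot removal, no retraining
        print(f"{name}: kept {mask.mean().item():.0%} of weights")
```

In practice, the pruned weights would then be evaluated on held-out data, and structured variants (removing whole heads or channels rather than individual weights) are typically preferred when the goal is actual speedups on edge hardware.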

Papers