Transformer Pruning
Transformer pruning aims to reduce the computational cost and memory footprint of large transformer models without significantly sacrificing performance. Current research focuses on efficient one-shot pruning techniques that avoid computationally expensive retraining, using methods such as attention map analysis and gradient-based importance scoring to identify less important parameters for removal. These techniques have been applied across architectures including Vision Transformers (ViTs) and BERT. Such advances matter because they enable powerful transformer models to be deployed on resource-constrained devices such as mobile phones and edge computing platforms, broadening their applicability across diverse fields.
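As a rough illustration of the gradient-based importance scoring mentioned above, the sketch below scores each weight with a first-order Taylor criterion (|w · ∂L/∂w|) from a single calibration batch and masks the lowest-scoring weights in one shot, with no retraining. The model, function names, and the 50% sparsity setting are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch of one-shot, gradient-based importance pruning in PyTorch.
# Assumptions: a toy transformer encoder stands in for a ViT/BERT backbone,
# and |w * grad(w)| is used as the importance score (first-order Taylor).
import torch
import torch.nn as nn

def taylor_importance(model, loss_fn, inputs, targets):
    """Score each weight matrix by |w * dL/dw| from one calibration batch."""
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None and p.dim() > 1:  # skip biases and LayerNorm
            scores[name] = (p.detach() * p.grad.detach()).abs()
    return scores

def apply_one_shot_pruning(model, scores, sparsity=0.5):
    """Zero out the lowest-scoring fraction of weights (unstructured pruning)."""
    for name, p in model.named_parameters():
        if name in scores:
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            threshold = scores[name].flatten().kthvalue(k).values
            mask = (scores[name] > threshold).float()
            p.data.mul_(mask)  # no retraining: pruned weights are simply masked

if __name__ == "__main__":
    # Toy stand-in model; a real pipeline would load a pretrained checkpoint
    # and score importance on a small calibration set.
    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    model = nn.Sequential(nn.TransformerEncoder(layer, num_layers=2),
                          nn.Flatten(), nn.Linear(16 * 64, 10))
    x = torch.randn(8, 16, 64)           # (batch, tokens, embedding dim)
    y = torch.randint(0, 10, (8,))
    scores = taylor_importance(model, nn.CrossEntropyLoss(), x, y)
    apply_one_shot_pruning(model, scores, sparsity=0.5)
```

Structured variants of this idea score whole attention heads or MLP channels instead of individual weights, which maps more directly onto real speedups on resource-constrained hardware.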