BERT Pruning

BERT pruning aims to reduce the size and computational cost of BERT-based language models while preserving accuracy. Current research focuses on efficient pruning algorithms such as gradual magnitude pruning, and on optimizing the pruning process through techniques like knowledge distillation and task-adaptive pre-training, often targeting specific model components such as the embeddings. These efforts are driven by the need to deploy large language models on resource-constrained devices and to improve training and inference efficiency, with applications in edge AI and federated learning.
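
To make the gradual magnitude pruning idea concrete, below is a minimal PyTorch sketch that ramps the sparsity target over training steps (assuming the common cubic schedule) and zeros the smallest-magnitude weights in each linear layer. The model, batch, and loss here are placeholders; in practice the same masking would be applied to the encoder's nn.Linear weights of an actual BERT checkpoint, combined with the fine-tuning or distillation objective.

```python
import torch
import torch.nn as nn


def target_sparsity(step, total_steps, final_sparsity):
    """Cubic schedule: sparsity rises smoothly from 0 to final_sparsity."""
    progress = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)


def magnitude_mask(weight, sparsity):
    """Binary mask that zeros the smallest-magnitude fraction of a weight tensor."""
    num_to_prune = int(sparsity * weight.numel())
    if num_to_prune == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(num_to_prune).values
    return (weight.abs() > threshold).float()


# Toy stand-in for one BERT feed-forward block; a real run would iterate over
# the encoder's nn.Linear modules (e.g. from a Hugging Face BertModel).
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

total_steps, final_sparsity = 1000, 0.9
masks = {}
for step in range(1, total_steps + 1):
    # Placeholder batch and objective; a real run would use the task loss,
    # optionally combined with a knowledge-distillation term.
    x = torch.randn(8, 768)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Periodically raise the sparsity target and recompute magnitude masks.
    if step % 100 == 0:
        sparsity = target_sparsity(step, total_steps, final_sparsity)
        masks = {
            name: magnitude_mask(module.weight.data, sparsity)
            for name, module in model.named_modules()
            if isinstance(module, nn.Linear)
        }

    # Re-apply masks after every update so pruned weights stay at zero.
    with torch.no_grad():
        for name, module in model.named_modules():
            if name in masks:
                module.weight.mul_(masks[name])
```

The key design choice is that pruning is interleaved with training rather than applied once at the end, which gives the remaining weights a chance to adapt as sparsity increases.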

Papers