Progressive Pruning

Progressive pruning is a technique for reducing the size and computational cost of neural networks by iteratively removing less important connections or neurons, either during training or after pre-training. Current research focuses on efficient pruning algorithms for a range of architectures, including large language models (LLMs) and vision-language transformers, often combined with knowledge distillation or other techniques to mitigate the accompanying performance loss. The approach is crucial for deploying large models on resource-constrained devices and for speeding up training and inference, with applications in federated learning, continual learning, and edge computing. The goal is substantial model compression while preserving, or even improving, accuracy.
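
The sketch below illustrates one common instantiation of this idea: iterative magnitude pruning with interleaved fine-tuning, where each round removes a fraction of the smallest remaining weights and then briefly retrains the network. The PyTorch framework, the toy model and synthetic data, and the 20%-per-round schedule are illustrative assumptions, not the method of any specific paper.

```python
# A minimal sketch of progressive (iterative) magnitude pruning, assuming PyTorch.
# The model, data, and per-round sparsity schedule below are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and synthetic regression data stand in for a real network and dataset.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.randn(512, 64), torch.randn(512, 10)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One binary mask per weight matrix; ones mean "keep", zeros mean "pruned".
masks = {name: torch.ones_like(p) for name, p in model.named_parameters() if p.dim() > 1}

def fine_tune(steps: int) -> None:
    """Train briefly, re-applying the masks so pruned weights stay at zero."""
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])

# Progressive schedule: each round prunes 20% of the *remaining* weights by
# magnitude, then fine-tunes to recover accuracy before the next round.
for round_idx in range(5):
    fine_tune(steps=100)
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            remaining = p[masks[name].bool()].abs()
            if remaining.numel() == 0:
                continue
            threshold = torch.quantile(remaining, 0.2)  # cut the smallest 20%
            masks[name] = (p.abs() > threshold).float() * masks[name]
            p.mul_(masks[name])
    kept = sum(m.sum() for m in masks.values()).item()
    total = sum(m.numel() for m in masks.values())
    print(f"round {round_idx}: global sparsity ~ {1 - kept / total:.2%}")
```

In practice the per-round fraction is usually set by a sparsity schedule (e.g. gradually ramping toward a target sparsity), and the brief fine-tuning between rounds is replaced by full training epochs or by knowledge distillation from the dense model; the loop structure, however, stays the same.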

Papers