Layer-Wise Pruning

Layer-wise pruning is a neural network compression technique that reduces model size and computational cost by selectively removing less important layers, or neurons within layers. Current research focuses on efficient algorithms for deciding which layers to prune, including methods based on information theory, correlation analysis, and optimization techniques such as ADMM, often applied to large language models (LLMs) and convolutional neural networks (CNNs). This approach makes it feasible to deploy large models on resource-constrained devices and improves the efficiency of federated learning, while maintaining, and in some cases even improving, accuracy.
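
To make the idea concrete, here is a minimal sketch of layer-wise pruning. It assumes a simple importance criterion (mean absolute weight magnitude per layer) and a model represented as a plain list of weight matrices; real methods use the more sophisticated criteria mentioned above (information-theoretic scores, correlation analysis, ADMM-based optimization), and the function names here are illustrative, not from any particular library.

```python
import math

def layer_importance(weights):
    # Hypothetical importance score: mean absolute weight magnitude.
    flat = [abs(w) for row in weights for w in row]
    return sum(flat) / len(flat)

def prune_layers(layers, keep_ratio=0.5):
    # Keep the ceil(keep_ratio * n) highest-scoring layers,
    # preserving their original order in the network.
    n_keep = max(1, math.ceil(keep_ratio * len(layers)))
    ranked = sorted(range(len(layers)),
                    key=lambda i: layer_importance(layers[i]),
                    reverse=True)
    kept_indices = sorted(ranked[:n_keep])
    return [layers[i] for i in kept_indices]
```

In practice the pruned model is usually fine-tuned afterward to recover any accuracy lost by removing layers.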

Papers