Layer-Wise Pruning
Layer-wise pruning is a neural network compression technique that reduces model size and computational cost by selectively removing less important layers, or neurons within layers. Current research focuses on efficient algorithms for deciding which layers to prune, including methods based on information theory, correlation analysis, and optimization techniques such as ADMM, often applied to large language models (LLMs) and convolutional neural networks (CNNs). This approach makes it feasible to deploy large models on resource-constrained devices and improves the efficiency of federated learning, while maintaining, and in some cases even improving, accuracy.
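To make the idea concrete, here is a minimal sketch of layer-wise pruning on a toy residual stack (where every block maps a vector to a vector of the same size, so dropping a block keeps the network valid — the setting in which LLM layer pruning typically operates). The magnitude-of-update importance score used here is a simple illustrative proxy, not any specific published criterion; all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 8, 6

# A toy "network": a stack of residual blocks x <- x + tanh(W x).
blocks = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(depth)]

def forward(x, blocks):
    # Residual structure means removing any block still yields a valid network.
    for w in blocks:
        x = x + np.tanh(w @ x)
    return x

def importance(blocks, probes):
    # Score each block by the average size of the update it contributes
    # on a set of probe inputs (a simple proxy; real methods may use
    # information-theoretic or correlation-based criteria instead).
    return [
        float(np.mean([np.linalg.norm(np.tanh(w @ p)) for p in probes]))
        for w in blocks
    ]

def prune_layers(blocks, n_remove, probes):
    # Drop the n_remove blocks with the lowest importance scores.
    scores = importance(blocks, probes)
    drop = set(np.argsort(scores)[:n_remove])
    return [w for i, w in enumerate(blocks) if i not in drop]

probes = [rng.normal(size=dim) for _ in range(4)]
pruned = prune_layers(blocks, n_remove=2, probes=probes)
```

The pruned stack has `depth - 2` blocks but accepts the same inputs and produces outputs of the same shape, so it can be evaluated (and fine-tuned) as a drop-in, cheaper replacement for the original.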