Layer Pruning
Layer pruning is a model compression technique that improves the efficiency of deep neural networks by removing less important layers while preserving performance. Current research applies the method to a range of architectures, including transformers for natural language processing and speech recognition, convolutional neural networks for image processing, and diffusion models for image generation, and explores pruning criteria based on convexity, similarity metrics, or stochastic re-initialization. By reducing computational cost and memory requirements, layer pruning makes advanced deep learning models more accessible in resource-constrained environments and accelerates both training and inference.
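The similarity-based criterion mentioned above is straightforward to illustrate: a layer whose output is nearly identical to its input changes the representation very little, so it is a natural candidate for removal. Below is a minimal PyTorch sketch of this idea, not the exact method of any paper listed here; the helper names (layer_importance_by_similarity, prune_least_important) and the toy residual blocks are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical helpers for illustration; not taken from any paper below.

def layer_importance_by_similarity(layers, x):
    """Score each layer by how much it changes its input.

    Importance is 1 minus the mean cosine similarity between a layer's
    input and output activations: layers that barely transform their
    input get low scores and become pruning candidates.
    """
    scores = []
    h = x
    with torch.no_grad():
        for i, layer in enumerate(layers):
            out = layer(h)
            sim = F.cosine_similarity(h.flatten(1), out.flatten(1), dim=1).mean()
            scores.append((i, 1.0 - sim.item()))
            h = out
    return scores

def prune_least_important(layers, x, num_to_remove):
    """Drop the num_to_remove layers with the lowest importance scores."""
    scores = layer_importance_by_similarity(layers, x)
    to_drop = {i for i, _ in sorted(scores, key=lambda s: s[1])[:num_to_remove]}
    return nn.Sequential(*[l for i, l in enumerate(layers) if i not in to_drop])

if __name__ == "__main__":
    # Toy stack of residual MLP blocks standing in for transformer layers.
    class Block(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        def forward(self, x):
            return x + self.ff(x)

    torch.manual_seed(0)
    model = nn.Sequential(*[Block(64) for _ in range(8)])
    calib = torch.randn(32, 64)  # small calibration batch
    pruned = prune_least_important(list(model), calib, num_to_remove=2)
    print(f"kept {len(pruned)} of {len(model)} layers")
```

In practice, methods of this kind typically compute the scores over a small calibration set of hidden states and then briefly fine-tune the surviving layers to recover any accuracy lost by the removal.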
Papers
SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models
Yuqi Li, Yao Lu, Zeyu Dong, Chuanguang Yang, Yihao Chen, Jianping Gou
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang