Structured Pruning
Structured pruning is a model compression technique that reduces the computational cost and memory footprint of deep neural networks (DNNs) by removing entire groups of parameters, such as neurons, filters, or channels, while preserving accuracy. Because whole structures are removed, the pruned network is genuinely smaller and faster on standard hardware, unlike unstructured sparsity, which leaves the dense layout intact. Current research focuses on efficient structured-pruning algorithms across architectures, including convolutional neural networks (CNNs), vision transformers (ViTs), and large language models (LLMs), often incorporating techniques such as knowledge distillation and one-shot pruning to minimize retraining overhead. This work is significant because it enables the deployment of powerful DNNs on resource-constrained devices, improving the efficiency and accessibility of deep learning applications across diverse fields.
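As a minimal illustrative sketch (not the method of any particular paper below), the snippet prunes whole neurons from a two-layer MLP by L1-norm magnitude, a common saliency criterion: dropping a hidden unit removes one row of the first weight matrix and the matching column of the second, so both matrices shrink. All names (`prune_neurons`, the example weights) are hypothetical.

```python
def l1_norm(row):
    return sum(abs(w) for w in row)

def prune_neurons(W1, W2, keep):
    """Structured pruning of a 2-layer MLP: h = W1 @ x, y = W2 @ h.
    W1 has shape (hidden x in), W2 has shape (out x hidden). Dropping
    hidden unit j removes row j of W1 and column j of W2. Keeps the
    `keep` hidden units whose W1 rows have the largest L1 norms."""
    ranked = sorted(range(len(W1)), key=lambda j: l1_norm(W1[j]), reverse=True)
    kept = sorted(ranked[:keep])  # preserve original unit ordering
    W1p = [W1[j] for j in kept]
    W2p = [[row[j] for j in kept] for row in W2]
    return W1p, W2p

# Three hidden units; unit 1 has near-zero weights and is pruned first.
W1 = [[1.0, -2.0],   # unit 0, L1 norm 3.0
      [0.01, 0.0],   # unit 1, L1 norm 0.01
      [0.5, 0.5]]    # unit 2, L1 norm 1.0
W2 = [[1.0, 2.0, 3.0]]
W1p, W2p = prune_neurons(W1, W2, keep=2)
# W1p == [[1.0, -2.0], [0.5, 0.5]], W2p == [[1.0, 3.0]]
```

In practice the saliency score, the retraining schedule, and how pruning interacts with later layers are exactly what the papers below vary; magnitude ranking is only the simplest baseline.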
Papers
Coupling Fairness and Pruning in a Single Run: a Bi-level Optimization Perspective
Yucong Dai, Gen Li, Feng Luo, Xiaolong Ma, Yongkai Wu
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Tianyi Chen, Tianyu Ding, Zhihui Zhu, Zeyu Chen, HsiangTao Wu, Ilya Zharkov, Luming Liang