Structured Pruning
Structured pruning is a model compression technique that reduces the computational cost and memory footprint of deep neural networks (DNNs) by removing entire groups of parameters, such as neurons, filters, or channels, while preserving accuracy. Because whole structures are removed, the pruned network is genuinely smaller and faster on standard hardware, unlike unstructured sparsity, which leaves the dense layout intact. Current research focuses on efficient structured-pruning algorithms across architectures, including convolutional neural networks (CNNs), vision transformers (ViTs), and large language models (LLMs), often incorporating techniques such as knowledge distillation and one-shot pruning to minimize retraining overhead. This work is significant because it enables the deployment of powerful DNNs on resource-constrained devices, improving the efficiency and accessibility of deep learning applications across diverse fields.
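As a minimal illustrative sketch (not the method of any particular paper below), the snippet prunes whole neurons from a two-layer MLP by L1-norm magnitude, a common saliency criterion: dropping a hidden unit removes one row of the first weight matrix and the matching column of the second, so both matrices shrink. All names (`prune_neurons`, the example weights) are hypothetical.

```python
def l1_norm(row):
    return sum(abs(w) for w in row)

def prune_neurons(W1, W2, keep):
    """Structured pruning of a 2-layer MLP: h = W1 @ x, y = W2 @ h.
    W1 has shape (hidden x in), W2 has shape (out x hidden). Dropping
    hidden unit j removes row j of W1 and column j of W2. Keeps the
    `keep` hidden units whose W1 rows have the largest L1 norms."""
    ranked = sorted(range(len(W1)), key=lambda j: l1_norm(W1[j]), reverse=True)
    kept = sorted(ranked[:keep])  # preserve original unit ordering
    W1p = [W1[j] for j in kept]
    W2p = [[row[j] for j in kept] for row in W2]
    return W1p, W2p

# Three hidden units; unit 1 has near-zero weights and is pruned first.
W1 = [[1.0, -2.0],   # unit 0, L1 norm 3.0
      [0.01, 0.0],   # unit 1, L1 norm 0.01
      [0.5, 0.5]]    # unit 2, L1 norm 1.0
W2 = [[1.0, 2.0, 3.0]]
W1p, W2p = prune_neurons(W1, W2, keep=2)
# W1p == [[1.0, -2.0], [0.5, 0.5]], W2p == [[1.0, 3.0]]
```

In practice the saliency score, the retraining schedule, and how pruning interacts with later layers are exactly what the papers below vary; magnitude ranking is only the simplest baseline.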
Papers
Coupling Fairness and Pruning in a Single Run: a Bi-level Optimization Perspective
Yucong Dai, Gen Li, Feng Luo, Xiaolong Ma, Yongkai Wu
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Tianyi Chen, Tianyu Ding, Zhihui Zhu, Zeyu Chen, HsiangTao Wu, Ilya Zharkov, Luming Liang