One-Shot Pruning
One-shot pruning aims to reduce the size and computational cost of large neural networks, particularly large language models (LLMs) and diffusion models, without significant performance loss. Current research focuses on methods that identify and remove unimportant weights or layers in a single pass, without iterative retraining, using pruning criteria based on weight magnitudes, gradients, or optimization-based approaches such as bi-level optimization. These techniques make it more practical to deploy large models on resource-constrained devices and to accelerate inference, improving both the efficiency of AI research and the accessibility of advanced AI applications.
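To make the idea concrete, below is a minimal sketch of the simplest one-shot criterion mentioned above, magnitude pruning, using PyTorch. It is an illustrative assumption, not the method of either paper listed below; the layer sizes and sparsity ratio are arbitrary example choices.

```python
# Minimal sketch of one-shot magnitude pruning (illustrative only; not the
# method of any specific paper). Weights with the smallest absolute values
# are zeroed out in a single pass, with no retraining.
import torch
import torch.nn as nn


def one_shot_magnitude_prune(layer: nn.Linear, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights of a layer in one pass.

    Returns the binary mask that was applied, so it can be reused later
    (e.g., to keep pruned weights at zero during optional fine-tuning).
    """
    with torch.no_grad():
        weight = layer.weight
        num_prune = int(weight.numel() * sparsity)
        # Find the magnitude threshold below which weights are removed.
        threshold = torch.kthvalue(weight.abs().flatten(), num_prune).values
        mask = (weight.abs() > threshold).to(weight.dtype)
        # Apply the mask in place: this is the single "shot".
        weight.mul_(mask)
    return mask


if __name__ == "__main__":
    layer = nn.Linear(512, 512)
    mask = one_shot_magnitude_prune(layer, sparsity=0.5)
    print(f"Remaining nonzero weights: {int(mask.sum())} / {mask.numel()}")
```

The papers below build on this basic setting with more informative criteria (e.g., gradient- or reconstruction-aware masks) rather than raw magnitude alone.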
Papers
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum