One-Shot Pruning

One-shot pruning aims to reduce the size and computational cost of large neural networks, particularly large language models (LLMs) and diffusion models, without significant performance loss. Current research focuses on methods that identify and remove unimportant weights or layers in a single pass, without iterative retraining, employing pruning criteria based on weight magnitudes, gradients, or optimization-based approaches such as bi-level optimization. These techniques make it feasible to deploy large models on resource-constrained devices and to accelerate inference, improving both the efficiency of AI research and the accessibility of advanced AI applications.
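As a concrete illustration of the simplest criterion mentioned above, the sketch below prunes a weight tensor by global magnitude in a single pass. This is a minimal NumPy example, not the method of any particular paper; the function name and threshold handling are illustrative choices.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """One-shot magnitude pruning: zero out the `sparsity` fraction of
    entries with the smallest absolute value, in a single pass.

    Note: ties at the threshold may prune slightly more than requested.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune half the entries of a small weight matrix
w = np.array([[0.10, -2.0],
              [0.05,  3.0]])
pruned = magnitude_prune(w, sparsity=0.5)
# The two smallest-magnitude entries (0.05 and 0.10) are zeroed;
# the large entries are kept unchanged.
```

Real one-shot methods replace the magnitude criterion with more informative saliency scores (e.g. gradient- or Hessian-aware ones), but the overall structure of "score, threshold, mask in one pass" is the same.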

Papers