Edge Pruning
Edge pruning is a neural network compression technique that reduces computational cost and memory usage by removing less important connections or parameters while keeping performance degradation small. Current research focuses on efficient pruning algorithms for a range of architectures, including convolutional neural networks (CNNs), vision transformers (ViTs), and large language models (LLMs). These algorithms are often combined with knowledge distillation or optimization-based methods to recover accuracy lost to pruning. The work matters for two reasons: it makes large models deployable on resource-constrained devices and can lower the energy cost of training and inference, and it sheds light on how much redundancy trained models contain.
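As a rough illustration of "removing less important connections", the sketch below applies magnitude-based pruning to a single weight matrix: the entries with the smallest absolute values are treated as the least important and set to zero. This is only one simple importance criterion, chosen here for brevity, and is not the method of any paper listed on this page. The function name `magnitude_prune` and the `sparsity` parameter are hypothetical names introduced for this example.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of entries with the smallest magnitude.

    Returns the pruned copy and the boolean keep-mask (True = connection kept).
    Assumption: importance is approximated by absolute weight value.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    # Number of connections to remove.
    k = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=bool)

    if k > 0:
        flat_magnitudes = np.abs(weights).ravel()
        # Indices of the k smallest-magnitude entries (order among them is irrelevant).
        drop = np.argpartition(flat_magnitudes, k - 1)[:k]
        mask[drop] = False

    mask = mask.reshape(weights.shape)
    return weights * mask, mask


if __name__ == "__main__":
    W = np.array([[0.5, -0.1],
                  [0.05, -2.0]])
    pruned, mask = magnitude_prune(W, sparsity=0.5)
    print(pruned)       # the two smallest-magnitude weights are zeroed
    print(mask.mean())  # fraction of connections kept
```

In practice the pruned model is usually fine-tuned afterwards, sometimes with knowledge distillation from the unpruned model, to recover lost accuracy. That step is omitted here.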
Papers
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Xi Chen, Cunhang Fan, Zhao Lv, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui