Weight Pruning
Weight pruning is a model compression technique that reduces the computational cost and memory footprint of deep neural networks by removing parameters, ideally without significant performance loss. Current research focuses on efficient pruning algorithms, including one-shot methods and approaches that incorporate singular value decomposition (SVD) or Lagrangian relaxation, applied to architectures such as transformers, convolutional neural networks (CNNs), and Mixture-of-Experts (MoE) models. This work is significant because it enables the deployment of large models on resource-constrained devices and improves training efficiency, with impact on natural language processing, computer vision, and federated learning. Research is also exploring the theoretical underpinnings of pruning, including its effect on model uncertainty and bias.
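To make the one-shot idea concrete, here is a minimal sketch of global magnitude pruning in PyTorch: a single sparsity-based threshold is computed over all prunable weights and applied once, with no iterative prune-retrain cycles. The function name, the choice of layers, and the quantile-based threshold are illustrative assumptions, not any specific paper's method.

```python
# A minimal sketch of one-shot global magnitude pruning (illustrative only;
# not a reproduction of any particular paper's algorithm).
import torch
import torch.nn as nn


def one_shot_magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights across Linear/Conv2d layers.

    One-shot: a single global threshold is computed and applied once,
    rather than pruning gradually with retraining between steps.
    """
    # Gather absolute values of all prunable weights to set a global threshold.
    prunable = [m for m in model.modules()
                if isinstance(m, (nn.Linear, nn.Conv2d))]
    all_weights = torch.cat([m.weight.detach().abs().flatten()
                             for m in prunable])
    threshold = torch.quantile(all_weights, sparsity)

    # Apply a binary mask: weights below the threshold are set to zero.
    with torch.no_grad():
        for m in prunable:
            mask = m.weight.abs() >= threshold
            m.weight.mul_(mask)


# Example: prune 80% of the weights in a small MLP.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
one_shot_magnitude_prune(model, sparsity=0.8)

zeros = sum((m.weight == 0).sum().item() for m in model.modules()
            if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model.modules()
            if isinstance(m, nn.Linear))
print(f"Sparsity achieved: {zeros / total:.1%}")
```

A global threshold lets heavily over-parameterized layers absorb most of the sparsity, which is one reason one-shot global methods often outperform fixed per-layer ratios; in practice the masked weights are usually frozen (or re-masked after gradient updates) so that fine-tuning does not undo the pruning.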