Model Pruning
Model pruning reduces the computational cost and memory footprint of large neural networks by removing parameters deemed unimportant, while preserving, and sometimes even improving, accuracy. Current research focuses on efficient one-shot pruning methods, particularly for large language models (LLMs) and vision transformers (ViTs), often incorporating techniques such as gradient-based importance scoring, block-aware optimization, and prompt-based approaches. These advances are crucial for deploying sophisticated AI models on resource-constrained devices and for improving training and inference efficiency, with impact on both the scientific understanding of model architectures and practical applications across many domains.
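The core idea can be illustrated with the simplest importance criterion: weight magnitude. The sketch below (a minimal NumPy illustration, not any specific paper's method) scores each weight by its absolute value and zeroes out the lowest-scoring fraction in one shot; the methods surveyed above replace this magnitude score with richer estimates such as gradient-based importance.

```python
import numpy as np

def one_shot_magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    A minimal sketch of one-shot pruning. Real methods substitute |w| with
    richer importance scores (e.g. gradient- or activation-based) and may
    prune block-wise rather than globally.
    """
    scores = np.abs(weights)              # importance score: weight magnitude
    k = int(sparsity * weights.size)      # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # Threshold at the k-th smallest score; everything at or below it is pruned.
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    mask = scores > threshold             # True = weight survives pruning
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned, mask = one_shot_magnitude_prune(w, sparsity=0.5)
print(f"kept {mask.mean():.0%} of weights")
```

In practice the surviving weights are stored in a sparse format (or structured so that whole blocks vanish), which is where the memory and latency savings come from.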
Papers
Paper list spanning May 26, 2024 to November 15, 2024 (19 entries; titles and links did not survive extraction).