Soft Filter Pruning
Soft filter pruning is a model compression technique that reduces the computational cost of deep neural networks (DNNs) by selectively removing less important filters or tokens while preserving accuracy. Unlike hard pruning, the "soft" variant zeroes out pruned filters but keeps them trainable, so they continue to receive gradient updates and can be recovered in later epochs. Current research focuses on improving the effectiveness of soft pruning methods: addressing inconsistencies between the training and inference phases, developing theoretically grounded pruning strategies (e.g., via iterative shrinkage-thresholding algorithms), and adapting these techniques to architectures including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These advances are significant for deploying DNNs on resource-constrained devices, enabling faster and more efficient inference in applications ranging from image classification to mobile computing.
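To make the core idea concrete, below is a minimal NumPy sketch of one soft-pruning step: rank a convolution layer's filters by L2 norm and zero out the lowest-scoring fraction, leaving the zeroed filters in place so training can still update them. The function name, shapes, and ratio are illustrative assumptions, not a specific paper's implementation.

```python
import numpy as np

def soft_prune_filters(weights, prune_ratio):
    """Zero the prune_ratio fraction of filters with the smallest L2 norms.

    weights: array of shape (out_channels, in_channels, kh, kw).
    The zeroed filters remain part of the tensor, so a subsequent
    training step can update (and potentially revive) them -- the
    property that distinguishes soft from hard pruning.
    """
    n_filters = weights.shape[0]
    # One L2 norm per filter, computed over all of its coefficients.
    norms = np.linalg.norm(weights.reshape(n_filters, -1), axis=1)
    n_prune = int(n_filters * prune_ratio)
    # Indices of the n_prune weakest filters (ascending norm order).
    pruned_idx = np.argsort(norms)[:n_prune]
    pruned = weights.copy()
    pruned[pruned_idx] = 0.0
    return pruned, pruned_idx

# Illustrative usage: 8 conv filters of shape 3x3x3, prune 25%.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_soft, idx = soft_prune_filters(w, prune_ratio=0.25)
```

In a full training loop, this step would typically run at the end of each epoch, with ordinary training continuing on the masked weights in between; only at deployment are the persistently zeroed filters physically removed to realize the speedup.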