Block-Wise Pruning

Block-wise pruning is a model compression technique that removes entire blocks of weights from deep neural networks (DNNs) to reduce computational cost and memory footprint, chiefly for resource-constrained settings such as mobile devices. Current research focuses on making block selection more efficient and accurate, applying the technique across architectures such as convolutional neural networks (CNNs) and vision transformers (ViTs), handling unaligned blocks, and combining block-wise pruning with complementary strategies such as multi-dimensional pruning. Because pruned blocks can be skipped entirely at inference time, the approach can substantially accelerate DNN inference and reduce energy consumption, particularly in mobile and edge computing.
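
The sketch below illustrates the core idea under a simple set of assumptions: a 2-D weight matrix is tiled into fixed-size blocks, each block is scored by its L1 norm (one common magnitude-based criterion; actual papers vary in their scoring and selection methods), and the lowest-scoring fraction of blocks is zeroed out. The function name `block_wise_prune`, the block shape, and the sparsity target are illustrative, not drawn from any specific paper.

```python
import numpy as np

def block_wise_prune(weight, block_shape=(4, 4), sparsity=0.5):
    """Zero out the lowest-magnitude blocks of a 2-D weight matrix.

    Blocks are scored by their L1 norm; the fraction `sparsity` of blocks
    with the smallest scores is set to zero. Assumes the weight dimensions
    are divisible by the block dimensions.
    """
    rows, cols = weight.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "weight must tile evenly into blocks"

    # View the matrix as a (rows//br) x (cols//bc) grid of blocks.
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))  # L1 norm per block

    # Determine the score threshold below which blocks are pruned.
    n_blocks = scores.size
    n_prune = int(n_blocks * sparsity)
    flat_sorted = np.sort(scores, axis=None)
    threshold = flat_sorted[n_prune] if n_prune < n_blocks else np.inf
    mask = (scores >= threshold).astype(weight.dtype)  # 1 = keep, 0 = prune

    # Broadcast the block-level mask back to element granularity and apply it.
    pruned = blocks * mask[:, None, :, None]
    return pruned.reshape(rows, cols), mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 8)).astype(np.float32)
    W_pruned, mask = block_wise_prune(W, block_shape=(4, 4), sparsity=0.5)
    print("kept blocks:\n", mask)
```

Because whole blocks are zeroed rather than scattered individual weights, the resulting sparsity pattern is structured, which is what allows hardware or kernels to skip the pruned regions and realize actual speedups.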

Papers