Pruning Quantization

Pruning quantization is a model compression approach that reduces the size and computational cost of deep neural networks (DNNs) while preserving accuracy. It combines pruning, which removes less important connections, with quantization, which lowers the numerical precision of weights and activations. Current research focuses on algorithms that jointly optimize the two steps, often using reinforcement learning or physics-inspired criteria to find good configurations for DNN architectures such as ResNet and MobileNet. This work is significant because it enables deploying sophisticated DNNs on resource-constrained devices, improving energy efficiency and reducing inference latency for applications ranging from mobile computing to embedded systems.
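
As a minimal sketch of how the two steps compose, the NumPy snippet below applies magnitude pruning and then symmetric uniform quantization to a single weight matrix; the helper names and the 50% sparsity / 8-bit settings are illustrative assumptions rather than any particular paper's method.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (illustrative helper)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest |w|
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization of weights to num_bits-wide integers."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(weights))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to floating point."""
    return q.astype(np.float32) * scale

# Toy example: prune 50% of a random weight matrix, then quantize to 8 bits.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
pruned, mask = prune_by_magnitude(w, sparsity=0.5)
q, scale = quantize_uniform(pruned, num_bits=8)
w_hat = dequantize(q, scale) * mask   # keep pruned positions exactly zero
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Joint pruning-quantization methods go further than this sequential sketch: because pruning changes the weight distribution that the quantizer must cover, optimizing the sparsity pattern and the bit widths together typically yields a better accuracy-compression trade-off than applying each step in isolation.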

Papers