Neural Network Compression
Neural network compression aims to reduce the size and computational cost of deep learning models without significant loss in accuracy. Current research focuses on techniques such as pruning (removing less important connections), quantization (reducing the precision of weights), knowledge distillation (transferring knowledge from a larger teacher model to a smaller student), and tensor decomposition (factorizing weight matrices), applied to convolutional neural networks, recurrent neural networks, and transformers. These methods are crucial for deploying large models on resource-constrained devices such as mobile phones and embedded systems, enabling broader applications in areas such as real-time image processing, autonomous driving, and medical image analysis. The development of efficient compression algorithms is driving progress in both the theoretical understanding of deep learning and its practical deployment across diverse fields.
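As a concrete illustration of two of these techniques, the sketch below applies unstructured magnitude pruning followed by post-training symmetric uniform quantization to a random weight matrix using NumPy. It is a minimal sketch under illustrative assumptions: the 90% sparsity target, 8-bit width, and per-tensor scale are example settings, not values taken from the papers listed below.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, bits: int = 8):
    """Symmetric uniform quantization: map floats to signed integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax        # per-tensor scale (illustrative choice)
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)  # assumes bits <= 8
    return q, scale                               # dequantize with q * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)

    w_pruned = magnitude_prune(w, sparsity=0.9)    # keep roughly 10% of weights
    q, scale = uniform_quantize(w_pruned, bits=8)  # store int8 values plus one float scale
    w_restored = q.astype(np.float32) * scale

    print("nonzero fraction:", np.count_nonzero(w_pruned) / w_pruned.size)
    print("mean abs quantization error:", np.mean(np.abs(w_pruned - w_restored)))
```

In practice these steps are usually followed by fine-tuning to recover accuracy, and deep learning frameworks provide their own pruning and quantization utilities; the NumPy version above only shows the underlying arithmetic.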
Papers
Understanding the Effect of the Long Tail on Neural Network Compression
Harvey Dam, Vinu Joseph, Aditya Bhaskara, Ganesh Gopalakrishnan, Saurav Muralidharan, Michael Garland
End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain