Network Compression

Network compression aims to reduce the size and computational cost of deep neural networks (DNNs) without significant performance loss. Current research focuses on techniques such as pruning (removing less important weights or connections), quantization (reducing the numerical precision of weights and activations), and low-rank approximation (factorizing weight matrices), applied either during training or post-training, and targeting a range of architectures including CNNs, GANs, and transformers. These advances are crucial for deploying large-scale DNNs on resource-constrained devices and for improving the efficiency of both training and inference, with implications for the scientific understanding of DNNs as well as their practical applications across diverse fields.
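
To make the two most common techniques concrete, the following is a minimal sketch of global magnitude pruning and uniform post-training weight quantization using plain PyTorch. The model architecture, sparsity level, and bit width are illustrative assumptions, not drawn from any particular paper; real pipelines typically add fine-tuning after pruning and calibrate quantization on activation statistics.

```python
# Minimal sketch: global magnitude pruning + symmetric uniform weight
# quantization. Model, sparsity, and bit width are hypothetical choices.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights across all Linear layers."""
    all_weights = torch.cat([m.weight.detach().abs().flatten()
                             for m in model.modules() if isinstance(m, nn.Linear)])
    threshold = torch.quantile(all_weights, sparsity)  # global magnitude cutoff
    for m in model.modules():
        if isinstance(m, nn.Linear):
            mask = (m.weight.detach().abs() > threshold).to(m.weight.dtype)
            m.weight.data.mul_(mask)  # keep only weights above the cutoff

def quantize_weights(model: nn.Module, bits: int = 8) -> None:
    """Simulate symmetric uniform quantization of weights to `bits` precision."""
    qmax = 2 ** (bits - 1) - 1
    for m in model.modules():
        if isinstance(m, nn.Linear):
            scale = m.weight.detach().abs().max() / qmax
            q = torch.round(m.weight.data / scale).clamp(-qmax, qmax)
            m.weight.data = q * scale  # store dequantized values for simulation

magnitude_prune(model, sparsity=0.5)   # ~50% of weights set to zero
quantize_weights(model, bits=8)        # weights restricted to 8-bit levels
```

In practice, pruning and quantization are usually followed by fine-tuning (or applied with quantization-aware training) to recover the small accuracy drop they introduce.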

Papers