Deep Compression

Deep compression aims to reduce the size and computational cost of deep learning models without significant loss of accuracy. Current research focuses on techniques such as layer pruning and merging, quantization, and low-rank decomposition, often applied to convolutional neural networks and large language models; some work also explores adaptive compression strategies tailored to specific data types and applications. These advances are crucial for deploying deep learning on resource-constrained devices and for improving the efficiency of large-scale model training and inference, with impact ranging from mobile computing to cloud services.
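To make two of these techniques concrete, the sketch below shows unstructured magnitude pruning (zeroing the smallest-magnitude weights) followed by uniform symmetric 8-bit quantization on a toy weight matrix. This is a minimal illustrative example in NumPy, not any specific paper's method; the function names, the 50% sparsity level, and the int8 scheme are assumptions chosen for clarity.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).

    Illustrative sketch: real pipelines usually prune iteratively with fine-tuning.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_8bit(weights):
    """Uniform symmetric quantization: map float weights to int8 plus one scale factor."""
    scale = max(float(np.max(np.abs(weights))), 1e-8) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.5)       # 8 of 16 weights set to zero
q, scale = quantize_8bit(pruned)                # stored as int8 + one float
dequant = q.astype(np.float32) * scale          # reconstruction for inference
print("zeros after pruning:", int((pruned == 0).sum()))
```

In this scheme the storage cost drops from 32 bits per weight to 8 bits plus a single scale, and the zeroed weights can additionally be stored in a sparse format; the round-trip error of the quantizer is bounded by half the scale per weight.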

Papers