Deep Compression
Deep compression aims to reduce the size and computational cost of deep learning models without significant loss in accuracy. Current research focuses on techniques such as layer pruning and merging, quantization, and low-rank decomposition, often applied to convolutional neural networks and large language models; some work also explores adaptive compression strategies tailored to specific data types and applications. These advances are crucial for deploying deep learning on resource-constrained devices and for improving the efficiency of large-scale model training and inference, with impact ranging from mobile computing to cloud services.
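Two of the techniques named above, pruning and quantization, form the core of the classic deep-compression pipeline and can be illustrated on a single weight matrix. The sketch below, assuming nothing beyond NumPy, applies magnitude pruning followed by k-means weight sharing; the function names and hyperparameters (90% sparsity, 16 clusters) are illustrative, and a real pipeline would also fine-tune the surviving weights and centroids after each stage.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights (illustrative magnitude pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def kmeans_quantize(weights, n_clusters=16, n_iters=20):
    """Replace each surviving weight with a shared cluster centroid
    (illustrative weight sharing via a plain k-means loop)."""
    nonzero = weights[weights != 0]
    # Initialize centroids linearly over the range of surviving weights.
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid, then update centroids.
        assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = nonzero[assign == k]
            if members.size:
                centroids[k] = members.mean()
    # Final assignment: write shared values back into the pruned matrix.
    assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
    quantized = weights.copy()
    quantized[weights != 0] = centroids[assign]
    return quantized

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
w_compressed = kmeans_quantize(w_pruned, n_clusters=16)
print(f"sparsity: {np.mean(w_compressed == 0):.2%}, "
      f"unique nonzero values: {len(np.unique(w_compressed[w_compressed != 0]))}")
```

The savings come from representation, not arithmetic: with 16 shared values, each surviving weight can be stored as a 4-bit index into a small codebook, and the sparse, repetitive index stream compresses further under entropy coding.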