Linear Compression
Linear compression techniques aim to reduce the size of data or models while minimizing information loss. This matters for efficient storage, transmission, and processing, especially given the rise of large language models and high-resolution data. Current research adapts and develops compression methods for a range of model architectures, including transformers and neural radiance fields, using techniques such as low-rank approximation, quantization, pruning, and hierarchical clustering. These advances improve the efficiency and scalability of machine learning applications across diverse domains, from natural language processing and image compression to federated learning and Earth observation.
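As a rough illustration of one technique named above, low-rank approximation, the sketch below compresses a weight matrix with a truncated SVD. It is a minimal example under stated assumptions, not the method of any paper listed here: the matrix is synthetic, and the function name `low_rank_compress`, the matrix sizes, and the chosen rank are all hypothetical.

```python
import numpy as np


def low_rank_compress(W, rank):
    """Return factors (A, B) such that A @ B approximates W at the given rank.

    Hypothetical helper for illustration: uses a truncated SVD, which gives
    the best rank-`rank` approximation of W in the Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # shape (m, rank), singular values folded in
    B = Vt[:rank, :]            # shape (rank, n)
    return A, B


# Assumption: a synthetic matrix that is close to rank 8, plus small noise.
rng = np.random.default_rng(0)
m, n, true_rank = 256, 128, 8
W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
W += 0.01 * rng.standard_normal((m, n))

A, B = low_rank_compress(W, rank=8)

# Storing the two factors takes far fewer numbers than storing W itself.
compression_ratio = W.size / (A.size + B.size)
# Relative reconstruction error in the Frobenius norm.
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"compression ratio: {compression_ratio:.2f}")
print(f"relative error:    {rel_err:.4f}")
```

The trade-off is controlled by the rank: a smaller rank stores fewer values but discards more of the matrix, so in practice the rank would be chosen against an accuracy or latency budget.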
Papers
Understanding the Effect of the Long Tail on Neural Network Compression
Harvey Dam, Vinu Joseph, Aditya Bhaskara, Ganesh Gopalakrishnan, Saurav Muralidharan, Michael Garland
End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain