Matrix Compression

Matrix compression aims to reduce the storage and computational demands of large matrices, a capability that is crucial for handling massive datasets across many fields. Current research focuses on efficient algorithms such as hierarchical matrices, low-rank and low-precision factorizations, quantization techniques, and compression schemes tailored to specific applications such as large language models (LLMs) and key-value (KV) caching. These advances benefit machine learning, computer vision, and scientific computing by making previously intractable datasets processable and by accelerating inference in resource-intensive applications. The development of near-lossless compression methods remains a key area of ongoing investigation.
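As a minimal sketch of one of the low-rank factorization schemes mentioned above, the example below compresses a matrix with a rank-k truncated SVD, storing two thin factors instead of the full matrix. The function name, the rank choice, and the test matrix are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

def low_rank_compress(A: np.ndarray, k: int):
    """Return thin factors (U_k, V_k) such that U_k @ V_k approximates A."""
    # Truncated SVD: keep only the k largest singular values/vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k] * s[:k]  # absorb singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k

rng = np.random.default_rng(0)
# Construct a 200 x 300 matrix of rank at most 10 (illustrative test data).
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 300))

U_k, V_k = low_rank_compress(A, k=10)

# Storage falls from m*n entries to k*(m + n).
original_entries = A.size                  # 60000
compressed_entries = U_k.size + V_k.size   # 5000
rel_error = np.linalg.norm(A - U_k @ V_k) / np.linalg.norm(A)
print(original_entries, compressed_entries, rel_error)
```

Because the test matrix has rank at most 10, a rank-10 factorization reconstructs it to near machine precision; on a full-rank matrix the same call would be lossy, with error governed by the discarded singular values.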

Papers