Activation Compression

Activation compression aims to reduce the memory footprint of deep neural networks (DNNs) during training and inference, addressing the escalating resource demands of large models such as Transformers and Graph Neural Networks. Current research focuses on techniques such as quantization, pruning, and wavelet transforms, applied across architectures and spanning both lossy and lossless compression, with the goal of minimizing memory usage while preserving model accuracy. These advances are crucial for training and deploying increasingly complex DNNs on resource-constrained devices and for accelerating training on large-scale datasets.
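
As a concrete illustration of the quantization-based approach, the sketch below compresses the activations a layer saves for its backward pass into 8-bit integers and dequantizes them when gradients are computed. It is a minimal, hypothetical example (the `QuantizedReLU` function and per-tensor min-max quantization are illustrative assumptions, not the method of any particular paper), but it shows where the memory savings come from: only the quantized tensor is kept between the forward and backward passes.

```python
# Minimal sketch of lossy activation compression (assumed setup, PyTorch):
# the activation kept for backward is stored as uint8 instead of float32.
import torch


class QuantizedReLU(torch.autograd.Function):
    """ReLU whose saved activation is quantized to 8 bits for the backward pass."""

    @staticmethod
    def forward(ctx, x):
        y = torch.relu(x)
        # Per-tensor min-max quantization of the activation to uint8.
        lo, hi = y.min(), y.max()
        scale = (hi - lo).clamp(min=1e-8) / 255.0
        q = torch.round((y - lo) / scale).to(torch.uint8)
        ctx.save_for_backward(q)          # only the compressed tensor is stored
        ctx.scale, ctx.lo = scale, lo
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (q,) = ctx.saved_tensors
        # Dequantize to an approximate reconstruction of the activation.
        y_hat = q.to(grad_out.dtype) * ctx.scale + ctx.lo
        # ReLU gradient: pass gradient only where the (reconstructed) activation is positive.
        return grad_out * (y_hat > 0).to(grad_out.dtype)


# Usage: drop-in replacement for torch.relu inside a model's forward pass.
x = torch.randn(4, 1024, requires_grad=True)
out = QuantizedReLU.apply(x)
out.sum().backward()
print(x.grad.shape)  # gradients computed from the compressed activation
```

In practice, methods of this kind trade a small amount of gradient precision for roughly a 4x reduction in activation memory per saved tensor (float32 to uint8), and finer-grained (per-group or per-channel) quantization is commonly used to limit the accuracy impact.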

Papers