Lossless Performance

Lossless performance in data compression means reducing data size without losing information, so that the original data can be reconstructed exactly. Current research pursues this goal for large language models (LLMs) and images, using techniques such as weight-momentum joint shrinking for LLMs, mixed-precision quantization for expert-switching frameworks, and depth-wise compression of key-value caches. These advances matter for deploying large models efficiently: they reduce storage requirements and improve the speed and scalability of applications ranging from natural language processing to image analysis and federated learning.
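As a minimal illustration of the perfect-reconstruction property described above, the following Python sketch compresses a byte string with the standard-library zlib module and checks that decompression returns the original bytes exactly. The choice of zlib, the helper name `roundtrip`, and the sample data are assumptions made for illustration only; none of them is a method from the research summarized here.

```python
import zlib


def roundtrip(data: bytes, level: int = 9) -> tuple[bytes, bytes]:
    """Compress `data` with zlib, then decompress it again.

    Hypothetical helper for illustration: returns (compressed, restored).
    """
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Highly repetitive sample input, chosen so that it compresses well.
original = b"lossless compression keeps every bit " * 64

compressed, restored = roundtrip(original)

# Lossless: the restored bytes are identical to the input.
assert restored == original

# Size reduction achieved on this particular input.
ratio = len(compressed) / len(original)
print(f"original={len(original)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.3f}")
```

The equality check is what separates lossless from lossy compression: any scheme that passes it for every input preserves all information, whatever compression ratio it reaches.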

Papers