Uniform Quantization

Uniform quantization reduces the computational cost and memory footprint of deep neural networks by mapping weights and activations onto a uniformly spaced grid of low-bit integer values. Current research focuses on mitigating the accuracy loss inherent in this process, particularly for complex architectures like Vision Transformers (ViTs) and large language models (LLMs), through techniques such as adaptive quantization schemes, per-layer or per-head bit-width optimization, and the incorporation of floating-point formats. These advancements are crucial for deploying large models on resource-constrained devices and accelerating inference in various applications, from computer vision to natural language processing.
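
As a concrete illustration of the basic scheme described above, the sketch below implements asymmetric (affine) uniform quantization of a tensor to signed 8-bit integers, along with the corresponding dequantization. The function names, the NumPy-based setup, and the per-tensor 8-bit configuration are illustrative assumptions for this summary, not the method of any particular paper listed here.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Asymmetric (affine) uniform quantization of a float tensor to signed integers.

    Illustrative per-tensor sketch: scale and zero-point are derived from the
    observed min/max range of x.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Guard against a degenerate range (e.g., a constant tensor).
    scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
    # Zero-point aligns the real value x_min with the integer qmin.
    zero_point = int(np.clip(round(qmin - x_min / scale), qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_uniform(q, scale, zero_point):
    """Map integer codes back to approximate floating-point values."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize a toy weight tensor and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_uniform(w, num_bits=8)
w_hat = dequantize_uniform(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
```

The techniques surveyed above refine this basic recipe, for example by choosing the bit width per layer or per attention head rather than fixing it globally as in this sketch.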

Papers