Input Quantization

Input quantization aims to reduce the computational cost and memory footprint of neural networks by representing weights and activations with lower-precision integers instead of floating-point numbers. Current research focuses on mitigating the accuracy loss inherent in this process, particularly for large language models (LLMs) and computer vision tasks, through techniques such as per-channel quantization, outlier isolation, and adaptive quantization schemes. These advances are crucial for deploying large models on resource-constrained devices and for improving the efficiency of applications ranging from natural language processing to image analysis.
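
To make the per-channel idea concrete, the minimal sketch below gives each output channel (row) of a weight matrix its own int8 scale and then dequantizes to measure the rounding error. The function names and the NumPy-based layout are illustrative assumptions, not the method of any particular paper listed here.

```python
# Minimal sketch of symmetric per-channel int8 weight quantization.
# Illustrative only: names and layout are assumptions for this example.
import numpy as np

def quantize_per_channel(weights: np.ndarray, num_bits: int = 8):
    """Quantize a 2-D weight matrix to signed integers, one scale per output channel (row)."""
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 127 for int8
    # One scale per row: map the per-channel max absolute value to qmax.
    scales = np.max(np.abs(weights), axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)         # avoid division by zero
    q = np.clip(np.round(weights / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate floating-point matrix from integers and per-channel scales."""
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    w = np.random.randn(4, 16).astype(np.float32)
    q, s = quantize_per_channel(w)
    w_hat = dequantize(q, s)
    print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```

Per-tensor quantization would instead use a single scale for the whole matrix; giving each channel its own scale typically reduces error when channel magnitudes vary widely, which is the same situation that outlier-isolation techniques target.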

Papers