Input Quantization
Input quantization aims to reduce the computational cost and memory footprint of neural networks by representing weights and activations with lower-precision integers instead of floating-point numbers. Current research focuses on mitigating the accuracy loss inherent in this process, particularly for large language models (LLMs) and computer vision tasks, through techniques such as per-channel quantization, outlier isolation, and adaptive quantization schemes. These advances are crucial for deploying large models on resource-constrained devices and for improving inference efficiency in applications ranging from natural language processing to image analysis.
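To make the core idea concrete, below is a minimal sketch of symmetric per-channel int8 weight quantization in NumPy. It is illustrative only and not tied to any specific paper in this listing; the function names and the choice of one scale per output row are assumptions for the example.

```python
# Illustrative sketch: symmetric per-channel int8 quantization of a 2-D weight
# matrix, with one scale per output channel (row). Not a specific paper's method.
import numpy as np

def quantize_per_channel(weights: np.ndarray, n_bits: int = 8):
    """Quantize each output channel (row) to signed integers with its own scale."""
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 127 for int8
    # One scale per row, derived from that row's largest absolute value.
    scales = np.abs(weights).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)       # avoid division by zero
    q = np.clip(np.round(weights / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate floating-point matrix from integers and scales."""
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 16)).astype(np.float32)
    w[0, 0] = 12.0                                    # an outlier inflates row 0's scale
    q, s = quantize_per_channel(w)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"max reconstruction error: {err:.4f}")
```

The example also hints at why the techniques mentioned above matter: a single outlier in one row inflates that row's scale and degrades precision for the rest of its values, which is exactly the problem per-channel scaling and outlier isolation are designed to contain.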