Quantization Error
Quantization error arises when continuous-valued data (e.g., neural network weights and activations) are represented with a limited number of bits: quantization reduces memory and compute cost, but the rounding and clipping it introduces degrade model accuracy. Current research focuses on mitigating this error in large language models (LLMs) and vision transformers (ViTs) through post-training quantization, quantization-aware training, and novel quantization algorithms (e.g., those incorporating learned rotations or adaptive clipping). Reducing quantization error is crucial for deploying large models on resource-constrained devices, improving energy efficiency, and making advanced AI applications more widely accessible.
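As a concrete illustration of where this error comes from, the sketch below implements plain uniform symmetric quantization of a synthetic weight tensor and measures the resulting mean squared error for different clipping ranges. It is a minimal, generic example rather than the method of any particular paper; the function names (`quantize_symmetric`, `dequantize`) and the `clip_ratio` parameter are illustrative choices, not an established API.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, num_bits: int = 8, clip_ratio: float = 1.0):
    """Uniform symmetric quantization of a tensor to signed integer codes.

    clip_ratio < 1.0 shrinks the representable range (a simple form of
    adaptive clipping), trading larger clipping error on outliers for
    smaller rounding error on the bulk of the values.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    max_val = float(np.abs(x).max()) * clip_ratio
    scale = max_val / qmax if max_val > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer codes
    return q.astype(np.int32), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integer codes back to (approximate) real values."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "weights": mostly small values plus a few large outliers,
    # the regime where the choice of clipping range matters most.
    w = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)
    w[:10] *= 50.0

    for clip in (1.0, 0.5, 0.1):
        q, s = quantize_symmetric(w, num_bits=8, clip_ratio=clip)
        w_hat = dequantize(q, s)
        mse = float(np.mean((w - w_hat) ** 2))
        print(f"clip_ratio={clip:4.2f}  scale={s:.6f}  quantization MSE={mse:.3e}")
```

Running this shows the basic trade-off that clipping-based methods exploit: keeping the full range wastes precision on rare outliers, while aggressive clipping distorts them, and the error-minimizing range lies somewhere in between.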