Quantization Error
Quantization error arises when continuous-valued data, such as neural network weights and activations, are represented with a limited number of bits; the resulting rounding and clipping distortion can degrade model accuracy. Current research focuses on mitigating this error in large language models (LLMs) and vision transformers (ViTs) through post-training quantization, quantization-aware training, and novel quantization algorithms, such as those incorporating learned rotations or adaptive clipping. Reducing quantization error is crucial for deploying large models on resource-constrained devices, improving energy efficiency, and making advanced AI applications more widely accessible. A worked sketch of how this error is measured follows below.
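
To make the notion concrete, the sketch below quantizes a synthetic weight tensor with uniform symmetric (per-tensor) 8-bit quantization and measures the resulting error. This is a minimal illustration, not the method of any particular paper; the helper names (quantize_symmetric, dequantize) and the NumPy-based setup are illustrative assumptions.

import numpy as np

def quantize_symmetric(w: np.ndarray, num_bits: int = 8):
    # Uniform symmetric per-tensor quantization: one scale for the whole tensor.
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax            # map the largest magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8 if num_bits <= 8 else np.int32), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct approximate floating-point values from integer codes.
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)

    q, scale = quantize_symmetric(weights, num_bits=8)
    w_hat = dequantize(q, scale)

    err = weights - w_hat                       # per-element quantization error
    print(f"scale       = {scale:.6f}")
    print(f"MSE         = {np.mean(err ** 2):.3e}")
    print(f"max |error| = {np.max(np.abs(err)):.3e} (bounded by scale/2 = {scale / 2:.3e})")

Because the scale maps the largest weight magnitude exactly onto the top integer code, no value is clipped here and each element's error is bounded by half the quantization step; post-training quantization and quantization-aware training methods aim to shrink exactly this kind of distortion, often per channel or per group rather than per tensor.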