Quantization Error
Quantization error arises when continuous-valued data (e.g., neural network weights and activations) is represented with a limited number of bits, trading model accuracy for memory and compute efficiency. Current research focuses on mitigating this error in large language models (LLMs) and vision transformers (ViTs) through post-training quantization, quantization-aware training, and novel quantization algorithms (e.g., those incorporating learned rotations or adaptive clipping). Reducing quantization error is crucial for deploying large models on resource-constrained devices, improving energy efficiency, and making advanced AI applications more widely accessible.
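The core idea can be illustrated with a minimal sketch of symmetric, per-tensor uniform quantization and a direct measurement of the resulting error. The function name, the Gaussian "weights", and the max-absolute-value scale are illustrative choices for this sketch, not taken from any specific paper listed below.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Symmetric uniform quantization: map floats to signed integer codes and back."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax          # per-tensor scale (a common, simple choice)
    codes = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer codes
    return codes * scale                      # dequantized ("fake-quantized") values

# Measure quantization error on synthetic weights drawn from a narrow Gaussian,
# a rough stand-in for a trained layer's weight distribution.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)

for bits in (8, 4, 2):
    w_q = quantize_uniform(w, bits)
    mse = np.mean((w - w_q) ** 2)
    print(f"{bits}-bit quantization error (MSE): {mse:.3e}")
```

Running this shows the error growing rapidly as the bit width shrinks, which is why low-bit settings motivate the more sophisticated techniques mentioned above, such as adaptive clipping of the scale or quantization-aware training that lets the model compensate for the rounding.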
Papers
Fourteen papers on this topic, published between December 2021 and December 2022.