Quantization Loss
Quantization loss arises from representing the high-precision weights and activations of large language models (LLMs) and other deep neural networks with lower bit-widths, which degrades model accuracy in exchange for efficiency. Current research focuses on mitigating this loss through techniques such as loss-aware quantization grids, quantization-aware training, and quantization strategies tailored to specific layers or model architectures (e.g., Vision Transformers, LLMs). Reducing quantization loss is crucial for deploying these computationally intensive models on resource-constrained devices, broadening their accessibility and applicability across domains.
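As a minimal illustration of the idea (not tied to any specific paper above), the sketch below applies symmetric per-tensor uniform quantization to a random weight matrix at several bit-widths and reports the quantization loss as the mean squared error between the original and dequantized weights; the matrix size, bit-widths, and MSE metric are illustrative assumptions.

```python
# Illustrative sketch: symmetric per-tensor uniform quantization and the
# resulting quantization loss (MSE between original and dequantized weights).
import numpy as np

def quantize_dequantize(w: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Fake-quantize w onto a symmetric uniform grid with 2**num_bits levels."""
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 127 for int8
    scale = np.abs(w).max() / qmax                      # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # integer codes
    return q * scale                                    # dequantize back to float

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)    # stand-in for a weight matrix

for bits in (8, 4, 2):
    w_hat = quantize_dequantize(w, bits)
    mse = float(np.mean((w - w_hat) ** 2))
    print(f"{bits}-bit quantization loss (MSE): {mse:.6f}")
```

Lower bit-widths shrink the grid and increase the error, which is why the techniques mentioned above (loss-aware grids, quantization-aware training, per-layer strategies) aim to place or adapt the grid so that this loss affects model outputs as little as possible.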