Uniform Quantization
Uniform quantization reduces the computational cost and memory footprint of deep neural networks by representing weights and activations with low-bit integers spaced on a uniform grid. Current research focuses on mitigating the accuracy loss inherent in this process, particularly for complex architectures such as Vision Transformers (ViTs) and large language models (LLMs), through techniques such as adaptive quantization schemes, per-layer or per-head bit-width optimization, and the incorporation of floating-point formats. These advances are crucial for deploying large models on resource-constrained devices and for accelerating inference across applications ranging from computer vision to natural language processing.
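As a concrete illustration, the minimal sketch below shows per-tensor asymmetric (affine) uniform quantization: a floating-point tensor is mapped to low-bit integer codes through a scale and zero-point, then dequantized back for comparison. The function names and the NumPy implementation are illustrative assumptions, not the method of any particular paper listed here.

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Asymmetric uniform quantization of a tensor to num_bits integer codes.

    Returns the integer codes plus the (scale, zero_point) needed to dequantize.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Guard against a degenerate range (e.g. a constant tensor).
    scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integer codes back to an approximate real-valued tensor."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize a random weight matrix to 8 bits and measure the error.
w = np.random.randn(256, 256).astype(np.float32)
q, s, z = uniform_quantize(w, num_bits=8)
w_hat = dequantize(q, s, z)
print("max abs error:", np.abs(w - w_hat).max())
```

Lower bit-widths shrink the integer grid and enlarge the rounding error, which is why the techniques above (adaptive schemes, per-layer or per-head bit-width selection) aim to spend bits where the accuracy impact is largest.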