Low Bit
Low-bit quantization reduces the computational and memory demands of deep neural networks by representing model parameters and activations with fewer bits. For example, it might use 8-bit or 4-bit integers instead of 32-bit floating-point numbers, while keeping the loss in accuracy small. Current research develops quantization techniques for a range of architectures, including convolutional neural networks, transformers, and large language models. Common approaches include data-free quantization, layer-wise quantization, and adaptive precision strategies. The area is crucial for deploying large models on resource-constrained devices and for accelerating inference, which benefits both machine learning research and practical AI applications across many domains.
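To make the core idea concrete, here is a minimal sketch of one common scheme: symmetric, uniform, per-tensor quantization of a weight list to signed `num_bits`-bit integers with a single floating-point scale. This is an illustrative assumption, not the method of any particular paper listed here. The function names `quantize_symmetric` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], num_bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] using one shared scale.

    Assumes symmetric, per-tensor, uniform quantization (a simple illustrative scheme).
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for signed symmetric quantization")
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest-magnitude weight maps to +/-qmax; guard against an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp into the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values; each is within scale / 2 of the original."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]
    q, scale = quantize_symmetric(weights, num_bits=4)
    print("integers:", q)
    print("scale:", scale)
    print("reconstructed:", dequantize(q, scale))
```

Storing the 4-bit integers plus one scale takes roughly an eighth of the memory of 32-bit floats. The cost is a rounding error of at most half a quantization step per weight. Per-channel or layer-wise variants apply the same idea but compute a separate scale for each channel or layer, which typically lowers this error.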
Papers
December 26, 2024
December 19, 2024
November 15, 2024
November 12, 2024
October 18, 2024
October 15, 2024
August 13, 2024
July 29, 2024
July 17, 2024
June 25, 2024
May 27, 2024
May 22, 2024
March 12, 2024
March 6, 2024
December 20, 2023
December 6, 2023
November 30, 2023
November 12, 2023
September 19, 2023