Activation Quantization
Activation quantization reduces the memory footprint and computational cost of large neural networks, particularly large language models (LLMs) and vision transformers (ViTs), by representing activations with fewer bits while keeping accuracy loss small. Current research focuses on outlier activations: a few values of very large magnitude stretch the quantization range and leave little resolution for the remaining values. Techniques for mitigating this include rotation, outlier preservation, and channel-wise quantization, often combined with weight quantization and parameter-efficient fine-tuning methods such as LoRA. These advances matter for deploying increasingly complex models on resource-constrained devices and for improving the efficiency of large-scale training and inference.
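The effect of an outlier channel on quantization error can be illustrated with a small sketch. The example below is a toy illustration under stated assumptions, not the method of any particular paper: it uses symmetric absmax quantization, synthetic Gaussian activations, and one channel scaled by 50 to act as the outlier. All function names and the data layout (`acts[token][channel]`) are hypothetical choices made for this example.

```python
import random

def absmax_scale(values, bits=8):
    """Scale that maps the largest magnitude onto the largest signed integer level."""
    qmax = 2 ** (bits - 1) - 1
    m = max(abs(v) for v in values)
    return m / qmax if m > 0 else 1.0

def quantize_dequantize(values, scale, bits=8):
    """Symmetric uniform fake-quantization: round to the grid, clamp, rescale."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def per_tensor(acts, bits=8):
    """One scale shared by every activation in the tensor."""
    scale = absmax_scale([v for row in acts for v in row], bits)
    return [quantize_dequantize(row, scale, bits) for row in acts]

def per_channel(acts, bits=8):
    """One scale per channel (column), so an outlier channel cannot stretch the others."""
    n_channels = len(acts[0])
    scales = [absmax_scale([row[c] for row in acts], bits) for c in range(n_channels)]
    return [
        [quantize_dequantize([row[c]], scales[c], bits)[0] for c in range(n_channels)]
        for row in acts
    ]

def mean_abs_error(a, b):
    """Mean absolute difference between two equally shaped nested lists."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Synthetic activations (assumption): 16 tokens x 8 channels, channel 3 is the outlier.
random.seed(0)
acts = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
for row in acts:
    row[3] *= 50.0

err_tensor = mean_abs_error(acts, per_tensor(acts))
err_channel = mean_abs_error(acts, per_channel(acts))
print(f"per-tensor  mean abs error: {err_tensor:.4f}")
print(f"per-channel mean abs error: {err_channel:.4f}")
```

In this toy setup the per-channel variant should show the lower error, because the outlier channel no longer sets the step size for the other seven channels. Real systems face further constraints that the sketch ignores, such as kernel support for per-channel activation scales, which is one reason approaches like rotation and outlier preservation are also studied.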