Activation Quantization
Activation quantization aims to reduce the memory footprint and computational cost of large neural networks, particularly large language models (LLMs) and vision transformers (ViTs), by representing activations with fewer bits while preserving accuracy. Current research focuses on mitigating the accuracy loss caused by outlier activations through techniques such as rotation of the activation space, outlier preservation, and channel-wise quantization, often combined with weight quantization and parameter-efficient fine-tuning methods such as LoRA. These advances are crucial for deploying increasingly complex models on resource-constrained devices and for improving the efficiency of large-scale model training and inference.
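As a rough illustration of one of the techniques mentioned above, the sketch below shows symmetric per-channel int8 quantization of an activation tensor in PyTorch: each channel gets its own scale, so a few outlier channels do not inflate the quantization error of the others. The function names and the toy outlier setup are illustrative assumptions, not taken from any specific paper listed here.

```python
import torch

def quantize_per_channel(x: torch.Tensor, n_bits: int = 8):
    """Symmetric per-channel quantization of activations.

    x has shape (tokens, channels); each channel receives its own scale
    so outlier channels do not dominate the dynamic range of the rest.
    """
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 127 for int8
    scale = x.abs().amax(dim=0, keepdim=True) / qmax  # one scale per channel
    scale = scale.clamp(min=1e-8)                     # avoid division by zero
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Toy demonstration: a hidden state with one simulated outlier channel.
x = torch.randn(16, 64)
x[:, 3] *= 50.0
q, scale = quantize_per_channel(x)
x_hat = dequantize(q, scale)
print("max abs error:", (x - x_hat).abs().max().item())
```

With a single per-tensor scale, the outlier channel would force a coarse step size everywhere; per-channel scales keep the reconstruction error of the non-outlier channels small, which is the motivation behind channel-wise activation quantization.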