Multiplier-Free Quantization

Multiplier-free quantization aims to reduce the computational cost and memory footprint of deep learning models, particularly large language models (LLMs) and vision transformers, by representing weights and activations at low bit-widths, ideally in forms that let costly multiplications be replaced by cheaper shift-and-add operations, without significant accuracy loss. Current research focuses on novel quantization algorithms, including post-training quantization (PTQ) and quantization-aware training (QAT) methods, often combined with techniques such as activation smoothing and outlier management to limit performance degradation at low bit-widths. This research is crucial for deploying large, computationally expensive models on resource-constrained devices, such as mobile phones and edge computing platforms, thereby broadening the accessibility and applicability of advanced AI systems.
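
A minimal sketch of what such a scheme can look like in practice, assuming a simple post-training quantizer that rounds each weight to a signed power of two so that multiplications reduce to bit shifts; the function name, the 4-bit exponent budget, and the per-tensor clipping range are illustrative assumptions, not the method of any specific paper.

```python
# Illustrative power-of-two (shift-based) post-training weight quantization.
# All names and the exponent bit-width are assumptions for this sketch.
import numpy as np

def quantize_power_of_two(weights: np.ndarray, exp_bits: int = 4):
    """Map each weight to sign * 2^e, with e clipped to a small integer range."""
    sign = np.sign(weights)
    magnitude = np.abs(weights)
    nonzero = magnitude > 0  # avoid log2(0); exact zeros stay zero

    # Round each nonzero magnitude to the nearest power-of-two exponent.
    exponents = np.zeros(weights.shape, dtype=np.int32)
    exponents[nonzero] = np.round(np.log2(magnitude[nonzero])).astype(np.int32)

    # Clip exponents to the range representable with `exp_bits` bits,
    # anchored at the largest observed exponent (per-tensor clipping).
    e_max = int(exponents[nonzero].max()) if nonzero.any() else 0
    e_min = e_max - (2 ** exp_bits - 1)
    exponents = np.clip(exponents, e_min, e_max)

    # Dequantized weights: sign * 2^exponent (a multiply becomes a shift).
    dequantized = sign * np.where(nonzero, 2.0 ** exponents, 0.0)
    return exponents, sign, dequantized

# Usage: quantize a random layer and check the reconstruction error.
w = np.random.randn(256, 256).astype(np.float32) * 0.05
exps, signs, w_hat = quantize_power_of_two(w, exp_bits=4)
print("mean abs error:", np.mean(np.abs(w - w_hat)))
```

In hardware, the stored exponent and sign are enough to evaluate each product with a shift and a conditional negation; QAT variants of this idea typically learn the clipping range or use a straight-through estimator so the rounding step does not block gradients.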

Papers