Quantization Techniques
Quantization techniques aim to reduce the memory footprint and computational cost of deep learning models by representing their weights and activations using fewer bits, thereby accelerating inference and enabling deployment on resource-constrained devices. Current research focuses on developing novel quantization algorithms for various architectures, including large language models (LLMs), diffusion models, and vision transformers, often employing strategies like post-training quantization (PTQ) and quantization-aware training (QAT) to minimize accuracy loss. This area is crucial for advancing the practical applicability of increasingly complex deep learning models across diverse fields, from natural language processing and image generation to speech recognition and computer vision.
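To make the core idea concrete, the sketch below shows the basic arithmetic behind uniform post-training quantization: a floating-point weight tensor is mapped to 8-bit integers with a single per-tensor scale and then dequantized for comparison. The symmetric per-tensor scheme and the function names are illustrative assumptions for exposition, not the method of any paper listed below; real PTQ pipelines additionally calibrate activation ranges and often use per-channel scales.

```python
# Minimal sketch of symmetric, per-tensor post-training quantization (PTQ).
# The scheme and function names are illustrative, not taken from the papers below.
import numpy as np

def quantize_symmetric(w: np.ndarray, num_bits: int = 8):
    """Map float weights to signed integers using one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax          # largest magnitude maps to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)   # toy weight matrix
    q, scale = quantize_symmetric(w)
    w_hat = dequantize(q, scale)
    print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```

Quantization-aware training differs from this post-hoc approach by simulating the round-and-clip operation during training (typically with a straight-through estimator for gradients), which lets the model adapt its weights to the reduced precision and usually recovers more accuracy at very low bit widths.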
Papers
Hadamard Domain Training with Integers for Class Incremental Quantized Learning
Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang