Quantization-Aware Training
Quantization-aware training (QAT) improves the efficiency of deep learning models by training them to operate directly on low-precision numerical representations (e.g., 4-bit or 8-bit integers) while minimizing the accuracy loss relative to full-precision baselines. Current research focuses on applying QAT to large language models (LLMs) and other resource-intensive architectures such as transformers and diffusion models, exploring techniques like mixed-precision quantization, accumulator-aware quantization, and novel quantization functions and regularization methods that improve accuracy and training stability. This work matters because it enables powerful deep learning models to be deployed on resource-constrained devices, such as mobile phones and embedded systems, while also reducing energy consumption and computational cost.
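As a minimal illustration of the core QAT mechanic described above (not drawn from any of the papers listed below), the following PyTorch sketch applies per-tensor fake quantization to a layer's weights during training and uses a straight-through estimator so gradients still reach the full-precision weights. The class names `FakeQuantize` and `QATLinear`, the 4-bit asymmetric scheme, and the per-tensor min/max scaling are illustrative assumptions, not a specific method from these papers.

```python
import torch
import torch.nn as nn


class FakeQuantize(torch.autograd.Function):
    """Uniform asymmetric fake quantization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x, num_bits):
        qmin, qmax = 0, 2 ** num_bits - 1
        # Per-tensor scale/zero-point from the observed min/max range.
        scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
        zero_point = qmin - torch.round(x.min() / scale)
        q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
        # Dequantize so downstream computation stays in floating point.
        return (q - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass gradients straight through the non-differentiable rounding step.
        return grad_output, None


class QATLinear(nn.Module):
    """Linear layer whose weights are fake-quantized on every forward pass."""

    def __init__(self, in_features, out_features, num_bits=4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantize.apply(self.linear.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.linear.bias)


# Training proceeds as usual: the loss is computed on quantized weights,
# while gradient updates are applied to the underlying full-precision weights.
layer = QATLinear(16, 8, num_bits=4)
out = layer(torch.randn(2, 16))
out.sum().backward()
```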
Papers
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Xingyu Zheng, Xianglong Liu, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Michele Magno
Investigating the Impact of Quantization on Adversarial Robustness
Qun Li, Yuan Meng, Chen Tang, Jiacheng Jiang, Zhi Wang
EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge
Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, Peiyan Dong, Cheng Lyu, Chih-hsiang Li, Xuehang Guo, Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu