Sharpness-Aware Quantization

Sharpness-aware quantization (SAQ) aims to improve the accuracy of quantized deep neural networks (DNNs); quantization itself is a crucial technique for deploying models on resource-constrained hardware. Current research focuses on mitigating quantization's negative impact on model robustness, particularly sensitivity to hardware faults and instability during training, often by applying Sharpness-Aware Minimization (SAM) to flatten the loss landscape of the quantized model. This work is significant because it tackles the trade-off between model compression and accuracy, yielding more efficient and reliable DNNs across a range of architectures, including convolutional neural networks, transformers, and even spiking neural networks, with applications in image compression and natural language processing.
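To make the combination of SAM and quantization concrete, the sketch below applies the standard two-step SAM update (ascend to a nearby worst-case point, then descend using the gradient taken there) to a model whose forward pass already performs fake quantization, e.g. a quantization-aware-training model. This is a minimal PyTorch sketch under stated assumptions, not any specific paper's method; the function name `sam_step`, the radius `rho`, and the assumed QAT-prepared model are illustrative.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One Sharpness-Aware Minimization step on a (fake-)quantized model.

    Assumes `model` applies fake quantization in its forward pass (e.g. a
    QAT-prepared model), so the sharpness being minimized is that of the
    quantized loss landscape rather than the full-precision one.
    """
    # 1) Gradient of the quantized loss at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) Ascend to the worst-case nearby point: w + rho * g / ||g||.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)
    model.zero_grad()

    # 3) Gradient at the perturbed weights defines the actual update.
    loss_perturbed = loss_fn(model(x), y)
    loss_perturbed.backward()

    # 4) Restore the original weights, then take the base optimizer step.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss_perturbed.detach()
```

Because the perturbation is applied to the latent full-precision weights while the loss is evaluated through the quantizer, the flatness being penalized is that of the quantized network, which is the quantity SAQ-style methods target.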

Papers