Zero-Shot Quantization

Zero-shot quantization aims to compress and accelerate deep neural networks without access to the original training data, addressing privacy and resource constraints. Current research focuses on generating high-quality synthetic data to calibrate quantized models, using techniques such as matching batch normalization statistics, diffusion-based generation, and adversarial training to improve the realism and diversity of the synthetic data. This approach is particularly relevant for large language models (LLMs) and computer vision models, affecting both deployment on resource-limited devices and the efficient use of large pre-trained models. The ultimate goal is to reach accuracy comparable to models quantized with real data, while significantly reducing computational cost and memory footprint.
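The batch-normalization-statistics idea can be made concrete with a minimal sketch: starting from random noise, the inputs are optimized so that the per-channel statistics they induce at each BatchNorm layer match the running mean and variance stored in the pre-trained model, and the resulting synthetic batch is then used to calibrate the quantizer's activation ranges. This is only an illustrative sketch in the spirit of such methods; the function name, hyperparameters, and model choice below are assumptions, not drawn from any specific paper or library.

```python
# Illustrative sketch: synthesize calibration data by matching BatchNorm statistics.
import torch
import torch.nn as nn


def distill_calibration_batch(model: nn.Module, batch_size: int = 32,
                              image_shape=(3, 224, 224),
                              num_steps: int = 500, lr: float = 0.1):
    """Optimize random noise so the activation statistics it induces match the
    running mean/variance stored in the model's BatchNorm layers."""
    model.eval()

    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    stats = {}

    def make_hook(layer):
        def hook(_, inputs, __):
            x = inputs[0]
            # Per-channel statistics of the current synthetic batch at this layer.
            stats[layer] = (x.mean(dim=[0, 2, 3]), x.var(dim=[0, 2, 3]))
        return hook

    handles = [layer.register_forward_hook(make_hook(layer)) for layer in bn_layers]

    x = torch.randn(batch_size, *image_shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(num_steps):
        optimizer.zero_grad()
        model(x)
        loss = torch.zeros(())
        for layer in bn_layers:
            mean, var = stats[layer]
            # Penalize deviation from the stored running statistics.
            loss = loss + (mean - layer.running_mean).pow(2).mean() \
                        + (var - layer.running_var).pow(2).mean()
        loss.backward()
        optimizer.step()

    for h in handles:
        h.remove()
    return x.detach()


if __name__ == "__main__":
    from torchvision.models import resnet18
    model = resnet18(weights=None)  # a pre-trained checkpoint would be used in practice
    calib = distill_calibration_batch(model, batch_size=8, num_steps=50)
    print(calib.shape)  # synthetic images used to calibrate quantization ranges
```

In practice such synthetic batches are fed through the quantized model once to set activation clipping ranges, in place of the real calibration set that zero-shot quantization assumes is unavailable.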

Papers