Quantization-Aware Knowledge Distillation

Quantization-aware knowledge distillation (QKD) aims to create efficient, low-bit deep learning models by transferring knowledge from a full-precision teacher model to a low-bit student during quantization-aware training. Current research focuses on improving the accuracy of quantized models, particularly transformers and large language models, through techniques such as self-supervised learning, novel quantization schemes (e.g., hybrid quantization), and optimized knowledge distillation strategies. This work is significant because it enables the deployment of complex deep learning models on resource-constrained devices, impacting applications ranging from image processing and natural language processing to autonomous driving and remote sensing.
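
The sketch below illustrates the basic QKD recipe under common assumptions: a full-precision teacher, a student whose weights are fake-quantized with a straight-through estimator, and a loss that mixes soft teacher targets with hard labels. All names (`FakeQuantize`, `QuantLinear`, `qkd_loss`) and hyperparameters are illustrative, not taken from any specific paper surveyed here.

```python
# Minimal QKD sketch (assumed setup, not a specific paper's method):
# a full-precision teacher guides a low-bit student trained with fake quantization.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FakeQuantize(torch.autograd.Function):
    """Uniform fake quantization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x, num_bits=4):
        qmax = 2 ** num_bits - 1
        scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
        zero_point = x.min()
        q = torch.round((x - zero_point) / scale).clamp(0, qmax)
        return q * scale + zero_point  # dequantize back to float for training

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass gradients through the rounding operation unchanged.
        return grad_output, None


class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized in the forward pass."""

    def __init__(self, in_features, out_features, num_bits=4):
        super().__init__(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantize.apply(self.weight, self.num_bits)
        return F.linear(x, w_q, self.bias)


def qkd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Distillation loss: soft teacher targets plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
    student = nn.Sequential(QuantLinear(16, 64), nn.ReLU(), QuantLinear(64, 10))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    x = torch.randn(32, 16)           # toy batch of inputs
    y = torch.randint(0, 10, (32,))   # toy hard labels

    with torch.no_grad():
        t_logits = teacher(x)         # teacher provides soft targets
    s_logits = student(x)             # student runs with fake-quantized weights
    loss = qkd_loss(s_logits, t_logits, y)
    loss.backward()                   # gradients flow through the STE
    opt.step()
    print(f"QKD loss: {loss.item():.4f}")
```

In practice, published QKD variants differ mainly in the quantizer (per-channel scales, learned step sizes, hybrid bit-widths) and in which teacher signals are distilled (logits, intermediate features, or attention maps), but the structure above captures the shared training loop.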

Papers