Network Quantization

Network quantization aims to reduce the computational cost and memory footprint of deep neural networks by representing weights and activations with fewer bits, enabling faster and more efficient inference on resource-constrained devices. Current research focuses on improving quantization techniques for architectures such as convolutional neural networks (CNNs) and vision transformers (ViTs), exploring methods like quantization-aware training, post-training quantization, and data-free quantization to minimize the accuracy loss incurred by compression. These advances are significant for deploying deep learning models on edge devices and mobile platforms, broadening the accessibility and applicability of AI across domains.
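
As a rough illustration of the core idea, the sketch below applies per-tensor symmetric int8 quantization with NumPy, the simplest form of post-training quantization. The function names and the max-abs calibration rule are illustrative assumptions, not taken from any particular paper listed here; practical methods add per-channel scales, calibration data, or training-time simulation of quantization.

```python
import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    """Uniform symmetric quantization of a float tensor to int8.

    The scale maps the largest-magnitude value onto the edge of the
    int8 range [-127, 127]; dequantization recovers an approximation
    of the original float values.
    """
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.max(np.abs(w - w_hat)))
```

Quantization-aware training follows the same quantize/dequantize mapping but inserts it into the forward pass during training (typically with a straight-through gradient estimator), so the network learns weights that remain accurate after rounding.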

Papers