Vision Transformer Quantization

Vision transformer (ViT) quantization reduces the computational cost and memory footprint of these powerful but resource-intensive models by representing their weights and activations with lower-precision numbers. Current research focuses on effective quantization techniques, particularly post-training quantization, for architectures such as the original ViT, DeiT, and Swin Transformer, often using mixed-precision bit assignments and novel quantization schemes to limit accuracy degradation. These efforts matter because they enable ViT deployment on resource-constrained devices, broadening applicability to mobile and embedded vision systems.
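To make the core idea concrete, here is a minimal sketch of symmetric uniform per-tensor quantization to signed 8-bit integers, written in NumPy. The function names and the per-tensor max-abs scale are illustrative simplifications, not the method of any particular paper; practical ViT quantizers typically use per-channel scales, calibration data, and more sophisticated schemes.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, n_bits: int = 8):
    """Map floats to signed integers with a single per-tensor scale.

    Returns the integer codes and the scale needed to dequantize.
    Illustrative sketch only; real PTQ pipelines calibrate scales.
    """
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = float(np.max(np.abs(x))) / qmax   # per-tensor scale from max |x|
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

# Toy example: quantize a random "weight matrix" and check the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_symmetric(w, n_bits=8)
w_hat = dequantize(q, s)
err = float(np.max(np.abs(w - w_hat)))        # rounding error, at most scale/2
```

Storing `q` (int8) plus one float scale in place of `w` (float32) gives roughly a 4x memory reduction, at the cost of the small rounding error bounded by half the scale; mixed-precision methods vary `n_bits` per layer to trade accuracy against cost.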

Papers