Quantized Transformer

Quantized transformers aim to reduce the computational cost and memory footprint of transformer models by representing their weights and activations with fewer bits, thereby enabling deployment on resource-constrained devices. Current research focuses on optimizing quantization techniques, including integer and floating-point formats, exploring different bit-widths (e.g., 4-bit, 6-bit, 8-bit), and developing novel architectures and algorithms that mitigate the information loss introduced by low-precision representations. This work is significant because it addresses the scalability challenges of large transformer models, paving the way for wider adoption in applications like embedded systems, edge AI, and mobile devices.
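
As a rough illustration of the core idea, the sketch below quantizes a weight matrix to signed 8-bit integers with a single per-tensor symmetric scale and then dequantizes it to expose the rounding error. The function names and the toy matrix are illustrative assumptions, not taken from any particular paper or library; practical schemes typically use per-channel or per-group scales and calibrated activation ranges.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, num_bits: int = 8):
    """Per-tensor symmetric quantization: map float weights to signed integers.

    Illustrative sketch only; real quantized transformers usually refine this
    with finer-grained scales and quantization-aware training or calibration.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax        # one scale shared by the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the integers and the scale."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for a transformer projection layer (hypothetical shape).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4, 8)).astype(np.float32)

q, scale = quantize_symmetric(w, num_bits=8)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))  # information lost to rounding
```

Lowering `num_bits` in this sketch (e.g., to 4) shrinks storage further but widens the rounding error, which is exactly the trade-off the techniques surveyed here try to manage.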

Papers