Quantized Transformer
Quantized transformers reduce the computational cost and memory footprint of transformer models by representing their weights and activations with fewer bits, enabling deployment on resource-constrained devices. Current research focuses on optimizing quantization schemes (both integer and floating-point), exploring different bit-widths (e.g., 4-, 6-, and 8-bit), and developing architectures and algorithms that mitigate the information loss quantization introduces. This work matters because it addresses the scalability challenges of large transformer models, paving the way for wider adoption in embedded systems, edge AI, and mobile devices.
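To make the integer-quantization idea above concrete, here is a minimal sketch of symmetric per-tensor quantization of a weight matrix, assuming PyTorch is available; the helper names `quantize_symmetric` and `dequantize` are illustrative, not part of any particular library's API, and real schemes typically add per-channel scales, zero-points, or calibration.

```python
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 8):
    """Map a float tensor to signed integers with a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax                # largest magnitude maps to qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the integers and the scale."""
    return q.to(torch.float32) * scale

# Example: quantize a random "weight matrix" and measure the rounding error.
w = torch.randn(256, 256)
q, scale = quantize_symmetric(w, num_bits=8)
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())
```

Lower bit-widths (e.g., 4-bit) shrink storage further but enlarge the rounding error, which is why much of the research summarized here targets techniques that recover the lost accuracy.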