Model Quantization

Model quantization aims to reduce the computational cost and memory footprint of large language and vision models without significant performance loss. Current research focuses on novel quantization techniques, such as distribution-friendly quantizers and methods for handling outliers, that improve accuracy in low-bit settings across architectures including Vision Transformers and LLMs. This work is crucial for deploying these models on resource-constrained devices, enabling wider accessibility and addressing environmental concerns related to energy consumption.
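
As an illustration of the basic idea, the sketch below shows one of the simplest forms of low-bit quantization: a symmetric uniform quantizer with percentile-based clipping to limit the influence of outliers. Function names and parameter choices are illustrative only and are not drawn from any specific paper listed here; the surveyed work explores considerably more sophisticated distribution-aware and outlier-aware schemes.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, num_bits: int = 4, clip_percentile: float = 99.9):
    """Symmetric uniform quantization of a tensor to signed low-bit integers.

    Outliers are handled here by clipping the dynamic range to a percentile of
    the absolute values before computing the scale (one simple strategy).
    """
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 7 for 4-bit signed
    clip_val = np.percentile(np.abs(x), clip_percentile)
    scale = clip_val / qmax if clip_val > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a floating-point approximation of the original values."""
    return q.astype(np.float32) * scale

# Example: quantize a layer's weights to 4 bits and measure reconstruction error.
weights = np.random.randn(512, 512).astype(np.float32)
weights[0, 0] = 15.0                                    # inject an artificial outlier
q, scale = quantize_uniform(weights, num_bits=4)
error = np.abs(weights - dequantize(q, scale)).mean()
print(f"mean absolute reconstruction error: {error:.4f}")
```

Clipping before computing the scale keeps a single extreme value from stretching the quantization grid, which is the main reason naive low-bit quantization degrades accuracy; the trade-off is that the clipped outliers themselves are represented less faithfully.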

Papers