Quantization Model
Model quantization aims to reduce the computational cost and memory footprint of large language and vision models without significant performance loss. Current research focuses on novel quantization techniques, such as distribution-friendly quantizers and methods for handling outliers, that preserve accuracy in low-bit settings across architectures including Vision Transformers and LLMs. This work is crucial for deploying these powerful models on resource-constrained devices, enabling wider accessibility and reducing the energy cost of inference.
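To make the core idea concrete, below is a minimal sketch of uniform symmetric int8 quantization in Python/NumPy. The optional percentile clipping is a simplified stand-in for the outlier-handling methods mentioned above; the function names and parameters are illustrative, not taken from any specific paper or library.

```python
import numpy as np

def quantize_symmetric(w, num_bits=8, clip_percentile=None):
    """Uniform symmetric quantization of a weight tensor.

    clip_percentile (e.g. 99.9) optionally clips outliers before
    choosing the scale; real outlier-aware methods are more involved.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    if clip_percentile is not None:
        max_abs = np.percentile(np.abs(w), clip_percentile)
    else:
        max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=(512, 512)).astype(np.float32)
    w[0, 0] = 1.0  # inject a single large outlier

    for clip in (None, 99.9):
        q, s = quantize_symmetric(w, num_bits=8, clip_percentile=clip)
        mse = np.mean((w - dequantize(q, s)) ** 2)
        print(f"clip={clip}: scale={s:.5f}, MSE={mse:.2e}")
```

Running the toy example shows why outliers matter: a single large value inflates the quantization scale and wastes the int8 range on the rest of the tensor, whereas clipping it before computing the scale lowers the overall reconstruction error.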