Integer Quantization
Integer quantization reduces the memory footprint and computational cost of deep learning models by representing their weights and activations with low-bit integers (e.g., 8-bit or 4-bit) instead of floating-point values, improving efficiency for deployment on resource-constrained devices. Current research focuses on new quantization algorithms, particularly for large language models (LLMs) and vision transformers (ViTs), often using post-training quantization and mixed-precision approaches to minimize accuracy loss. These advances are crucial for running powerful deep learning models on mobile and embedded systems, broadening the accessibility and applicability of AI across domains.
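To make the basic idea concrete, the sketch below shows symmetric per-tensor int8 post-training quantization in its simplest form: a float tensor is mapped to the integer range [-127, 127] with a single scale factor, then dequantized to approximate the original values. This is a minimal illustration of the general technique, not the method of any specific paper; the function names and the NumPy-based setup are illustrative assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float values to [-127, 127]."""
    # One scale for the whole tensor; the epsilon guards against an all-zero tensor.
    scale = max(float(np.abs(w).max()) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

Post-training quantization applies a calibration step like this to a trained model's weights (and, with representative data, its activations), while mixed-precision approaches assign different bit widths to different layers or tensors depending on their sensitivity to quantization error.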