Adaptive Quantization
Adaptive quantization optimizes the precision of numerical representations in machine learning models, aiming to reduce computational costs and memory footprint without significant performance loss. Current research focuses on developing dynamic quantization schemes that adjust bit-widths based on data characteristics or model layer sensitivity, often employing techniques like Gumbel-softmax, k-means clustering, and mixed-precision quantization within architectures such as transformers and convolutional neural networks. This work is significant for deploying large models on resource-constrained devices and improving the efficiency of various applications, including image processing, natural language processing, and speaker verification.
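To make the idea concrete, here is a minimal sketch of one adaptive scheme the summary alludes to: choosing a per-layer bit-width from a simple sensitivity proxy (here, dynamic range — a hypothetical heuristic, not any specific paper's method) and then applying uniform affine quantization at that bit-width. The function names, the threshold, and the range-based heuristic are all illustrative assumptions.

```python
import numpy as np

def choose_bitwidth(w, low=4, high=8, thresh=1.0):
    # Hypothetical sensitivity proxy: tensors with a wider dynamic
    # range are assumed more sensitive and get the higher bit-width.
    spread = float(w.max() - w.min())
    return high if spread > thresh else low

def quantize(w, bits):
    # Uniform affine quantization: map values onto a 2**bits-level
    # integer grid, then dequantize back to a float approximation.
    qmax = 2 ** bits - 1
    spread = float(w.max() - w.min())
    scale = spread / qmax if spread > 0 else 1.0
    zero = float(w.min())
    q = np.round((w - zero) / scale)
    return q * scale + zero

# Toy "model": two layers with different weight distributions.
rng = np.random.default_rng(0)
layers = {
    "conv1": rng.normal(0.0, 1.0, 256),   # wide range -> more bits
    "fc":    rng.normal(0.0, 0.1, 256),   # narrow range -> fewer bits
}
for name, w in layers.items():
    bits = choose_bitwidth(w, thresh=2.0)
    w_hat = quantize(w, bits)
    max_err = np.abs(w - w_hat).max()
```

The per-tensor rounding error of this scheme is bounded by half the quantization step, which is why mixed-precision assignment matters: spending extra bits only on wide-range or sensitive layers keeps the average footprint low without inflating the worst-case error.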