Adaptive Quantization
Adaptive quantization tunes the numerical precision of weights and activations in machine learning models, aiming to reduce computational cost and memory footprint without significant loss of accuracy. Current research focuses on dynamic quantization schemes that adjust bit-widths based on data characteristics or per-layer sensitivity, often employing techniques such as Gumbel-softmax relaxation for differentiable bit-width selection, k-means clustering of weights, and mixed-precision quantization within architectures such as transformers and convolutional neural networks. This work matters for deploying large models on resource-constrained devices and for improving efficiency across applications including image processing, natural language processing, and speaker verification.
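To make the mixed-precision idea concrete, below is a minimal Python/NumPy sketch of sensitivity-driven bit-width assignment. The function names (quantize, assign_bit_widths) and the error tolerance tol are hypothetical, not taken from any specific paper; each layer simply receives the smallest candidate bit-width whose simulated quantization error stays under the tolerance. Published schemes typically replace this raw reconstruction error with Hessian- or loss-based sensitivity measures, or learn the assignment end to end (e.g., via Gumbel-softmax).

import numpy as np

def quantize(x, bits):
    # Simulated uniform symmetric quantization at the given bit-width.
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(x).max(), 1e-8) / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

def assign_bit_widths(layer_weights, candidates=(2, 4, 8), tol=1e-3):
    # Hypothetical greedy rule: give each layer the smallest bit-width
    # whose mean-squared quantization error stays below `tol`.
    assignment = {}
    for name, w in layer_weights.items():
        errors = {b: float(np.mean((w - quantize(w, b)) ** 2)) for b in candidates}
        assignment[name] = next((b for b in sorted(candidates) if errors[b] <= tol),
                                max(candidates))
    return assignment

rng = np.random.default_rng(0)
weights = {
    "conv1": rng.normal(0.0, 0.05, size=(64, 3, 3, 3)),  # narrow range -> fewer bits
    "fc":    rng.normal(0.0, 0.50, size=(10, 512)),      # wide range -> more bits
}
print(assign_bit_widths(weights))

The example illustrates why adaptivity pays off: layers with narrow weight distributions tolerate aggressive quantization, while wide-range layers keep higher precision, yielding a smaller average bit-width than a uniform scheme at comparable error.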