Non-Uniform Quantization
Non-uniform quantization compresses neural networks by representing weights and activations with fewer bits, using quantization levels that are not evenly spaced (for example, codebook- or logarithm-based grids) so that precision is concentrated where values actually cluster. The goal is to reduce storage and computational costs without significant accuracy loss. Current research focuses on efficient non-uniform quantization algorithms for large language models (LLMs) and convolutional neural networks (CNNs), often incorporating outlier-aware training and optimization-based approaches to preserve accuracy at low bitwidths. This work is crucial for deploying large models on resource-constrained devices and accelerating inference, improving both the efficiency of AI systems and their accessibility across platforms.
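As a concrete illustration of the basic idea (not any specific paper's method), the sketch below quantizes a weight tensor to a small codebook of non-uniformly spaced levels learned with 1-D k-means; the function name, bit width, and initialization are illustrative assumptions, and real systems add refinements such as outlier handling and per-channel codebooks.

```python
import numpy as np

def nonuniform_quantize(weights: np.ndarray, n_bits: int = 4, n_iters: int = 20):
    """Quantize a weight tensor to 2**n_bits non-uniformly spaced levels
    using 1-D k-means (Lloyd's algorithm). Returns integer codes plus the
    learned codebook; dequantize with codebook[codes]."""
    flat = weights.ravel().astype(np.float64)
    k = 2 ** n_bits
    # Initialize the codebook at evenly spaced quantiles so every level starts non-empty.
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(n_iters):
        # Assignment step: nearest codebook entry for each weight.
        codes = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Update step: move each level to the mean of its assigned weights.
        for j in range(k):
            members = flat[codes == j]
            if members.size:
                codebook[j] = members.mean()
    codes = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codes.reshape(weights.shape).astype(np.uint8), codebook.astype(weights.dtype)

# Usage: quantize a toy weight matrix to 4 bits and check reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
codes, cb = nonuniform_quantize(w, n_bits=4)
w_hat = cb[codes]
print("max abs error:", np.abs(w - w_hat).max())
```

Because the levels are placed by clustering rather than on a fixed uniform grid, dense regions of the weight distribution get finer resolution, which is what typically lets non-uniform schemes retain accuracy at very low bitwidths.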