Layer-Wise Quantization
Layer-wise quantization improves the efficiency of deep neural networks by reducing the numerical precision of each layer's weights and activations, cutting computational cost and memory footprint. Current research focuses on post-training quantization methods for architectures such as Vision Transformers and Large Language Models, often employing techniques like accumulator-aware quantization and learned quantization schemes to mitigate accuracy loss. These advances are significant for deploying large models on resource-constrained devices and for accelerating inference, benefiting both machine learning research and its practical applications.
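To illustrate the basic idea, the sketch below applies simple symmetric, per-tensor post-training quantization to each linear and convolutional layer independently in PyTorch. This is a minimal example only: the helper names (quantize_tensor, quantize_model_layerwise) and the 8-bit symmetric scheme are assumptions for illustration, not the method of any particular paper; real approaches such as accumulator-aware or learned quantization calibrate scales and clipping far more carefully.

```python
# Minimal sketch: layer-wise (per-layer) post-training weight quantization in PyTorch.
# All helper names below are illustrative, not taken from any specific paper or library.
import torch
import torch.nn as nn


def quantize_tensor(w: torch.Tensor, num_bits: int = 8):
    """Symmetric per-tensor quantization: round to signed integers, keep one scale."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for int8
    scale = w.abs().max() / qmax                # one scale per layer (per tensor)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale


def dequantize_tensor(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale


@torch.no_grad()
def quantize_model_layerwise(model: nn.Module, num_bits: int = 8) -> nn.Module:
    """Quantize each Linear/Conv2d layer's weights independently.
    Weights are rounded and stored back in float ("fake quantization") to
    simulate the accuracy impact without integer kernels."""
    for _, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            q, scale = quantize_tensor(module.weight.data, num_bits)
            module.weight.data = dequantize_tensor(q, scale)
    return model


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    x = torch.randn(8, 16)
    ref = model(x)
    quantize_model_layerwise(model, num_bits=8)
    out = model(x)
    print("max abs error after 8-bit layer-wise quantization:",
          (ref - out).abs().max().item())
```

Because each layer gets its own scale, layers with very different weight ranges do not interfere with one another, which is the core appeal of the layer-wise approach over a single global quantization grid.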