Heterogeneous Quantization
Heterogeneous quantization improves deep neural network efficiency by assigning different bit-widths to different parts of a model, such as individual layers, weights, and activations, with the goal of minimizing accuracy loss while maximizing speed and memory savings. Current research focuses on automated methods for finding optimal bit-width assignments across diverse architectures, including transformers and convolutional neural networks, and on addressing quantization-induced inefficiencies in specific layers such as batch normalization. The approach is central to deploying deep learning models on resource-constrained edge devices, broadening their use in embedded systems and mobile computing.
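To make the idea concrete, below is a minimal sketch of per-layer mixed-precision weight quantization in PyTorch. It assumes symmetric uniform (fake) quantization; the quantize_uniform helper, the toy model, and the per-layer bit-width table are illustrative assumptions, not taken from any particular paper.

```python
import torch

def quantize_uniform(x: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric uniform fake-quantization: map to signed integers at
    # `bits` precision, then back to floats (simulates low-bit storage).
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    if scale == 0:
        return x
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

# Hypothetical per-layer assignment: accuracy-sensitive layers keep higher
# precision, while tolerant layers are pushed to lower bit-widths. In the
# automated methods the summary mentions, this table would be searched or
# learned rather than fixed by hand.
bit_widths = {"conv1": 8, "conv2": 4, "fc": 2}

model = torch.nn.Sequential()
model.add_module("conv1", torch.nn.Conv2d(3, 16, 3))
model.add_module("conv2", torch.nn.Conv2d(16, 32, 3))
model.add_module("fc", torch.nn.Linear(32, 10))

with torch.no_grad():
    for name, module in model.named_children():
        module.weight.copy_(quantize_uniform(module.weight, bit_widths[name]))
```

Storing weights at these mixed precisions would shrink the model roughly in proportion to the average bit-width, which is the memory saving the summary refers to.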