Quantization Parameter

Quantization parameters, such as bit width, scale, and zero point, determine the precision of numerical representations in machine learning models. The goal is to reduce model size and computational cost without significant accuracy loss. Current research optimizes these parameters for a range of architectures, including transformers and convolutional neural networks, using techniques such as mixed-precision quantization, adaptive methods based on Hessian matrices or prediction differences, and bias correction for sensitive activations such as softmax. This work matters for deploying complex models on resource-constrained devices, with applications in image processing, video compression, and federated learning, where it enables efficient and privacy-preserving model deployment.
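To make the idea concrete, here is a minimal sketch of uniform affine quantization in plain Python. It is an illustration under stated assumptions, not the method of any particular paper: the function names (`choose_qparams`, `quantize`, `dequantize`) are hypothetical, the range is taken from the simple min/max of the data, and the integer grid is unsigned. The sketch shows how the bit width fixes the number of levels, and how the scale and zero point map real values onto those levels.

```python
def choose_qparams(values, num_bits=8):
    """Derive (scale, zero_point) from the min/max of the data (an assumed, simple calibration rule)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        # Degenerate case: all values are zero.
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map real values to integers in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate real values."""
    return [scale * (q - zero_point) for q in q_values]


def max_abs_error(values, num_bits):
    """Worst-case round-trip error for a given bit width."""
    scale, zp = choose_qparams(values, num_bits)
    restored = dequantize(quantize(values, scale, zp, num_bits), scale, zp)
    return max(abs(a - b) for a, b in zip(values, restored))


weights = [-1.0, 0.0, 0.5, 2.0]
print("8-bit max error:", max_abs_error(weights, 8))
print("4-bit max error:", max_abs_error(weights, 4))
```

Lowering `num_bits` shrinks the stored representation but widens the step between levels, which is the trade-off that mixed-precision and adaptive methods try to tune per layer or per tensor.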

Papers