Paper ID: 2404.08131

Frame Quantization of Neural Networks

Wojciech Czaja, Sanghoon Na

We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory. Specifically, we use first-order Sigma-Delta ($\Sigma\Delta$) quantization for finite unit-norm tight frames to quantize weight matrices and biases in a neural network. In this setting, we derive an error bound between the original neural network and the quantized neural network in terms of the quantization step size and the number of frame elements. We also demonstrate how to leverage the redundancy of frames to achieve a quantized neural network with higher accuracy.
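The following is a minimal sketch of the underlying frame technique, not the authors' implementation. It assumes a harmonic unit-norm tight frame $\{e_n\}_{n=1}^N \subset \mathbb{R}^d$ ($d$ even, $N > d$), an unbounded uniform alphabet $\delta\mathbb{Z}$ with step size $\delta$, and reconstruction with the canonical dual, $\hat{x} = \frac{d}{N}\sum_n q_n e_n$. First-order $\Sigma\Delta$ computes $q_n = Q(u_{n-1} + \langle x, e_n\rangle)$ and $u_n = u_{n-1} + \langle x, e_n\rangle - q_n$ with $u_0 = 0$. Summation by parts then gives the classical estimate $\|x - \hat{x}\| \le \frac{d}{N}\cdot\frac{\delta}{2}\,(\sigma + 1)$, where $\sigma = \sum_{n=1}^{N-1}\|e_n - e_{n+1}\|$ is the frame variation. The helper names (`harmonic_frame`, `sigma_delta`, `frame_quantize`, `error_bound`) and the row-by-row treatment of a weight matrix are hypothetical illustrations. How the paper actually maps weights and biases onto frame coefficients may differ.

```python
import numpy as np

def harmonic_frame(N, d):
    """Hypothetical helper: N unit-norm vectors in R^d (d even, N > d) forming a tight frame."""
    assert d % 2 == 0 and N > d
    n = np.arange(N)[:, None]                 # frame index, shape (N, 1)
    k = np.arange(1, d // 2 + 1)[None, :]     # frequencies 1..d/2
    angles = 2 * np.pi * n * k / N
    E = np.empty((N, d))
    E[:, 0::2] = np.cos(angles)               # interleave cosine and sine columns
    E[:, 1::2] = np.sin(angles)
    return E * np.sqrt(2.0 / d)               # scale rows to unit norm; then E.T @ E = (N/d) I

def sigma_delta(c, delta):
    """First-order Sigma-Delta on coefficients c, assuming the unbounded alphabet delta*Z."""
    q = np.empty_like(c)
    u = 0.0                                   # internal state u_0 = 0
    for i, ci in enumerate(c):
        v = u + ci
        q[i] = delta * np.round(v / delta)    # nearest alphabet element, so |v - q| <= delta/2
        u = v - q[i]                          # state stays bounded by delta/2
    return q

def frame_quantize(x, E, delta):
    """Analyze x with frame E, quantize coefficients, reconstruct with the canonical dual (d/N) E^T."""
    N, d = E.shape
    q = sigma_delta(E @ x, delta)
    return (d / N) * (E.T @ q), q

def error_bound(E, delta):
    """Classical first-order bound (d/N) * (delta/2) * (frame variation + 1)."""
    N, d = E.shape
    variation = np.sum(np.linalg.norm(np.diff(E, axis=0), axis=1))
    return (d / N) * (delta / 2) * (variation + 1)

# Usage: quantize each row of a hypothetical weight matrix with increasingly redundant frames.
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(4, 8))
for N in (16, 64, 256):
    E = harmonic_frame(N, 8)
    W_hat = np.stack([frame_quantize(w, E, 0.5)[0] for w in W])
    print(N, np.linalg.norm(W - W_hat, axis=1).max(), error_bound(E, 0.5))
```

Because the harmonic frame's variation stays bounded as $N$ grows, the bound decays roughly like $1/N$. This is one concrete way in which added frame redundancy can trade extra coefficients for lower reconstruction error at a fixed step size.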

Submitted: Apr 11, 2024

Topics

Neural Network
Post Training Quantization
Vector Quantization
Multiplier Free Quantization
