Bitwidth Quantization
Bitwidth quantization aims to reduce the computational cost and memory footprint of deep neural networks (DNNs) by representing model weights and activations using fewer bits, thereby enabling deployment on resource-constrained devices. Current research focuses on developing efficient quantization techniques for various architectures, including transformers for natural language processing and convolutional neural networks for image processing, often employing post-training quantization methods to minimize retraining overhead. These advancements are crucial for deploying large models like LLMs on edge devices and improving the efficiency of DNNs across diverse applications, ranging from speaker verification to machine translation.
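To make the idea concrete, here is a minimal sketch of uniform affine post-training quantization, the basic scheme most of these methods build on: a float tensor is mapped onto an integer grid by a scale and zero point, then dequantized to approximate the original values. All names here are illustrative, and the 8-bit setting is just one common choice of bitwidth.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Uniform affine (asymmetric) quantization of a float tensor."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Scale maps the float range onto the integer grid; zero_point aligns 0.0.
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale, zp = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, scale, zp)
max_err = float(np.abs(w - w_hat).max())
```

Because this is post-training quantization, no retraining is involved: the scale and zero point are derived directly from the observed value range, and the per-element reconstruction error is bounded by roughly half the scale. Lower bitwidths shrink storage further but widen that error, which is the trade-off the research above tries to manage.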