Dynamic Quantization
Dynamic quantization improves the efficiency of deep learning models by representing weights and activations with fewer bits, reducing both computational cost and memory footprint; the "dynamic" aspect refers to computing quantization parameters (such as scales) at runtime from the actual activation values rather than fixing them in advance through calibration. Current research focuses on optimizing quantization strategies for various model architectures, including large language models (LLMs), diffusion models, and vision transformers, often employing techniques such as per-tensor or per-token quantization, dynamic bit allocation, and weight dilation to mitigate accuracy loss. These advances are significant for deploying large models on resource-constrained devices and for accelerating inference in applications such as natural language processing, image generation, and video processing.
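As a concrete illustration of the per-token strategy mentioned above, the following is a minimal sketch of dynamic int8 quantization of an activation tensor, where each token's scale is derived on the fly from its own maximum magnitude. The function names and shapes here are illustrative assumptions, not tied to any specific paper or library API.

```python
import torch


def dynamic_per_token_quant_int8(x: torch.Tensor):
    """Quantize a (tokens, features) activation tensor to int8 using
    per-token scales computed at runtime (dynamic quantization)."""
    # One scale per token, based on that token's largest absolute value.
    scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scales), -128, 127).to(torch.int8)
    return q, scales


def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original floating-point activations.
    return q.to(torch.float32) * scales


if __name__ == "__main__":
    x = torch.randn(4, 16)  # e.g. 4 tokens with 16 hidden features each
    q, s = dynamic_per_token_quant_int8(x)
    x_hat = dequantize(q, s)
    print("max abs reconstruction error:", (x - x_hat).abs().max().item())
```

Because the scales depend only on the current batch of activations, no calibration dataset is needed; per-tensor dynamic quantization would simply use a single scale computed over the whole tensor instead of one per token.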
Papers
[Paper list: 18 entries dated May 19, 2023 to October 8, 2024; titles and links not preserved in this copy.]