Bit Weight Quantization
Bit weight quantization aims to reduce the memory footprint and computational cost of large neural networks, particularly large language models (LLMs) and diffusion models, by representing model weights with fewer bits. Current research focuses on developing novel quantization techniques, such as asymmetric floating-point quantization and activation-aware methods, that minimize accuracy loss during this compression. These advances enable efficient deployment of massive models on resource-constrained devices, improving accessibility and accelerating inference for applications such as image generation and natural language processing.
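To make the core idea concrete, below is a minimal sketch of round-to-nearest asymmetric (affine) weight quantization in NumPy. It is an illustration under simplifying assumptions (per-tensor scale, no activation-aware calibration), not the method of any particular paper; function names like quantize_asymmetric are hypothetical.

```python
import numpy as np

def quantize_asymmetric(weights: np.ndarray, num_bits: int = 4):
    """Affine (asymmetric) quantization of a weight tensor to num_bits unsigned integers."""
    qmin, qmax = 0, (1 << num_bits) - 1           # e.g. 0..15 for 4-bit
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (qmax - qmin)       # real-valued step size
    zero_point = int(round(qmin - w_min / scale)) # integer offset mapping w_min -> qmin
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the original weights from the integer codes."""
    return scale * (q.astype(np.float32) - zero_point)

if __name__ == "__main__":
    w = np.random.randn(4, 8).astype(np.float32)
    q, s, z = quantize_asymmetric(w, num_bits=4)
    w_hat = dequantize(q, s, z)
    print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Practical schemes build on this skeleton with per-channel or per-group scales, and activation-aware methods additionally weight the rounding error by how strongly each weight interacts with typical activations.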