Quantization Challenge
Quantization aims to reduce the computational cost and memory footprint of large language models (LLMs) and of other deep learning models such as diffusion transformers and vision transformers by representing their weights and activations with fewer bits. Current research focuses on post-training quantization (PTQ) methods that minimize the accuracy loss this introduces, often employing techniques such as singular value decomposition and noise correction to compensate for quantization error. These advances are crucial for deploying large models on resource-constrained devices, broadening access to them and reducing the environmental impact of AI.
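To make the basic idea concrete, the sketch below shows symmetric per-tensor 8-bit quantization of a weight matrix and the round-trip error it introduces. It is a minimal illustration, not a method from any specific paper; the helper names `quantize_int8` and `dequantize` are hypothetical, and real PTQ pipelines typically quantize per channel or per group and calibrate on activation statistics.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    scale = np.max(np.abs(w)) / 127.0          # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Toy "weight matrix"; in practice this would be one layer of an LLM.
rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("storage: %.1f MB -> %.1f MB" % (w.nbytes / 2**20, q.nbytes / 2**20))
print("mean abs quantization error: %.5f" % np.mean(np.abs(w - w_hat)))
```

The PTQ techniques mentioned above, such as singular value decomposition and noise correction, can be viewed as refinements of this rounding step: they choose scales, low-rank decompositions, or correction terms that shrink the error between the original and dequantized weights on the data the model actually sees.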