Precision Transformer
Precision Transformer research focuses on improving the efficiency and performance of transformer-based models, primarily large language models (LLMs), by reducing their computational and memory requirements. Current efforts center on post-training quantization (PTQ), including methods that use learned rotations to minimize quantization error, and on extremely low-precision (e.g., binary, ternary, 1-bit) architectures. These advances aim to make LLMs more scalable and accessible by significantly cutting energy consumption and memory footprint while maintaining, or in some cases improving, accuracy on downstream tasks.
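To illustrate the rotation-plus-PTQ idea mentioned above, the sketch below is a minimal NumPy example, not taken from any specific paper; the function names, the 4-bit setting, and the use of a random rather than learned orthogonal rotation are all assumptions for illustration. It quantizes a weight matrix to low-bit integers once directly and once after an orthogonal rotation that spreads an outlier channel across dimensions, then compares the reconstruction error.

```python
# Minimal sketch of (1) uniform symmetric post-training quantization and
# (2) rotating the weights before quantization so outlier channels are
# spread out and the per-tensor scale wastes fewer levels.
# Illustrative only; learned-rotation methods optimize R instead of sampling it.

import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Uniform symmetric PTQ: map floats to signed integers with one scale."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for 4-bit
    scale = np.abs(w).max() / qmax         # single per-tensor scale
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def random_rotation(n: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix; a stand-in for a learned rotation."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Toy weight matrix with one outlier column, a common failure case for PTQ.
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64)).astype(np.float32)
W[:, 0] *= 20.0                             # outlier channel

# Plain PTQ.
q, s = quantize_symmetric(W, bits=4)
err_plain = np.abs(dequantize(q, s) - W).mean()

# Rotate, quantize, then undo the rotation (R is orthogonal, so R @ R.T = I).
R = random_rotation(64)
q_r, s_r = quantize_symmetric(W @ R, bits=4)
err_rot = np.abs(dequantize(q_r, s_r) @ R.T - W).mean()

print(f"mean abs error, plain PTQ:   {err_plain:.4f}")
print(f"mean abs error, rotated PTQ: {err_rot:.4f}")
```

Learned-rotation PTQ methods replace the random orthogonal matrix with one optimized to minimize this quantization error, while the extremely low-precision (binary, ternary, 1-bit) approaches noted above instead constrain weights to two or three values during or after training.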
Papers
Associated papers, by publication date: May 27, 2024; May 26, 2024; March 14, 2024; October 17, 2023; August 9, 2023; August 6, 2023; February 23, 2023; October 6, 2022; July 2, 2022; March 23, 2022.