Efficient Dequantization
Quantization represents continuous data with discrete values, a common way to save memory and computation in machine learning. Efficient dequantization aims to keep the cost of mapping those discrete values back to continuous ones low, both in accuracy lost and in time spent. Current research emphasizes two directions: differentiable dequantization methods integrated into model training, often using techniques such as tensor decomposition, and kernels optimized for specific hardware (e.g., GPUs) to accelerate inference. These advances matter for deploying large language models and other computationally intensive applications, where they allow faster inference and a smaller memory footprint without a significant loss of accuracy.
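As a concrete illustration of the basic operation, here is a minimal sketch of 8-bit affine (scale and zero-point) quantization followed by dequantization in NumPy. It is not drawn from any particular paper on this page; the function names `quantize` and `dequantize` and the per-tensor min/max calibration are assumptions chosen for simplicity.

```python
import numpy as np

def quantize(x):
    """Map a float array to uint8 codes with a per-tensor scale and zero point.

    Assumption: simple min/max calibration over the whole tensor.
    """
    qmin, qmax = 0, 255
    x_min, x_max = float(x.min()), float(x.max())
    # Fall back to scale 1.0 for a constant tensor to avoid dividing by zero.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the uint8 codes."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.linspace(-1.0, 1.0, 1000).astype(np.float32)
q, scale, zero_point = quantize(x)
x_hat = dequantize(q, scale, zero_point)
max_error = float(np.abs(x - x_hat).max())
print(f"scale={scale:.5f} zero_point={zero_point} max_error={max_error:.5f}")
```

In this sketch the reconstruction error per element stays within about one quantization step (`scale`). Optimized kernels typically fuse the `dequantize` step into the surrounding matrix multiplication rather than materializing `x_hat` in memory, which is where much of the speedup comes from.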