Full Precision
Full precision in neural networks refers to representing weights and activations as 32-bit floating-point numbers (FP32), which preserves accuracy but is costly in memory, compute, and energy. Current research focuses on reducing this precision, particularly through binarization (1-bit values) and ternarization (three-valued weights, typically -1, 0, and +1, which need about 1.58 bits each), using techniques such as cyclic precision training, neural architecture search, and quantization-aware training to mitigate the resulting accuracy loss. These efforts aim to produce more energy-efficient models that can be deployed on resource-constrained devices, with applications in embedded systems, mobile applications, and large language model optimization.
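To make the difference between binarized and ternarized weights concrete, here is a minimal sketch in plain Python. It is illustrative only and is not taken from any specific paper listed on this page. The function names, the sample weights, the mean-absolute-value scale, and the 0.7 threshold factor are all assumptions chosen for the example; real methods vary in how they pick scales and thresholds.

```python
# Illustrative sketch: function names, the scaling rule, and the 0.7
# threshold factor are assumptions for this example, not a fixed standard.

def binarize(weights):
    """Map each full-precision weight to one of two values, -alpha or +alpha.

    alpha is the mean absolute weight, so the overall magnitude of the
    layer is roughly preserved.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def ternarize(weights, threshold_factor=0.7):
    """Map each weight to one of three values: -alpha, 0, or +alpha.

    Weights whose magnitude is at or below the threshold are set to zero;
    alpha is the mean magnitude of the weights that remain.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_factor * mean_abs
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return [
        0.0 if abs(w) <= delta else (alpha if w > 0 else -alpha)
        for w in weights
    ]


# Hypothetical full-precision weights for a tiny layer.
weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]

binary = binarize(weights)    # two distinct values
ternary = ternarize(weights)  # three distinct values, small weights zeroed

print(binary)
print(ternary)
```

In this sketch the ternary version zeroes the two smallest weights, which is where the extra sparsity of ternary networks comes from, while the binary version keeps only the sign of each weight plus one shared scale.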