Full Precision

Full precision in neural networks refers to the use of 32-bit floating-point (FP32) numbers for weights and activations, offering high accuracy but demanding significant compute and memory. Current research focuses on reducing this precision, particularly through binarization (1-bit weights in {-1, +1}) and ternarization (three-valued weights in {-1, 0, +1}, roughly 1.58 bits), employing techniques such as cyclic precision training, neural architecture search, and quantization-aware training to mitigate the accuracy loss. These efforts aim to create more energy-efficient and deployable models for resource-constrained devices, impacting areas like embedded systems, mobile applications, and large language model optimization.
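As a rough illustration of how quantization-aware training handles the 1-bit case, the sketch below keeps full-precision latent weights and binarizes them only in the forward pass, passing gradients through with a straight-through estimator. This is a minimal, generic PyTorch sketch, not the method of any particular paper listed here; the names BinarizeSTE and BinaryLinear are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE) for the backward pass."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Forward pass uses 1-bit weights in {-1, +1}; torch.sign maps 0 to 0,
        # which is left as-is in this simplified sketch.
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: copy the gradient through, but zero it where |w| > 1 to keep
        # the full-precision latent weights from drifting without bound.
        return grad_output * (w.abs() <= 1).float()

class BinaryLinear(nn.Linear):
    """Linear layer that stores FP32 latent weights but computes with binarized ones."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return nn.functional.linear(x, w_bin, self.bias)

# Usage: drop-in replacement for nn.Linear during quantization-aware training.
layer = BinaryLinear(128, 64)
out = layer(torch.randn(8, 128))
```

After training, the latent FP32 weights can be discarded and only the binarized weights (plus any scaling factors a specific method prescribes) are deployed, which is where the memory and energy savings come from.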

Papers