Mixed Precision

Mixed-precision computing in deep neural networks aims to improve efficiency and reduce resource consumption by using different numerical precisions (e.g., 16-bit, 8-bit, or even lower) for different parts of the network. Current research focuses on optimizing how these precisions are allocated across diverse architectures, including convolutional neural networks, transformers, and neural operators, often using techniques such as neural architecture search and gradient-based methods. This approach offers significant potential for deploying deep learning models on resource-constrained devices such as microcontrollers and embedded systems, while also accelerating training and inference on more powerful hardware.
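
As a concrete, simplified illustration of the basic idea (not a reconstruction of any specific method from the papers below), the sketch assumes PyTorch's automatic mixed precision (torch.cuda.amp) on a CUDA device; the model, data, and hyperparameters are placeholders. Matrix multiplications and convolutions run in float16 where it is numerically safe, while reductions and the optimizer state remain in float32, with loss scaling guarding against gradient underflow.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and synthetic data, purely for illustration.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

for step in range(100):
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad(set_to_none=True)
    # Ops inside autocast run in float16 where safe, float32 otherwise.
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(inputs), targets)

    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients; skips the step on inf/nan
    scaler.update()                 # adjusts the loss-scale factor over time
```

Much of the research surveyed here goes further than this uniform float16/float32 split, assigning per-layer or per-tensor bit-widths (8-bit or lower) according to each part of the network's sensitivity.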

Papers