Floating Point Format

Floating-point formats are numerical representations crucial for efficient deep learning, particularly in large language models (LLMs) and convolutional neural networks (CNNs). Current research focuses on developing novel low-bit (e.g., 4-bit, 8-bit) floating-point formats, including variations such as block floating point (BFP) and tapered-precision designs, to reduce memory usage and improve computational speed and energy efficiency without sacrificing accuracy. This work is driven by the need to deploy increasingly large models on resource-constrained hardware, and findings demonstrate that carefully designed low-precision floating-point formats can often outperform their integer counterparts across a variety of applications.
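
To make the block floating point idea concrete, the sketch below shows a minimal BFP-style quantizer: each block of values shares a single exponent while individual elements keep only a low-bit signed mantissa. This is an illustrative assumption-based example, not the method of any particular paper; the function name, block size, and mantissa width are all hypothetical choices.

```python
# Minimal block floating point (BFP) quantization sketch.
# Illustrative only: names and parameters are assumptions, not from the surveyed papers.
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=4):
    """Quantize a 1-D array using one shared exponent per block.

    Each block stores `block_size` signed mantissas of `mantissa_bits` bits
    plus a single shared exponent, instead of a full exponent per element.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block: chosen so the largest magnitude in the block
    # fits into the signed mantissa range [-(2^(m-1)-1), 2^(m-1)-1].
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    max_mant = 2 ** (mantissa_bits - 1) - 1
    safe_max = np.where(max_abs > 0, max_abs, 1.0)
    exponents = np.ceil(np.log2(safe_max / max_mant))

    scale = 2.0 ** exponents
    mantissas = np.clip(np.round(blocks / scale), -max_mant, max_mant)

    # Dequantize to inspect the error introduced by the shared exponent.
    dequant = (mantissas * scale).reshape(-1)[: len(x)]
    return mantissas.astype(np.int8), exponents.astype(np.int8), dequant

if __name__ == "__main__":
    weights = np.random.randn(64).astype(np.float32)
    _, _, approx = bfp_quantize(weights, block_size=16, mantissa_bits=4)
    print("max abs error:", np.max(np.abs(weights - approx)))
```

Sharing one exponent per block is what lets BFP approach integer-like storage and compute costs while retaining a floating-point-style dynamic range across blocks.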

Papers