Paper ID: 2409.19389
Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator
W Hokenmaier, R Jurasek, E Bowen, R Granger, D Odom
Why do security cameras, sensors, and siri use cloud servers instead of on-board computation? The lack of very-low-power, high-performance chips greatly limits the ability to field untethered edge devices. We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with dramatic reduction in energy consumption (> 100X), via many parallel combined processor-memory units, i.e., a drastically non-von-Neumann architecture, allowing very large numbers of independent processing streams without bottlenecks due to typical monolithic memory. The current initial prototype fab arises from a successful co-development effort between algorithm- and software-driven architectural design and VLSI design realities. An innovative communication protocol minimizes power usage, and data transport costs among nodes were vastly reduced by eliminating the address bus, through local target address matching. Throughout the development process, the software and architecture teams were able to innovate alongside the circuit design team's implementation effort. A digital twin of the proposed hardware was developed early on to ensure that the technical implementation met the architectural specifications, and indeed the predicted performance metrics have now been thoroughly verified in real hardware test data. The resulting device is currently being used in a fielded edge sensor application; additional proofs of principle are in progress demonstrating the proof on the ground of this new real-world extremely low-power high-performance ASIC device.
Submitted: Sep 28, 2024