DNN Inference

DNN inference focuses on efficiently executing pre-trained deep neural networks, aiming to minimize latency, energy consumption, and memory footprint while maintaining accuracy. Current research emphasizes optimizing inference across diverse hardware platforms (e.g., mobile devices, microcontrollers, edge servers, and the cloud), exploring techniques such as model compression, adaptive batching, workload partitioning, and mixed-precision computation. These advances are crucial for deploying DNNs in resource-constrained environments and for improving the performance and sustainability of AI applications across domains ranging from mobile computing to autonomous systems.
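As a concrete illustration of two of these techniques, the minimal PyTorch sketch below shows model compression via post-training dynamic quantization and mixed-precision execution with autocast. The network, layer sizes, and input shape are illustrative placeholders rather than a setup from any particular paper.

```python
# Hedged sketch of two common inference optimizations mentioned above:
# (1) model compression via post-training dynamic quantization, and
# (2) mixed-precision computation with autocast.
import torch
import torch.nn as nn

# Small stand-in network; real deployments would load a pre-trained model.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

# Model compression: convert Linear weights to int8 after training,
# shrinking the memory footprint and often speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    y_int8 = quantized(x)

# Mixed-precision computation: run matmul-heavy ops in float16/bfloat16
# where supported, keeping numerically sensitive ops in float32.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.no_grad(), torch.autocast(device_type=device, dtype=dtype):
    y_amp = model.to(device)(x.to(device))

print(y_int8.shape, y_amp.shape)
```

Other directions mentioned above, such as adaptive batching and workload partitioning, operate at the serving or system level rather than inside a single model, so they are not shown here.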

Papers