DNN Inference
DNN inference focuses on efficiently executing pre-trained deep neural networks, aiming to minimize latency, energy consumption, and memory footprint while maintaining accuracy. Current research emphasizes optimizing inference across diverse hardware platforms (e.g., mobile devices, microcontrollers, edge servers, and the cloud), exploring techniques such as model compression, adaptive batching, workload partitioning, and mixed-precision computation. These advances are crucial for deploying DNNs in resource-constrained environments and for improving the performance and sustainability of AI applications across domains ranging from mobile computing to autonomous systems.
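As a concrete illustration of one of these techniques, the sketch below applies post-training dynamic quantization, a simple form of model compression, using PyTorch's torch.quantization.quantize_dynamic. The toy MLP, input shapes, and variable names are hypothetical placeholders for illustration, not drawn from any particular paper.

```python
# Minimal sketch: post-training dynamic quantization for DNN inference.
# Assumes PyTorch is installed; the model below is a hypothetical stand-in
# for a real pre-trained network.
import torch
import torch.nn as nn

# Hypothetical small MLP standing in for a pre-trained model.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()  # inference mode: no dropout, frozen statistics

# Convert the weights of Linear layers to int8; activations are
# quantized dynamically at runtime. This shrinks the memory footprint
# and can reduce CPU latency at a small accuracy cost.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    x = torch.randn(1, 784)   # dummy input batch
    logits = quantized(x)     # runs the int8-weight inference path
    print(logits.shape)       # torch.Size([1, 10])
```

Dynamic quantization needs no calibration data, since activations are quantized on the fly, which makes it a common first step when deploying models to CPUs in resource-constrained settings.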