DNN Inference
DNN inference focuses on efficiently executing pre-trained deep neural networks, aiming to minimize latency, energy consumption, and memory footprint while maintaining accuracy. Current research emphasizes optimizing inference across diverse hardware platforms (e.g., mobile devices, microcontrollers, edge servers, and cloud), exploring techniques like model compression, adaptive batching, workload partitioning, and mixed-precision computation. These advancements are crucial for deploying DNNs in resource-constrained environments and improving the performance and sustainability of AI applications across various domains, from mobile computing to autonomous systems.
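As a concrete illustration of one of the techniques mentioned above (model compression via reduced-precision weights), the sketch below applies post-training dynamic quantization in PyTorch. The model, layer sizes, and input shape are placeholders for illustration only and are not drawn from any specific paper in this collection.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a pre-trained network (hypothetical sizes).
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Post-training dynamic quantization: Linear weights are stored in int8 and
# activations are quantized on the fly at inference time, shrinking the
# memory footprint and typically reducing CPU inference latency.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    x = torch.randn(1, 128)           # dummy input batch
    logits = quantized(x)
    print(logits.shape)               # torch.Size([1, 10])
```

Dynamic quantization is only one point in the design space; static quantization, pruning, and mixed-precision kernels trade off accuracy, latency, and memory differently depending on the target hardware.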