DNN Inference
DNN inference focuses on efficiently executing pre-trained deep neural networks, aiming to minimize latency, energy consumption, and memory footprint while maintaining accuracy. Current research emphasizes optimizing inference across diverse hardware platforms (e.g., mobile devices, microcontrollers, edge servers, and the cloud), exploring techniques such as model compression, adaptive batching, workload partitioning, and mixed-precision computation. These advancements are crucial for deploying DNNs in resource-constrained environments and for improving the performance and sustainability of AI applications across domains ranging from mobile computing to autonomous systems.
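As a concrete illustration of one such technique, the sketch below shows post-training dynamic quantization in PyTorch, a simple form of model compression that converts weights of selected layers to int8 to reduce memory footprint and often CPU inference latency. It is not drawn from any of the listed papers; the toy model, layer sizes, and input shape are assumptions chosen only for demonstration.

```python
# Minimal sketch: post-training dynamic quantization as a model-compression
# step for DNN inference. The model below is a hypothetical stand-in for a
# pre-trained network, not an architecture from the listed papers.
import torch
import torch.nn as nn

# Assumed toy feed-forward model (128 -> 256 -> 10).
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Quantize the weights of Linear layers to int8; activations are quantized
# dynamically at runtime. This typically shrinks the model and can speed up
# CPU inference with little accuracy loss.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    x = torch.randn(1, 128)      # assumed input shape
    out = quantized_model(x)
    print(out.shape)             # torch.Size([1, 10])
```

Dynamic quantization is only one point in the design space surveyed above; techniques such as mixed-precision execution, early exits, or partitioning inference between device and edge server trade accuracy, latency, and energy in different ways.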
Papers
Energy Optimization of Multi-task DNN Inference in MEC-assisted XR Devices: A Lyapunov-Guided Reinforcement Learning Approach
Yanzan Sun, Jiacheng Qiu, Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaoyun Wang, Shuangfeng Han
PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for Inference Cost Optimization
Assaf Lahiany, Yehudit Aperstein