Energy-Efficient Inference
Energy-efficient inference focuses on minimizing the computational resources and power consumption required to run deep learning models, particularly at the edge where resources are limited. Current research emphasizes techniques such as model compression (e.g., pruning, quantization, knowledge distillation), efficient algorithms (e.g., spiking neural networks, dynamic decision trees), and hardware-aware optimization (e.g., mapping DNNs to multi-accelerator SoCs, specialized hardware accelerators). These advances are crucial for deploying AI in resource-constrained environments such as embedded systems and IoT devices, for reducing the environmental impact of AI, and for broadening access to AI applications.
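As a concrete illustration of one of these compression techniques, below is a minimal sketch of post-training dynamic quantization using PyTorch's torch.quantization.quantize_dynamic. The toy model, layer sizes, and size comparison are illustrative assumptions, not drawn from the papers listed here.

```python
# Minimal sketch: post-training dynamic quantization with PyTorch.
# Linear weights are stored as int8 and activations are quantized on the
# fly at inference time, cutting memory traffic and energy per inference.
import io
import torch
import torch.nn as nn

# Hypothetical small model standing in for a network deployed at the edge.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Quantize all nn.Linear modules to int8 weights.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_size_bytes(m: nn.Module) -> int:
    # Serialized state_dict size as a rough proxy for memory footprint.
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

print("fp32 size (bytes):", serialized_size_bytes(model))
print("int8 size (bytes):", serialized_size_bytes(quantized_model))

# Both models accept the same input and produce outputs of the same shape.
x = torch.randn(1, 128)
with torch.no_grad():
    print(quantized_model(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization is the lightest-touch option since it needs no calibration data; static quantization or quantization-aware training can trade more setup effort for larger savings.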
Papers
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference
Siddharth Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, Vijay Gadepally
Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs
Chao Qian, Tianheng Ling, Gregor Schiele