Device Inference

Device inference focuses on performing machine learning inference directly on resource-constrained devices such as smartphones and embedded systems, aiming to reduce latency and energy use and to keep data on-device for privacy, compared with cloud-based solutions. Current research emphasizes efficient model architectures (e.g., quantized transformers, Mixture-of-Experts models), optimization techniques (e.g., pruning, knowledge distillation, integer quantization, sketched below), and hybrid approaches that split work between on-device and cloud processing. This field is crucial for deploying advanced AI capabilities across a wide range of applications, from mobile apps and IoT devices to autonomous systems, while operating within tight power, memory, and compute budgets.
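As a concrete illustration of one of these techniques, the following is a minimal sketch of symmetric per-tensor post-training int8 quantization in plain NumPy. The function names and the per-tensor scaling scheme are illustrative assumptions for this summary, not drawn from any particular paper or library:

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor quantization: map floats into [-127, 127].
    # (Illustrative scheme; real toolchains also support per-channel
    # scales and asymmetric zero-points.)
    scale = max(np.max(np.abs(weights)) / 127.0, 1e-12)  # avoid div-by-zero
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and inspect error and memory.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.max(np.abs(w - w_hat)))
print("memory: float32 =", w.nbytes, "bytes; int8 =", q.nbytes, "bytes")
```

The int8 representation stores each weight in one byte instead of four, a 4x reduction in model size (and, on hardware with integer arithmetic units, a corresponding speed and energy win), which is the basic reason integer quantization features so prominently in on-device deployment.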

Papers