Local Inference

Local inference performs machine learning computations directly on the device, minimizing data transfer and latency. Current research focuses on optimizing this process across a range of model architectures, from vision transformers (such as EfficientViT-M2) for image classification to smaller language models for mobile applications, often employing techniques like federated learning and early-exit networks to improve efficiency and accuracy. The approach is crucial for resource-constrained environments such as IoT devices and satellites, enabling faster, more private, and more energy-efficient AI applications while addressing challenges like data heterogeneity and communication costs.
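One of the techniques mentioned above, early-exit networks, attaches a lightweight classifier head after intermediate stages of a model so that easy inputs can return a prediction without running the full network. A minimal sketch of the idea, in plain Python with hypothetical toy stages standing in for real backbone layers and classifier heads (no specific framework or model from the papers is assumed):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_infer(x, stages, threshold=0.9):
    """Run inference stage by stage; stop once a head is confident.

    `stages` is a list of (backbone_fn, head_fn) pairs: backbone_fn maps
    the running features forward, head_fn maps features to class logits.
    Returns (predicted_class, exit_stage_index).
    """
    h = x
    for i, (backbone, head) in enumerate(stages):
        h = backbone(h)
        probs = softmax(head(h))
        conf = max(probs)
        # Exit early if confident, or at the final stage regardless.
        if conf >= threshold or i == len(stages) - 1:
            return probs.index(conf), i

# Toy two-stage model: the first head is already confident for this
# input, so inference exits at stage 0 and skips the second stage.
stages = [
    (lambda h: h, lambda h: [5.0, 0.0]),   # confident head -> early exit
    (lambda h: h, lambda h: [0.0, 5.0]),   # never reached for this input
]
pred, exit_stage = early_exit_infer([1.0], stages)
```

On resource-constrained devices, the exit threshold trades accuracy for latency and energy: a lower threshold exits earlier more often, a higher one falls through to deeper (costlier) stages.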

Papers