Local Inference
Local inference performs machine learning computations directly on the device that collects the data, minimizing data transfer and latency. Current research focuses on optimizing inference across a range of model architectures, from vision transformers such as EfficientViT-M2 for image classification to smaller language models for mobile applications, often combining techniques like federated learning and early-exit networks to improve efficiency and accuracy. This approach is crucial in resource-constrained environments such as IoT devices and satellites, enabling faster, more private, and more energy-efficient AI while addressing challenges such as data heterogeneity and communication cost.
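One of the techniques mentioned above, early-exit networks, attaches a small classifier head to each intermediate layer so that inference can stop as soon as a head is sufficiently confident, saving compute on easy inputs. The following is a minimal NumPy sketch of that control flow; the model, its random weights, the layer sizes, and the confidence threshold are all hypothetical placeholders chosen purely for illustration, not taken from any of the surveyed papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

class EarlyExitMLP:
    """Toy early-exit network: each hidden layer carries its own
    classifier head, and inference halts at the first head whose
    top-class probability clears a confidence threshold.
    Weights are random placeholders (hypothetical), used only to
    demonstrate the early-exit control flow."""

    def __init__(self, in_dim=8, hidden=16, n_classes=3, n_layers=3):
        self.layers = [
            rng.normal(size=(in_dim if i == 0 else hidden, hidden)) * 0.1
            for i in range(n_layers)
        ]
        self.heads = [
            rng.normal(size=(hidden, n_classes)) * 0.1
            for _ in range(n_layers)
        ]

    def predict(self, x, threshold=0.6):
        h = x
        for depth, (w, head) in enumerate(zip(self.layers, self.heads)):
            h = np.tanh(h @ w)            # hidden transformation
            probs = softmax(h @ head)     # this layer's exit head
            if probs.max() >= threshold:  # confident enough: exit early
                return int(probs.argmax()), depth
        # No head was confident: fall through to the deepest head.
        return int(probs.argmax()), depth

model = EarlyExitMLP()
label, exit_depth = model.predict(rng.normal(size=8), threshold=0.4)
```

On a resource-constrained device, `exit_depth` directly translates into saved multiply-accumulate operations, which is why early exiting pairs naturally with the latency and energy goals described above.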