Inference Service

Inference services, which deliver predictions from machine learning models, are a critical component of many applications and must balance accuracy, latency, and cost. Current research focuses on improving efficiency through techniques such as model cascades, dynamic modality selection, and optimized resource allocation across heterogeneous hardware, as well as on strengthening privacy through secure multi-party computation and novel data-protection methods. These advances are crucial for deploying large language models and other complex AI systems in resource-constrained environments and sensitive applications, affecting both the scalability of AI and its responsible use.
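
To make the efficiency angle concrete, the sketch below illustrates one of the techniques mentioned above, a model cascade: a cheap model answers when it is confident, and the request escalates to an expensive model otherwise. The callables `small_model` and `large_model` and the 0.9 threshold are hypothetical placeholders, not taken from any specific paper.

```python
# Minimal sketch of a model cascade for an inference service (assumed setup:
# each model is a callable returning a (label, confidence) pair).
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])

def cascade_predict(
    x: str,
    small_model: Callable[[str], Prediction],
    large_model: Callable[[str], Prediction],
    confidence_threshold: float = 0.9,
) -> Prediction:
    """Serve the cheap model's answer when it is confident enough;
    otherwise escalate to the expensive model."""
    label, confidence = small_model(x)
    if confidence >= confidence_threshold:
        return label, confidence  # fast path: skip the large-model call
    return large_model(x)         # slow path: higher accuracy, higher cost
```

The threshold trades accuracy against latency and cost: a higher value routes more traffic to the large model, while a lower value keeps more requests on the fast path.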

Papers