Paper ID: 2407.12391

LLM Inference Serving: Survey of Recent Advances and Opportunities

Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari

This survey offers a comprehensive overview of recent advances in Large Language Model (LLM) serving systems, focusing on research published since 2023. We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms. By selecting and reviewing high-quality papers from prestigious ML and systems venues, we highlight key innovations and practical considerations for deploying and scaling LLMs in real-world production environments. This survey is intended as a resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.

Submitted: Jul 17, 2024

Topics

Large Language Model
Machine Learning
Timely Survey
Medical LLM
Emerging Opportunity
Recent Advance
Large Language Model Inference
