Inference Stage

The inference stage in large language models (LLMs) covers how outputs are generated from input prompts, with research aiming to cut computational cost without sacrificing output quality. Current work emphasizes techniques such as adaptive layer selection, which skips layers that contribute little to the final prediction, and the joint optimization of training and inference in federated learning settings to maximize accuracy on resource-constrained devices. These advances are crucial for deploying LLMs in real-world applications, both by reducing serving expense and by improving robustness against attacks such as jailbreaking and backdoors. A minimal sketch of the layer-selection idea follows.
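As a rough illustration of adaptive layer selection, the sketch below implements confidence-based early exit in PyTorch: after each transformer layer, a shared LM head probes the current hidden state, and the remaining layers are skipped once the top next-token probability clears a threshold. The toy model, the `exit_threshold` parameter, and the single shared exit head are illustrative assumptions, not the method of any particular paper.

```python
import torch
import torch.nn as nn

class EarlyExitLM(nn.Module):
    """Toy transformer LM with confidence-based early exit at inference time."""

    def __init__(self, vocab_size=1000, d_model=128, n_layers=8, exit_threshold=0.9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        ])
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.exit_threshold = exit_threshold  # illustrative hyperparameter

    @torch.no_grad()
    def forward(self, input_ids):
        h = self.embed(input_ids)
        for i, layer in enumerate(self.layers):
            h = layer(h)
            # Probe the shared LM head after every layer: if the model is
            # already confident about the next token, skip the remaining layers.
            probs = self.lm_head(h[:, -1]).softmax(dim=-1)
            if probs.max().item() >= self.exit_threshold:
                print(f"exited after layer {i + 1} of {len(self.layers)}")
                break
        return self.lm_head(h)

model = EarlyExitLM()
logits = model(torch.randint(0, 1000, (1, 16)))  # batch of 1, sequence of 16 tokens
print(logits.shape)  # torch.Size([1, 16, 1000])
```

With an untrained toy model the exit rarely triggers; in practice the exit heads are trained so that confident predictions emerge in early layers, which is where the compute savings come from.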

Papers