Inference Stage
The inference stage in large language models (LLMs) focuses on optimizing how outputs are generated from input prompts, aiming to reduce computational cost and improve efficiency without sacrificing quality. Current research emphasizes techniques such as adaptive layer selection, which skips layers that contribute little to a given input, and the joint optimization of training and inference in federated learning settings to maximize accuracy on resource-constrained devices. These advances are crucial for deploying LLMs in real-world applications: they address the computational expense of serving large models and improve robustness against attacks such as jailbreaking and backdoors.
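To make the layer-selection idea concrete, below is a minimal sketch of early-exit-style adaptive depth in PyTorch. It is an illustration under stated assumptions, not the method of any particular paper: the class name `AdaptiveDepthModel`, the cosine-similarity convergence test, and the `similarity_threshold` parameter are all hypothetical.

```python
# Minimal sketch of adaptive layer selection via early exit.
# Assumption: once successive layer outputs are nearly identical,
# the hidden state has converged and remaining layers can be skipped.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveDepthModel(nn.Module):
    def __init__(self, d_model=256, n_layers=12, similarity_threshold=0.995):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.threshold = similarity_threshold

    def forward(self, x):
        layers_used = 0
        for layer in self.layers:
            y = layer(x)
            layers_used += 1
            # Exit early when this layer barely changed the hidden state:
            # the cost of the remaining layers is saved for "easy" inputs.
            sim = F.cosine_similarity(y.flatten(1), x.flatten(1), dim=1).mean()
            x = y
            if sim > self.threshold:
                break
        return x, layers_used

# Usage: easy inputs exit after few layers; hard ones use the full stack.
model = AdaptiveDepthModel().eval()
with torch.no_grad():
    hidden, depth = model(torch.randn(2, 16, 256))
print(f"used {depth} of {len(model.layers)} layers")
```

In practice the exit criterion is often a small trained gate or classifier per layer rather than a fixed similarity threshold; the fixed threshold here simply keeps the sketch self-contained.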