Inference Time
Inference time, the time a model takes to process an input and produce an output, is a critical factor in the performance and scalability of large language models (LLMs) and other deep learning systems. Current research focuses on improving inference efficiency through techniques such as adaptive sampling, architecture search for efficient inference-time methods, and model compression, with the goal of reducing computational cost without sacrificing accuracy. These advances are crucial for deploying LLMs in resource-constrained environments and for improving the responsiveness of AI applications, making AI systems both more efficient and more accessible to a wider range of users.
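As a concrete illustration of what is being measured, the sketch below times repeated forward passes of a placeholder PyTorch model. The model, tensor sizes, and run counts are arbitrary assumptions, not drawn from any particular paper; on a GPU one would additionally call torch.cuda.synchronize() before reading the clock so that asynchronous kernels are included in the measurement.

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-in model; swap in any trained network.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
model.eval()

x = torch.randn(1, 512)  # batch of one, as in interactive serving

# Warm-up runs so one-time costs (memory allocation, kernel
# selection) do not skew the measurement.
with torch.no_grad():
    for _ in range(10):
        model(x)

# Measure wall-clock inference time averaged over repeated runs.
n_runs = 100
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    elapsed = time.perf_counter() - start

print(f"Mean inference time: {elapsed / n_runs * 1000:.2f} ms per input")
```

Averaging over many runs after a warm-up phase matters because the first few invocations are typically dominated by setup costs rather than steady-state compute, which is what optimization techniques like compression actually target.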