Draft
"Drafting" in the context of large language models (LLMs) refers to the generation of preliminary text outputs, often used to accelerate inference or improve the quality of final text. Current research focuses on developing efficient drafting algorithms, such as self-speculative decoding and blockwise parallel decoding, to generate multiple drafts quickly and improve their coherence and accuracy. These advancements have significant implications for various applications, including faster LLM inference, improved clinical note generation, and enhanced software development workflows, ultimately increasing efficiency and productivity in diverse fields.