Draft Tree
Draft trees are a key component of speculative decoding, a technique that accelerates large language model (LLM) inference by generating multiple candidate token sequences ("drafting") and then verifying them with the full LLM. Arranging the candidates as a tree lets sequences that share a prefix share its nodes. Current research focuses on optimizing draft tree structures, moving from static, heuristic designs toward dynamic, context-aware approaches that adapt to the specific input and model characteristics to maximize the number of drafted tokens the full LLM accepts. Because each accepted token saves a full-model decoding step, these advances improve inference speed, which benefits LLM deployment in resource-constrained environments and speeds up processing of large text datasets.
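The draft-then-verify idea can be illustrated with a minimal sketch, under several assumptions: the names `DraftNode`, `verify_greedy`, and `toy_target_next` are hypothetical, the "full LLM" is replaced by a lookup over a fixed sentence, and verification uses greedy matching, calling the target once per tree level. Practical systems typically score the whole tree in one batched pass, so this shows only the acceptance logic, not the speedup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class DraftNode:
    """One drafted token; children are alternative continuations."""
    token: str
    children: Dict[str, "DraftNode"] = field(default_factory=dict)

    def add_path(self, tokens: Sequence[str]) -> None:
        """Insert a drafted sequence, reusing nodes for shared prefixes."""
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, DraftNode(tok))


def verify_greedy(
    root: DraftNode,
    prefix: Sequence[str],
    target_next: Callable[[List[str]], str],
) -> List[str]:
    """Walk the tree along the tokens the target model would emit.

    Every returned token is the target's own greedy choice. All but the
    last one also matched a drafted node; the last one is the target's
    token at the point where the draft tree ran out or disagreed.
    """
    accepted: List[str] = []
    node = root
    while True:
        expected = target_next(list(prefix) + accepted)
        accepted.append(expected)
        child = node.children.get(expected)
        if child is None:
            return accepted
        node = child


def toy_target_next(context: List[str]) -> str:
    """Hypothetical stand-in for the full LLM: continues a fixed sentence."""
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    return sentence[len(context)] if len(context) < len(sentence) else "<eos>"


# A static draft tree with three candidate continuations of "the".
root = DraftNode("<root>")
root.add_path(["cat", "sat", "down"])
root.add_path(["cat", "ran"])
root.add_path(["dog", "sat"])

result = verify_greedy(root, ["the"], toy_target_next)
print(result)  # ['cat', 'sat', 'on']
```

Here two drafted tokens ("cat", "sat") are accepted and the third emitted token ("on") comes from the target itself. A dynamic, context-aware tree would try to place its branches where such matches are most likely.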
Papers
August 30, 2024
June 25, 2024
June 24, 2024