Autoregressive Decoding
Autoregressive decoding is the sequential, token-by-token generation of text in large language models (LLMs): each new token is conditioned on all tokens produced so far, so generating n tokens takes n dependent forward passes. This sequential dependency makes inference computationally expensive and limits the speed and scalability of LLMs. Current research focuses on accelerating the process through methods such as speculative decoding, in which several candidate tokens are proposed ahead of time and then verified together by the full model, and through alternative strategies such as non-autoregressive or semi-autoregressive decoding. These advances aim to improve inference speed without sacrificing generation quality, enabling faster and more efficient deployment of LLMs in applications ranging from machine translation to code generation.
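To make the contrast concrete, here is a minimal sketch of plain autoregressive decoding next to a greedy form of speculative decoding. It is an illustration under stated assumptions, not any particular paper's method: the "models" are hypothetical toy functions (`toy_target`, `toy_draft`) that map a token sequence to the next token, and verification is written as a loop, whereas a real system would score all drafted positions in one batched forward pass of the target model.

```python
from typing import Callable, List, Sequence

# Assumption: a "model" is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def autoregressive_decode(target: NextToken, prompt: Sequence[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token, strictly in sequence."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: Sequence[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the prefix the target agrees with."""
    tokens = list(prompt)
    goal = len(tokens) + n_new
    while len(tokens) < goal:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target checks each drafted position. These k + 1 calls are
        #    independent of each other, so a real system could batch them.
        for i in range(k + 1):
            expected = target(tokens + proposal[:i])
            if i < k and proposal[i] == expected:
                continue  # drafted token matches the target's choice; accept it
            # First mismatch (or all k accepted): keep the accepted prefix
            # plus the target's own token at this position.
            tokens.extend(proposal[:i])
            tokens.append(expected)
            break
    return tokens[:goal]


# Hypothetical toy models, for illustration only.
def toy_target(tokens: Sequence[int]) -> int:
    return (3 * tokens[-1] + len(tokens)) % 17


def toy_draft(tokens: Sequence[int]) -> int:
    guess = toy_target(tokens)
    # Disagree with the target on every fourth position to force rejections.
    return guess if len(tokens) % 4 else (guess + 1) % 17


baseline = autoregressive_decode(toy_target, [1, 2, 3], 20)
speculative = speculative_decode(toy_target, toy_draft, [1, 2, 3], 20, k=4)
print(baseline == speculative)
```

In this greedy variant every accepted token is, by construction, the token the target would have chosen itself, so the output matches the baseline; any speed-up comes only from how often the draft agrees with the target. Sampling-based variants use a probabilistic acceptance rule instead of the exact-match check shown here.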
Papers
December 10, 2024
October 22, 2024
October 4, 2024
June 25, 2024
June 19, 2024
February 26, 2024
February 7, 2024
February 3, 2024
January 26, 2024
January 15, 2024
November 14, 2023
July 5, 2023
May 17, 2023
May 9, 2023
May 2, 2023
March 30, 2023
February 26, 2023
February 10, 2023