Autoregressive Large Language Model

Autoregressive large language models (LLMs) generate text by predicting the next token in a sequence, one step at a time, conditioned on everything generated so far, combining transformer architectures with training on massive datasets. Current research focuses on improving efficiency through techniques such as low-rank compression of key-value caches, speculative decoding, and adaptive layer skipping, while also addressing limitations such as weak long-context processing and the inherently sequential nature of autoregressive generation. These advances matter because they improve the speed, memory efficiency, and capabilities of LLMs, with impact on applications ranging from video generation to scientific text summarization, and potentially even policy-making. Ongoing work also explores the theoretical foundations of LLMs, including their computational universality and the coherence of their probabilistic judgments.
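To make the core mechanism concrete, the sketch below implements a greedy autoregressive decoding loop over a toy vocabulary. The names `VOCAB`, `BIGRAM`, and `toy_next_token_logits` are illustrative stand-ins invented for this example, not part of any real model or library; a trained transformer would compute logits from the full token prefix. The loop structure itself (predict one token, append it, repeat) is the sequential process that the efficiency techniques mentioned above aim to accelerate.

```python
import math

# Hypothetical toy vocabulary and next-token preferences standing in for a
# trained transformer; only the shape of the decoding loop is the point.
VOCAB = ["<eos>", "the", "model", "predicts", "next", "token"]
BIGRAM = {"the": "model", "model": "predicts", "predicts": "next",
          "next": "token", "token": "<eos>"}

def toy_next_token_logits(prefix):
    """Return one logit per vocabulary entry given the current prefix."""
    last = prefix[-1] if prefix else "the"
    preferred = BIGRAM.get(last, "<eos>")
    return [5.0 if tok == preferred else 0.0 for tok in VOCAB]

def softmax(logits):
    """Convert logits to a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt_tokens, max_new_tokens=10):
    """Greedy autoregressive decoding: each step conditions on all tokens
    produced so far, which is the sequential bottleneck that speculative
    decoding and KV-cache optimizations target."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = softmax(toy_next_token_logits(tokens))
        next_tok = VOCAB[probs.index(max(probs))]  # pick the most likely token
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)
    return tokens

if __name__ == "__main__":
    print(generate(["the"]))  # ['the', 'model', 'predicts', 'next', 'token']
```

Speculative decoding, for instance, attacks exactly this loop: a cheaper draft model proposes several tokens ahead, and the large model verifies them in a single parallel pass instead of one forward pass per token.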

Papers