Next Token

Next-token prediction (NTP) is the dominant training paradigm for large language models (LLMs): a model is trained to predict the next word or token in a sequence given the tokens that precede it. Current research focuses on improving NTP's effectiveness by addressing limitations such as shortcut learning and weak planning ability, typically building on transformer architectures and exploring complementary training objectives such as horizon-length prediction and diffusion forcing. These advances aim to make LLM output more coherent and contextually relevant, with impact on applications ranging from code generation and autonomous driving to humanoid robotics and visual processing.
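
As a concrete illustration of the objective itself, the sketch below shows the standard teacher-forced cross-entropy loss used for next-token prediction; it is a minimal example assuming a PyTorch-style autoregressive model, and `model` and the tensor shapes are illustrative assumptions rather than details from any of the papers collected here.

```python
# Minimal sketch of the next-token prediction objective (assumed
# PyTorch-style setup; `model` is a hypothetical autoregressive LM).
import torch
import torch.nn.functional as F

def ntp_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss for next-token prediction.

    token_ids: LongTensor of shape (batch, seq_len).
    The model conditions on tokens [0 .. T-2] and is scored on
    predicting tokens [1 .. T-1], i.e. each position predicts the
    token immediately after it.
    """
    inputs = token_ids[:, :-1]    # context the model sees
    targets = token_ids[:, 1:]    # the "next tokens" to predict
    logits = model(inputs)        # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```

During training this loss is minimized over large text corpora; at inference time the same model is sampled one token at a time, feeding each prediction back in as context.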

Papers