Next Token Prediction
Next-token prediction (NTP) is the machine learning objective in which a model, given a sequence of tokens, predicts a probability distribution over the next token; it is the primary method used to train large language models (LLMs). Current research focuses on improving NTP's efficiency and effectiveness through architectural innovations such as encoder-only transformers and algorithmic enhancements such as multi-token prediction and selective language modeling, with the aim of mitigating issues like memorization and hallucination. Because NTP is so widely used to train LLMs, understanding its limitations and optimizing its performance is crucial for advancing both the theoretical understanding of LLMs and their practical applications across many fields.
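To make the objective concrete, here is a minimal sketch of next-token prediction in Python. It is not taken from any of the papers below: the vocabulary is a toy example, and `model_logits` is a hypothetical random stand-in for a trained language model. The loop computes the standard cross-entropy loss on each true next token, then reuses the same distribution for greedy decoding.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]  # toy vocabulary (invented)
V = len(vocab)

def softmax(z):
    # Numerically stable softmax: turns logits into a probability distribution.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def model_logits(context_ids):
    # Hypothetical stand-in for a trained LM: maps a context to logits over
    # the vocabulary. A real LLM would run a transformer here.
    rng = np.random.default_rng(sum(context_ids))
    return rng.normal(size=V)

# Training objective: at each position t, the model predicts a distribution
# over the next token, and the loss is the cross-entropy against the token
# that actually follows in the sequence.
tokens = [0, 1, 2, 3, 4, 5]  # "the cat sat on mat <eos>"
loss = 0.0
for t in range(len(tokens) - 1):
    probs = softmax(model_logits(tokens[: t + 1]))  # P(next | tokens[:t+1])
    loss += -np.log(probs[tokens[t + 1]])           # cross-entropy on true next token
loss /= len(tokens) - 1
print(f"mean NTP loss: {loss:.3f}")

# Inference: greedy decoding picks the most probable next token.
next_id = int(np.argmax(softmax(model_logits(tokens[:2]))))
print("greedy next token after 'the cat':", vocab[next_id])
```

In a real LLM, `model_logits` would be a transformer evaluated over the whole sequence at once, and the same per-position cross-entropy is averaged over every position in every training batch.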
Papers
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis
Symbolic Autoencoding for Self-Supervised Sequence Learning
Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin Josifoski, Maxime Peyrard, Robert West