Paper ID: 2408.04666 • Published Aug 6, 2024

LLMs are Not Just Next Token Predictors

Stephen M. Downes, Patrick Forber, Alex Grzankowski
LLMs are statistical models of language that learn through stochastic gradient descent with a next token prediction objective, prompting a popular view among AI modelers: LLMs are just next token predictors. While LLMs are engineered using next token prediction, and trained based on their success at this task, our view is that reducing them to just next token predictors sells LLMs short. Moreover, there are important explanations of LLM behavior and capabilities that are lost when we engage in this kind of reduction. To draw this out, we make an analogy with a once prominent research program in biology that explained evolution and development from the gene's eye view.