Token Prediction
Token prediction, the task of predicting the next word (or token) in a sequence, is central to many natural language processing (NLP) applications and underpins the functionality of large language models (LLMs). Current research focuses on improving prediction accuracy, particularly for long-range dependencies and in the presence of misinformation or adversarial inputs; techniques such as planning tokens, divergence-based calibration, and adaptive decoding methods are being explored to enhance efficiency and robustness. These advances are crucial for building more reliable and efficient LLMs, with impact on fields ranging from question answering and text generation to code completion and image synthesis.
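To make the core task concrete, the sketch below shows next-token prediction with an off-the-shelf causal language model: the model produces a probability distribution over the vocabulary at the last position, and the most likely continuations are read off from it. This is a minimal illustration, assuming the Hugging Face transformers library and the small public gpt2 checkpoint; it is not drawn from any of the papers listed here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small public checkpoint, used purely for illustration.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# The next-token distribution comes from the final position of the sequence.
next_token_logits = logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)

# Print the five most probable next tokens and their probabilities.
top_probs, top_ids = probs.topk(5)
for p, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id)!r}\t{p.item():.3f}")
```

Decoding strategies (greedy, sampling, adaptive methods) differ only in how they select a token from this distribution before appending it and repeating the step.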
Papers
Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà
Outlier Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, Felice Dell'Orletta