Anomalous Token

Anomalous tokens are tokens that trigger unexpected behavior in large language models (LLMs), and their study is a growing area of research focused on improving model reliability and trustworthiness. Current efforts concentrate on detecting these anomalies with techniques such as low-confidence prediction analysis and clustering applied to attention patterns and embedding spaces, often within the context of specific model architectures such as Mixture-of-Experts (MoE). Identifying and mitigating the effects of anomalous tokens is crucial for enhancing the robustness and safety of LLMs, informing both the development of more reliable AI systems and the broader understanding of their internal workings.
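One embedding-space detection idea mentioned above can be sketched concretely. A minimal illustration, using synthetic data rather than a real model's vocabulary: known "glitch" tokens (e.g. under-trained ones) often sit unusually close to the embedding centroid, so flagging tokens whose distance to the centroid is a statistical outlier is a simple heuristic. The function name and threshold here are illustrative assumptions, not from any specific paper.

```python
import numpy as np

def flag_anomalous_tokens(embeddings, z_thresh=3.0):
    """Flag tokens whose embedding sits anomalously close to (or far
    from) the vocabulary centroid, a crude proxy for anomaly detection.
    embeddings: (vocab_size, dim) array; returns outlier token indices."""
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    z = (dists - dists.mean()) / dists.std()
    return np.where(np.abs(z) > z_thresh)[0]

# Synthetic vocabulary: 1000 ordinary unit-scale embeddings, plus three
# hypothetical "glitch" tokens placed almost exactly at the centroid.
rng = np.random.default_rng(0)
normal = rng.normal(size=(1000, 64))
glitch = np.full((3, 64), 1e-4)
vocab = np.vstack([normal, glitch])

flagged = flag_anomalous_tokens(vocab)
print(flagged)  # should include indices 1000, 1001, 1002
```

In practice, research systems combine several signals (prediction confidence, attention statistics, clustering structure) rather than relying on a single distance threshold; this sketch only shows the embedding-outlier component.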

Papers