Token Bias

Token bias in large language models (LLMs) refers to the disproportionate influence of specific tokens (sub-word units) on model outputs: predictions shift with superficial token-level cues rather than the underlying semantics, leading to inaccurate or unfair results. Current research focuses on identifying and mitigating this bias through techniques such as recalibrating automated evaluators, designing unbiased tokenization algorithms, and applying methods like contrastive clustering to disentangle causal from merely correlational relationships between tokens and model predictions. Addressing token bias is crucial for improving the reliability, fairness, and generalizability of LLMs across diverse applications, from sentiment analysis and recommendation systems to fact-checking and toxic language detection.
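
As a concrete illustration, the sketch below probes for token bias by checking whether swapping a semantically irrelevant surface token (a person's name) changes which answer a causal language model prefers. It assumes the Hugging Face transformers library with GPT-2 as a stand-in model; the prompts, names, and scoring heuristic are purely illustrative and not drawn from any specific paper.

```python
# Minimal token-bias probe (assumed setup: Hugging Face transformers + GPT-2).
# If the model's preferred answer flips when only an irrelevant surface token
# changes, the prediction is driven by that token rather than the content.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_log_prob(prompt: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to `answer` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the answer tokens; the token at position t is predicted
    # from the logits at position t - 1.
    for t in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, t - 1, full_ids[0, t]].item()
    return total

# Two prompts identical except for a surface token (a name) that should
# not affect the correct answer.
template = "{name} has 3 apples and buys 2 more. {name} now has"
answers = [" 5 apples.", " 6 apples."]

for name in ["Alice", "Zyxler"]:
    prompt = template.format(name=name)
    scores = {a: answer_log_prob(prompt, a) for a in answers}
    preferred = max(scores, key=scores.get)
    print(f"name={name!r:10s} preferred answer:{preferred}")
```

If the preferred completion changes between the two names, the arithmetic answer is being determined by an irrelevant token, which is exactly the failure mode that the mitigation techniques above target.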

Papers