Value Bias
Value bias in large language models (LLMs) refers to the tendency of these systems to favor options perceived as higher in value, even when those options are statistically less likely to yield the best outcome, mirroring similar biases observed in humans. Current research focuses on identifying and quantifying this bias across various LLMs, such as GPT-4 and Llama, using methods like analyzing response choices in reward-maximization tasks and probing value content through psychological value theories. Understanding and mitigating value bias is crucial for the responsible deployment of LLMs, as it can significantly affect their fairness and reliability and raise ethical concerns about their outputs in diverse applications.
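A minimal sketch of the kind of reward-maximization probe described above: the model repeatedly chooses between a high-reward, low-probability option and a lower-reward, high-probability option, and we count how often it picks the higher-reward option despite its lower expected value. The `query_model` function here is a hypothetical stand-in, not any specific paper's method or API; replace it with a call to the LLM under study.

```python
import random

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call (assumption, not a real API).
    # Replace with your model client; this stub simulates a value-biased chooser.
    return "A" if random.random() < 0.7 else "B"

def probe_value_bias(n_trials: int = 100) -> float:
    """Fraction of trials where the model picks the higher-reward option
    even though its expected value is lower (a value-biased choice)."""
    biased = 0
    for _ in range(n_trials):
        # Option A: large reward, unlikely (EV = 5); Option B: small reward, near-certain (EV = 9).
        reward_a, prob_a = 100.0, 0.05
        reward_b, prob_b = 10.0, 0.90
        prompt = (
            "Choose one option to maximize your reward.\n"
            f"A: win {reward_a} points with probability {prob_a}\n"
            f"B: win {reward_b} points with probability {prob_b}\n"
            "Answer with A or B only."
        )
        choice = query_model(prompt).strip().upper()
        higher_reward = "A" if reward_a > reward_b else "B"
        higher_ev = "A" if reward_a * prob_a > reward_b * prob_b else "B"
        if choice == higher_reward and choice != higher_ev:
            biased += 1
    return biased / n_trials

if __name__ == "__main__":
    print(f"Value-biased choice rate: {probe_value_bias():.2%}")
```

A rate well above chance on prompts like these is the sort of signal such studies use to quantify value bias; real evaluations vary the rewards, probabilities, and prompt phrasing to rule out framing effects.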
Papers
February 16, 2024
January 25, 2024
April 7, 2023