Bias Detection Methods

Bias detection in AI models, particularly large language models (LLMs) and image generation models, aims to identify and quantify unfair biases inherited from training data, with the goal of improving fairness and safety. Current research focuses on open-set detection methods that go beyond predefined bias categories, employing techniques such as gradient-based analysis, vision question answering, and comparison of model outputs against authoritative reference data (e.g., US labor statistics). These advances are crucial for mitigating the societal impact of biased AI systems in applications ranging from recruitment to criminal justice, by providing practical tools to identify and correct discriminatory outputs.
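As a minimal sketch of the "comparison against authoritative reference data" idea, the snippet below measures how far a model's output distribution over a demographic attribute drifts from a reference distribution using KL divergence. All names and numbers here are illustrative assumptions, not any specific paper's method: the reference shares and the generated labels are made up, and in practice the labels might come from a vision question answering model applied to generated images, with reference figures drawn from published labor statistics.

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) between two discrete distributions keyed by category.

    A small epsilon guards against zero-probability categories.
    """
    return sum(p[k] * math.log((p[k] + eps) / (q.get(k, 0.0) + eps)) for k in p)

def demographic_distribution(labels):
    """Normalize a list of predicted demographic labels into a distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Hypothetical reference shares for an occupation (illustrative values only;
# a real audit would pull these from a source such as US labor statistics).
reference = {"female": 0.47, "male": 0.53}

# Hypothetical labels for 100 generated images of "a doctor", e.g. as
# answered by a vision question answering model about perceived gender.
generated_labels = ["male"] * 82 + ["female"] * 18
generated = demographic_distribution(generated_labels)

bias_score = kl_divergence(generated, reference)
print(f"Generated distribution: {generated}")
print(f"KL divergence from reference: {bias_score:.3f}")
```

A score near zero indicates the generated distribution tracks the reference; larger values flag a skew worth investigating. KL divergence is one simple choice of distance; published methods use a variety of statistical measures and attribute extraction pipelines.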

Papers