Implicit Bias
Implicit bias refers to unintended, often subtle biases embedded in machine learning models, typically inherited from their training data. Current research focuses on detecting and mitigating these biases across model architectures, particularly large language models (LLMs) and deep neural networks, using techniques such as prompt engineering, fine-tuning, and Bayesian methods. Understanding and addressing implicit bias is crucial for fairness and equity in AI systems deployed in fields ranging from healthcare and criminal justice to education and hiring, making robust detection and mitigation strategies a central goal of ongoing research.
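To make the detection side concrete, below is a minimal sketch of one well-known probe: the Word Embedding Association Test (WEAT) effect size from Caliskan et al. (2017), which measures how strongly two target concept sets associate with two attribute sets in a model's embedding space. The random toy vectors, the `weat_effect_size` name, and the set labels are purely illustrative stand-ins for real model embeddings; this is not the method of any specific paper listed below.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size (Caliskan et al., 2017): how differently the
    target sets X and Y associate with attribute sets A and B."""
    def s(w):
        # Differential association of one word vector with the two attribute sets.
        return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])
    sx = [s(x) for x in X]
    sy = [s(y) for y in Y]
    # Standardized mean difference over all target-word associations.
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

# Toy 4-dimensional "embeddings" standing in for vectors from a real model;
# the offsets deliberately plant an association so the score is nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4)) + np.array([1.0, 0, 0, 0])  # target set 1
Y = rng.normal(size=(5, 4)) - np.array([1.0, 0, 0, 0])  # target set 2
A = rng.normal(size=(5, 4)) + np.array([1.0, 0, 0, 0])  # attribute set 1
B = rng.normal(size=(5, 4)) - np.array([1.0, 0, 0, 0])  # attribute set 2

print(f"WEAT effect size: {weat_effect_size(X, Y, A, B):.2f}")
```

An effect size near zero suggests no measurable differential association, while large positive or negative values indicate that one target set sits systematically closer to one attribute set, the kind of signal that bias-mitigation methods then try to reduce.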
Papers
Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas
Salvatore Giorgi, Tingting Liu, Ankit Aich, Kelsey Isman, Garrick Sherman, Zachary Fried, João Sedoc, Lyle H. Ungar, Brenda Curtis
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective
Yuchen Wen, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng