Implicit Bias
Implicit bias refers to unintended, often subtle biases embedded in machine learning models, stemming from biases present in their training data. Current research focuses on detecting and mitigating these biases across model architectures, particularly large language models (LLMs) and other deep neural networks, using techniques such as prompt engineering, fine-tuning, and Bayesian methods. Understanding and addressing implicit bias is crucial for ensuring fairness and equity in AI applications in fields ranging from healthcare and criminal justice to education and hiring, which is why robust bias detection and mitigation strategies remain a central goal of ongoing research.
Papers
Toward Automated Detection of Biased Social Signals from the Content of Clinical Conversations
Feng Chen, Manas Satish Bedmutha, Ray-Yuan Chung, Janice Sabin, Wanda Pratt, Brian R. Wood, Nadir Weibel, Andrea L. Hartzler, Trevor Cohen
The African Woman is Rhythmic and Soulful: Evaluation of Open-ended Generation for Implicit Biases
Serene Lim
Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas
Salvatore Giorgi, Tingting Liu, Ankit Aich, Kelsey Isman, Garrick Sherman, Zachary Fried, João Sedoc, Lyle H. Ungar, Brenda Curtis
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective
Yuchen Wen, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng