Implicit Bias
Implicit bias refers to unintended, often subtle, biases embedded within machine learning models, stemming from biases present in their training data. Current research focuses on detecting and mitigating these biases in various model architectures, particularly large language models (LLMs) and deep neural networks, using techniques like prompt engineering, fine-tuning, and Bayesian methods. Understanding and addressing implicit bias is crucial for ensuring fairness and equity in AI applications, impacting fields ranging from healthcare and criminal justice to education and hiring. The development of robust bias detection and mitigation strategies is a central goal of ongoing research.
Papers
Relating Implicit Bias and Adversarial Attacks through Intrinsic Dimension
Lorenzo Basile, Nikos Karantzas, Alberto D'Onofrio, Luca Bortolussi, Alex Rodriguez, Fabio Anselmi
Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning
Tianqing Fang, Zhaowei Wang, Wenxuan Zhou, Hongming Zhang, Yangqiu Song, Muhao Chen