LLM Bias
Large language models (LLMs) often exhibit biases that reflect societal prejudices, which can lead to unfair or discriminatory outputs. Current research focuses on detecting and mitigating these biases, in both implicit and explicit forms, across protected attributes such as race, gender, and age, using techniques including prompt engineering, attention-mechanism analysis, and counterfactual evaluations applied to models such as GPT-3.5. Understanding and addressing LLM bias is essential for the fair and ethical deployment of these models, both for building responsible AI and for avoiding harmful societal consequences.
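To make the counterfactual-evaluation idea concrete, the sketch below compares model completions for prompts that differ only in a protected attribute and reports the gap in how those completions are scored. It is a minimal illustration, not a method from any of the listed papers: the prompt template is hypothetical, and `generate` and `score` are toy stand-ins for a real LLM call and a real sentiment or toxicity classifier.

```python
# Toy counterfactual bias probe: swap a protected attribute in an otherwise
# identical prompt and compare how the model's completions are scored.
# `generate` and `score` are toy stand-ins; in practice you would call a
# real LLM and a real sentiment/toxicity classifier.

import random

TEMPLATE = "The {value} applicant was described by the interviewer as"

COUNTERFACTUAL_SETS = {
    "gender": ["male", "female"],
    "age": ["young", "elderly"],
}


def generate(prompt: str) -> str:
    # Stand-in for an LLM call; returns a canned completion at random.
    return random.choice(
        ["highly capable and motivated.", "difficult to work with."]
    )


def score(text: str) -> float:
    # Stand-in for a sentiment scorer: +1 for positive, -1 for negative.
    return 1.0 if "capable" in text else -1.0


def counterfactual_gap(values: list[str], n_samples: int = 50) -> float:
    """Largest difference in mean completion score across attribute values."""
    means = []
    for value in values:
        prompt = TEMPLATE.format(value=value)
        samples = [score(generate(prompt)) for _ in range(n_samples)]
        means.append(sum(samples) / len(samples))
    return max(means) - min(means)


if __name__ == "__main__":
    for attribute, values in COUNTERFACTUAL_SETS.items():
        print(f"{attribute}: gap = {counterfactual_gap(values):.2f}")
```

A gap near zero suggests the model treats the counterfactual variants similarly under this probe; a large gap flags the attribute for closer inspection with more rigorous evaluations.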
Papers
Hey GPT, Can You be More Racist? Analysis from Crowdsourced Attempts to Elicit Biased Content from Generative AI
Hangzhi Guo, Pranav Narayanan Venkit, Eunchae Jang, Mukund Srinath, Wenbo Zhang, Bonam Mingole, Vipul Gupta, Kush R. Varshney, S. Shyam Sundar, Amulya Yadav
A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia
Lance Calvin Lim Gamboa, Mark Lee