Stereotype Content
Stereotype content research investigates how biases and stereotypes are represented and perpetuated in large language models (LLMs) and other AI systems, with the aim of understanding and mitigating their harmful societal impact. Current research focuses on identifying and quantifying these biases across modalities (text, images), languages, and demographic groups, often using techniques such as adversarial attacks and explainable AI methods to analyze model behavior and develop mitigation strategies. This work is crucial for ensuring fairness and equity in AI applications across fields ranging from education and healthcare to hiring and criminal justice, as it promotes the development of less biased, more responsible AI systems.
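One common way such biases are quantified is by comparing how a model scores minimally different sentences. Below is a minimal sketch, assuming the Hugging Face transformers library and an illustrative gpt2 checkpoint, of measuring the log-likelihood gap between a stereotype-consistent sentence and a counterfactual that differs only in the demographic term. The sentence pair and model choice are hypothetical examples for illustration, not the data or methods of the papers listed below.

```python
# Minimal sketch (illustrative, not from the cited papers): score a
# stereotype-consistent sentence and a minimally edited counterfactual
# with a causal LM, and report the log-likelihood gap as a bias signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Sum of token log-probabilities the model assigns to the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the returned loss is the mean negative
        # log-likelihood over the seq_len - 1 predicted tokens; multiply
        # back to recover the summed log-likelihood.
        outputs = model(**inputs, labels=inputs["input_ids"])
    num_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * num_predicted

# Hypothetical minimal pair: only the pronoun (demographic cue) differs.
pairs = [
    ("The nurse said she would be late.",
     "The nurse said he would be late."),
]
for stereo, counter in pairs:
    gap = sentence_log_likelihood(stereo) - sentence_log_likelihood(counter)
    # A consistently positive gap across many pairs suggests the model
    # assigns higher probability to the stereotype-consistent phrasing.
    print(f"log-likelihood gap (stereotype - counterfactual): {gap:+.3f}")
```

In practice, a single pair is not informative; benchmarks of this style aggregate the gap over many minimal pairs per demographic group to produce a stable bias score.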
Papers
Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources
Lasse Hyldig Hansen, Nikolaj Andersen, Jack Gallifant, Liam G. McCoy, James K. Stone, Nura Izath, Marcela Aguirre-Jerez, Danielle S. Bitterman, Judy Gichoya, Leo Anthony Celi
BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models
Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak