Paper ID: 2406.16738

Inducing Group Fairness in LLM-Based Decisions

James Atwood, Preethi Lahoti, Ananth Balashankar, Flavien Prost, Ahmad Beirami

Prompting Large Language Models (LLMs) has created new ways to classify textual data. While evaluating and remediating group fairness is a well-studied problem in the classifier fairness literature, some classical approaches (e.g., regularization) do not carry over, and some new opportunities arise (e.g., prompt-based remediation). We measure the fairness of LLM-based classifiers on a toxicity classification task and empirically show that prompt-based classifiers may lead to unfair decisions. We introduce several remediation techniques and benchmark their fairness and performance trade-offs. We hope our work encourages more research on group fairness in LLM-based classifiers.
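The abstract does not say which fairness metric is used. A common group-fairness check for toxicity classifiers compares false positive rates (non-toxic text wrongly flagged as toxic) across identity groups. The sketch below assumes binary labels (1 = toxic), binary predictions already parsed from the LLM's prompted output, and one group tag per example. The helper names `group_false_positive_rates` and `max_fpr_gap` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def group_false_positive_rates(labels, preds, groups):
    """Per-group FPR: share of truly non-toxic examples (label 0) predicted toxic (pred 1).

    Hypothetical helper; groups with no non-toxic examples are omitted from the result.
    """
    negatives = defaultdict(int)   # count of label-0 examples per group
    false_pos = defaultdict(int)   # count of label-0 examples predicted 1 per group
    for y, y_hat, g in zip(labels, preds, groups):
        if y == 0:
            negatives[g] += 1
            if y_hat == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def max_fpr_gap(labels, preds, groups):
    """Largest difference in FPR between any two groups (0.0 means equal FPRs)."""
    rates = group_false_positive_rates(labels, preds, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy data for illustration only; not results from the paper.
    labels = [0, 0, 0, 0, 1, 1, 0, 0]
    preds = [1, 0, 0, 0, 1, 0, 1, 1]
    groups = ["a", "a", "a", "a", "a", "b", "b", "b"]
    print(group_false_positive_rates(labels, preds, groups))  # {'a': 0.25, 'b': 1.0}
    print(max_fpr_gap(labels, preds, groups))  # 0.75
```

A remediation technique, whether prompt-based or otherwise, could then be judged by how much it shrinks such a gap and by what it costs in overall classification performance. That is the kind of fairness and performance trade-off the abstract refers to.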

Submitted: Jun 24, 2024