Fairness Preference

Fairness preferences in artificial intelligence concern aligning AI systems' decision-making with human notions of fairness, addressing biases that can lead to discriminatory outcomes. Current research investigates how to elicit and incorporate diverse human fairness preferences into AI models, often via prompt optimization for large language models or adversarial training to mitigate bias in classifiers. This work matters for building trustworthy AI systems, particularly in sensitive applications such as content moderation and healthcare, because it promotes equitable treatment across demographic groups and reduces the risk of algorithmic discrimination.
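
As a concrete illustration of the adversarial-training approach mentioned above, the sketch below trains a classifier jointly with an adversary that tries to predict a protected attribute from the classifier's internal representation; the classifier is penalized whenever the adversary succeeds, which pushes it toward group-invariant features. This is a minimal, generic sketch (in the style of adversarial debiasing), not the method of any specific paper; the data, network sizes, and the trade-off weight `lam` are all illustrative assumptions.

```python
# Minimal sketch of adversarial debiasing, assuming PyTorch is installed.
# An encoder + task head predict the label y; an adversary tries to recover
# the protected attribute a from the encoder's representation. The encoder
# is trained to do well on y while making the adversary fail.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data (illustrative): 512 samples, 16 features,
# binary task label y, binary protected attribute a.
X = torch.randn(512, 16)
y = (X[:, 0] > 0).float().unsqueeze(1)
a = (X[:, 1] > 0).float().unsqueeze(1)

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
task_head = nn.Linear(32, 1)   # predicts y from the representation
adversary = nn.Linear(32, 1)   # predicts a from the representation

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # accuracy/fairness trade-off weight (assumed value)

for step in range(200):
    # 1) Train the adversary to predict the protected attribute
    #    from a frozen copy of the representation.
    z = encoder(X).detach()
    adv_loss = bce(adversary(z), a)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Train encoder + task head: minimize task loss while
    #    maximizing the adversary's loss (i.e., fooling it).
    z = encoder(X)
    main_loss = bce(task_head(z), y) - lam * bce(adversary(z), a)
    opt_main.zero_grad()
    main_loss.backward()
    opt_main.step()
```

In this setup, raising `lam` trades task accuracy for stronger removal of group information from the representation; tuning that weight is one concrete way a system's behavior can be adjusted toward a given fairness preference.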

Papers