Paper ID: 2405.12372
DispaRisk: Auditing Fairness Through Usable Information
Jonathan Vasquez, Carlotta Domeniconi, Huzefa Rangwala
Machine Learning (ML) algorithms impact virtually every aspect of human lives and have found use across diverse sectors, including healthcare, finance, and education. ML algorithms have often been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets/groups of individuals, in many cases minority groups. To effectively mitigate these untoward effects, it is crucial that disparities/biases are identified early in an ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduce complexity at later stages of model development. In this paper, we leverage recent advancements in usable information theory to introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it on datasets commonly used in fairness research. Our findings demonstrate DispaRisk's ability to identify datasets with a high risk of discrimination, detect model families prone to biases within an ML pipeline, and enhance the explainability of these bias risks. This work contributes to the development of fairer ML systems by providing a robust tool for early bias detection and mitigation. The code for our experiments is available in the following repository: this https URL
Submitted: May 20, 2024
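
To give a rough sense of how usable information theory can be turned into a group-level disparity signal, the minimal sketch below estimates V-information, I_V(X → Y) = H_V(Y) − H_V(Y | X), separately for each demographic group and compares the gaps. This is an illustrative approximation under stated assumptions, not the paper's DispaRisk procedure: the model family V is assumed to be logistic regression, the data is synthetic, and all names (v_entropy, groupwise_usable_information, group) are hypothetical.

```python
# Illustrative sketch (not DispaRisk itself): estimate V-information per
# demographic group with a fixed model family V (logistic regression) and
# compare the gaps as a rough disparity-risk signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def v_entropy(X, y):
    """Conditional V-entropy H_V(Y|X): held-out log loss of a model from V."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return log_loss(y_te, model.predict_proba(X_te), labels=np.unique(y))

def marginal_v_entropy(y):
    """Marginal V-entropy H_V(Y): log loss of predicting Y from its empirical marginal."""
    p = np.bincount(y) / len(y)
    return -np.mean(np.log(p[y]))

def groupwise_usable_information(X, y, group):
    """I_V(X -> Y) per group; large gaps hint at unevenly 'easy' prediction across groups."""
    return {
        g: marginal_v_entropy(y[group == g]) - v_entropy(X[group == g], y[group == g])
        for g in np.unique(group)
    }

# Synthetic example: features, binary labels, and a binary group attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
group = rng.integers(0, 2, size=2000)
y = (X[:, 0] + 0.5 * group * X[:, 1] + rng.normal(size=2000) > 0).astype(int)
print(groupwise_usable_information(X, y, group))
```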