Bias Neuron

Bias neurons are individual units in a neural network that contribute disproportionately to unfair or biased model outputs. They have been studied in applications including speech recognition, natural language processing, and image classification. Current research detects these neurons with attribution techniques such as integrated gradients and mitigates their effects through neuron suppression, bias potential manipulation, and data preprocessing (e.g., sketching). Identifying and correcting bias neurons supports fairer and more trustworthy AI systems in fields ranging from healthcare to social justice, because it reduces algorithmic discrimination and improves model reliability.
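As a rough illustration of the detect-then-suppress idea, the sketch below applies integrated gradients to the hidden activations of a toy network and then zeroes out the neuron with the largest attribution. Everything in it is an assumption made for illustration: the weights `W1` and `w2`, the two inputs that differ only in a sensitive attribute, and the helper names are hypothetical, and applying integrated gradients along a path between two hidden-activation vectors is one possible variant rather than the procedure of any particular paper.

```python
import numpy as np

# Hypothetical toy network: 3 inputs (the last is a sensitive attribute),
# 3 hidden ReLU neurons, 1 sigmoid output. All weights are made up.
W1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.2],
               [0.1, 0.0, 2.0]])   # hidden neuron 2 leans heavily on the sensitive input
w2 = np.array([0.5, -0.3, 0.8])

def hidden(x):
    """Hidden-layer activations (ReLU)."""
    return np.maximum(W1 @ x, 0.0)

def output(h, mask=None):
    """Sigmoid output; an optional 0/1 mask suppresses selected hidden neurons."""
    if mask is not None:
        h = h * mask
    return 1.0 / (1.0 + np.exp(-(w2 @ h)))

def neuron_integrated_gradients(h_base, h_target, steps=50):
    """Attribute output(h_target) - output(h_base) to each hidden neuron.

    Uses a midpoint Riemann sum over the straight path between the two
    activation vectors.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(h_base)
    for a in alphas:
        h = h_base + a * (h_target - h_base)
        y = output(h)
        grad_sum += y * (1.0 - y) * w2   # gradient of sigmoid(w2 . h) w.r.t. h
    return (h_target - h_base) * grad_sum / steps

# Two inputs that are identical except for the sensitive attribute.
x_a = np.array([1.0, 0.5, 0.0])
x_b = np.array([1.0, 0.5, 1.0])
h_a, h_b = hidden(x_a), hidden(x_b)

# Detection: which neuron explains most of the output gap between the two inputs?
attributions = neuron_integrated_gradients(h_a, h_b)
gap_before = output(h_b) - output(h_a)
top_neuron = int(np.argmax(np.abs(attributions)))

# Mitigation: suppress that neuron and measure the gap again.
mask = np.ones(3)
mask[top_neuron] = 0.0
gap_after = output(h_b, mask) - output(h_a, mask)

print("attributions per hidden neuron:", attributions)
print("candidate bias neuron:", top_neuron)
print("output gap before / after suppression:", gap_before, gap_after)
```

The attributions should sum approximately to the output gap (the completeness property of integrated gradients), which gives a quick sanity check on the integration. In a real model the suppressed neuron may also carry task-relevant signal, so accuracy would need to be re-evaluated after suppression.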

Papers