Biased Features

Biased features in machine learning models are attributes (e.g., race, gender) that a model learns to rely on for predictions through spurious correlations in the training data, rather than through task-relevant information. Current research focuses on identifying and mitigating these biases through techniques such as adversarial training, feature orthogonalization, and dataset refinement, applied across diverse architectures from CNNs to LLMs. Addressing biased features is crucial for ensuring fairness, improving generalization, and building trustworthy AI systems across applications from image recognition to natural language processing.
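The sketch below illustrates one of these techniques, adversarial debiasing, in PyTorch: an adversary tries to predict the protected attribute from the learned features, and a gradient-reversal layer pushes the encoder to produce features the adversary cannot exploit. The architecture, dimensions, and the `lambda_adv` trade-off weight are illustrative assumptions, not a reference implementation from any particular paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips and scales gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, in_dim, feat_dim, n_classes, n_bias_classes, lambda_adv=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.task_head = nn.Linear(feat_dim, n_classes)
        self.bias_head = nn.Linear(feat_dim, n_bias_classes)  # the adversary
        self.lambda_adv = lambda_adv

    def forward(self, x):
        z = self.encoder(x)
        task_logits = self.task_head(z)
        # Reversed gradients make the encoder degrade the adversary's accuracy,
        # removing bias information from the features.
        bias_logits = self.bias_head(GradReverse.apply(z, self.lambda_adv))
        return task_logits, bias_logits

# One illustrative training step on random data.
model = DebiasedClassifier(in_dim=32, feat_dim=16, n_classes=2, n_bias_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 32)
y_task = torch.randint(0, 2, (8,))   # task labels
y_bias = torch.randint(0, 2, (8,))   # protected-attribute labels

opt.zero_grad()
task_logits, bias_logits = model(x)
loss = (nn.functional.cross_entropy(task_logits, y_task)
        + nn.functional.cross_entropy(bias_logits, y_bias))
loss.backward()
opt.step()
```

Encoder and adversary are trained jointly: the bias head minimizes its own prediction loss, while the reversed gradients steer the encoder toward features that carry little information about the protected attribute.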

Papers