Dataset Multiplicity

Dataset multiplicity concerns how inaccuracies and biases in training data affect the reliability and fairness of machine learning predictions: the observed training set is only one of many plausible datasets that could have been collected, and models trained on different plausible variants may disagree on individual predictions. Current research focuses on understanding how various sources of data imperfection, such as noisy labels and systemic biases, propagate through model training and disproportionately affect different demographic groups. This work aims to develop mitigation methods, such as data diversification techniques, that improve model robustness and fairness. The ultimate goal is to build more reliable and trustworthy AI systems by addressing the fundamental limitations imposed by imperfect training data.
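
The core idea lends itself to a simple empirical probe: retrain the same model on many plausible variants of the training set (for example, with a small fraction of labels flipped to simulate annotation noise) and check whether individual test predictions stay the same. The sketch below is a minimal illustration of that procedure on synthetic data, assuming scikit-learn and NumPy; the dataset, noise model, and classifier choices are illustrative assumptions, not taken from any of the papers listed here.

```python
# Minimal sketch (illustrative assumptions only): estimating prediction
# multiplicity under plausible label noise.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary classification data standing in for a real training set.
n_train, n_test, n_features = 500, 200, 10
X_train = rng.normal(size=(n_train, n_features))
w_true = rng.normal(size=n_features)
y_train = (X_train @ w_true + 0.5 * rng.normal(size=n_train) > 0).astype(int)
X_test = rng.normal(size=(n_test, n_features))

def plausible_relabeling(y, flip_rate, rng):
    """Return one plausible variant of the labels, flipping a small
    fraction to model labeling noise or annotator disagreement."""
    y_variant = y.copy()
    flips = rng.random(len(y)) < flip_rate
    y_variant[flips] = 1 - y_variant[flips]
    return y_variant

# Train one model per plausible dataset variant and collect test predictions.
n_variants, flip_rate = 50, 0.05
predictions = np.empty((n_variants, n_test), dtype=int)
for i in range(n_variants):
    y_variant = plausible_relabeling(y_train, flip_rate, rng)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_variant)
    predictions[i] = model.predict(X_test)

# A test point exhibits dataset multiplicity if its predicted label is not
# unanimous across the models trained on plausible training sets.
agreement = predictions.mean(axis=0)
unstable = (agreement > 0) & (agreement < 1)
print(f"{unstable.mean():.1%} of test points change prediction "
      f"across {n_variants} plausible training sets")
```

In practice, the fraction of unstable predictions (and how it differs across demographic groups) is one way to quantify how much imperfect data limits the reliability of a model's individual decisions.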

Papers