Data Shapley

Data Shapley is a method for quantifying how much each training data point contributes to a machine learning model's performance, using the Shapley value from cooperative game theory. Because exact computation requires retraining the model on many subsets of the data, current research focuses on computational efficiency for large datasets and complex models. Techniques include approximating Shapley values within a single model training run and exploiting the structure of specific learners (e.g., K-Nearest Neighbors). This work aims to make machine learning more trustworthy and explainable by providing a principled way to assess data value, with implications for data markets, data selection, and model training.
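To make the idea concrete, the following is a minimal sketch of exact Data Shapley on a toy problem, not an implementation from any particular paper. It uses the standard permutation form of the Shapley value: a point's value is its marginal contribution to a utility function, averaged over all orderings of the training set. The utility (validation accuracy of a 1-nearest-neighbor classifier), the function names `utility` and `exact_data_shapley`, and the four-point dataset are all assumptions chosen for illustration.

```python
from itertools import permutations
from math import factorial

def utility(train_subset, val):
    """Validation accuracy of a 1-NN classifier (assumed utility); 0.0 if empty."""
    if not train_subset:
        return 0.0
    correct = 0
    for x, y in val:
        # Nearest training point by absolute distance on the single feature.
        nearest = min(train_subset, key=lambda p: abs(p[0] - x))
        if nearest[1] == y:
            correct += 1
    return correct / len(val)

def exact_data_shapley(train, val):
    """Average each point's marginal contribution over all n! orderings."""
    n = len(train)
    values = [0.0] * n
    for order in permutations(range(n)):
        subset = []
        prev = utility(subset, val)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, val)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / factorial(n) for v in values]

# Hypothetical toy data: (feature, label). The last training point is
# deliberately mislabeled relative to its neighbors.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (0.5, 1)]
val = [(0.2, 0), (0.8, 0), (3.8, 1), (4.2, 1)]

values = exact_data_shapley(train, val)
for point, v in zip(train, values):
    print(point, round(v, 4))
```

In this toy setup the values sum to the utility of the full training set, and the mislabeled point receives the smallest value. The enumeration evaluates the utility on the order of n! times, which is only feasible for a handful of points; this cost is what the approximation techniques mentioned above are meant to avoid.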

Papers