Data Attribution

Data attribution aims to identify which training data points most influence a machine learning model's outputs, which supports fairer compensation for data providers and a better understanding of model behavior. Current research focuses on computationally efficient attribution methods for a range of architectures, including diffusion models and other neural networks, often using Shapley values, surrogate modeling, or gradient-based approaches. The field matters for model transparency, accountability, and fairness, with implications for intellectual property rights, data valuation, and the responsible development of AI systems. Making attribution methods robust to adversarial attacks is also a growing area of concern.
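
As a rough illustration of the Shapley-value approach mentioned above, the following minimal sketch computes exact data Shapley values for a four-point toy training set by averaging each point's marginal contribution over all orderings. Everything here is an assumption made for illustration: the toy data, the 1-nearest-neighbour model, the use of validation accuracy as the utility, and the function names `utility` and `exact_data_shapley`. Exact enumeration is only feasible for tiny datasets; practical methods approximate it.

```python
from itertools import permutations

import numpy as np


def utility(train_x, train_y, val_x, val_y):
    """Validation accuracy of a 1-nearest-neighbour classifier (0.0 if no training data)."""
    if len(train_x) == 0:
        return 0.0
    # Distance from every validation point to every training point (1-D features).
    dists = np.abs(val_x[:, None] - train_x[None, :])
    preds = train_y[np.argmin(dists, axis=1)]
    return float(np.mean(preds == val_y))


def exact_data_shapley(train_x, train_y, val_x, val_y):
    """Average marginal contribution of each training point over all orderings."""
    n = len(train_x)
    values = np.zeros(n)
    orderings = list(permutations(range(n)))
    for order in orderings:
        chosen = []
        prev = 0.0  # utility of the empty training set
        for i in order:
            chosen.append(i)
            idx = np.array(chosen)
            cur = utility(train_x[idx], train_y[idx], val_x, val_y)
            values[i] += cur - prev  # marginal gain from adding point i
            prev = cur
    return values / len(orderings)


# Hypothetical toy data: the last training point (x=0.8) is deliberately mislabeled as 0.
train_x = np.array([0.0, 0.2, 1.0, 0.8])
train_y = np.array([0, 0, 1, 0])
val_x = np.array([0.1, 0.3, 0.7, 0.95])
val_y = np.array([0, 0, 1, 1])

values = exact_data_shapley(train_x, train_y, val_x, val_y)
print(values)
```

In this toy setup the values sum to the utility of the full training set (the efficiency property of Shapley values), the mislabeled point receives the smallest value, and the only correctly labeled class-1 point receives the largest.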

Papers