Training Data Attribution

Training data attribution (TDA) aims to identify which training examples most influence a model's predictions, addressing concerns about transparency, intellectual property, and bias. Current research focuses on making TDA methods more accurate and efficient, particularly for large language models and diffusion models, where model complexity and training dynamics make attribution difficult. Techniques in use include influence functions, in-context learning, and ensemble methods. This work matters for the explainability and trustworthiness of AI systems, with applications in copyright protection, bias mitigation, and debugging model errors.
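
To make the influence-function idea concrete, the following is a minimal sketch for a setting small enough that everything can be computed exactly: ridge regression in NumPy. It is an illustration under stated assumptions, not the method of any particular paper. The function names (fit_ridge, influence_scores), the synthetic data, and the regularization strength are all hypothetical choices made for this example. Each score is a first-order estimate of how much the loss on one test point would change if a given training point were removed.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize (1/n) * sum_i 0.5*(x_i . theta - y_i)^2 + 0.5*lam*||theta||^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)      # Hessian of the objective (constant for ridge)
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H

def influence_scores(X, y, x_test, y_test, lam=1e-2):
    """First-order estimate of the change in test loss if each training point were removed."""
    n = X.shape[0]
    theta, H = fit_ridge(X, y, lam)
    grad_train = (X @ theta - y)[:, None] * X        # per-example loss gradients, shape (n, d)
    grad_test = (x_test @ theta - y_test) * x_test   # test loss gradient, shape (d,)
    # Upweighting point i by eps changes the test loss by about
    # -eps * grad_test^T H^{-1} grad_i; removal corresponds to eps = -1/n.
    return grad_train @ np.linalg.solve(H, grad_test) / n

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
n, d = 200, 3
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ true_w + 0.1 * rng.normal(size=n)
x_test = np.array([1.0, 1.0, 1.0])
y_test = x_test @ true_w + 1.0   # deliberately off-target so the test loss is nonzero

scores = influence_scores(X, y, x_test, y_test)
top = np.argsort(-np.abs(scores))[:5]   # indices of the five most influential training points
print(top, scores[top])
```

A positive score means that removing the training point is estimated to increase the test loss (the point was helping this prediction), and a negative score means the opposite. For large models the Hessian cannot be formed or inverted explicitly, so practical methods rely on approximations, which is part of the accuracy and efficiency challenge described above.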

Papers