Training Data Attribution
Training data attribution (TDA) aims to identify which specific training data points most influence a model's predictions, addressing concerns about model transparency, intellectual property, and bias. Current research focuses on improving the accuracy and efficiency of TDA methods, particularly for large language models and diffusion models, employing techniques like influence functions, in-context learning, and ensemble methods to overcome challenges posed by model complexity and training dynamics. This work is crucial for enhancing the explainability and trustworthiness of AI systems, with implications for areas such as copyright protection, bias mitigation, and debugging model inaccuracies.
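Of the techniques named above, influence functions are the most established starting point. Below is a minimal, illustrative sketch of influence-function-based attribution in the spirit of Koh & Liang (2017), applied to a toy logistic-regression model so that the per-example gradients and the Hessian have closed forms; the synthetic data, the damping constant, and all variable names are assumptions made for the example, not taken from any particular paper discussed here.

```python
# Minimal sketch of influence-function-based training data attribution.
# Score for training point z_i and test point z_test:
#   I(z_i, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z_i)
# where H is the Hessian of the average training loss at the fitted parameters.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: two Gaussian blobs with binary labels (illustrative only).
n, d = 200, 5
X = np.vstack([rng.normal(-1, 1, (n // 2, d)), rng.normal(1, 1, (n // 2, d))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "Training": fit logistic regression by plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / n

def grad_loss(x, label, w):
    """Gradient of the cross-entropy loss at a single example."""
    return (sigmoid(x @ w) - label) * x

# Hessian of the average training loss, with a small damping term (assumed
# value 1e-3) added for numerical invertibility.
p = sigmoid(X @ w)
H = (X.T * (p * (1 - p))) @ X / n + 1e-3 * np.eye(d)
H_inv = np.linalg.inv(H)

# Influence scores for one test example: positive scores flag training points
# whose up-weighting would raise the test loss (harmful to this prediction);
# negative scores flag helpful points.
x_test, y_test = rng.normal(1, 1, d), 1.0
g_test = grad_loss(x_test, y_test, w)
scores = np.array([-g_test @ H_inv @ grad_loss(X[i], y[i], w) for i in range(n)])

most_helpful = np.argsort(scores)[:5]
print("most influential (helpful) training indices:", most_helpful)
```

For large language and diffusion models the exact Hessian inverse used here is intractable, which is why the methods surveyed above rely on approximations such as Hessian-vector products, low-rank or Kronecker-factored curvature estimates, and ensembling across checkpoints.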