Data Influence

Data influence research studies how individual training examples affect the training and behavior of machine learning models, particularly large language models and other generative models. Measuring influence exactly, for example by retraining the model without a given example, is prohibitively expensive at this scale. Current work therefore emphasizes efficient estimators, often built on per-example gradients and low-rank approximations that reduce computational cost. These estimates are applied to tasks such as data selection, anomaly detection, and model debugging. The work aims to make models more interpretable, trustworthy, and efficient, and to support techniques that mitigate problems such as data memorization and bias.
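
As a rough illustration of a gradient-based estimator, the sketch below scores each training example by the learning rate times the dot product of its loss gradient with a test example's loss gradient. This follows the single-checkpoint form of TracIn. It assumes a linear model with squared loss. An optional Gaussian random projection compresses the gradients first. This is one common way to cut cost, and here it stands in for the low-rank approximations mentioned above. All function names are hypothetical. Methods for large models typically add multiple checkpoints, curvature or optimizer preconditioning, and per-layer low-rank gradient factors.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def squared_loss_grad(w: Vector, x: Vector, y: float) -> Vector:
    """Gradient of 0.5 * (w.x - y)^2 with respect to w, i.e. (w.x - y) * x."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def make_projection(dim: int, k: int, seed: int = 0) -> List[Vector]:
    """k x dim Gaussian matrix with entries N(0, 1/k).

    Projected dot products are unbiased estimates of the original ones.
    In practice k is much smaller than the number of parameters.
    """
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]


def project(P: List[Vector], g: Vector) -> Vector:
    return [dot(row, g) for row in P]


def influence_scores(
    w: Vector,
    train: List[Tuple[Vector, float]],
    test_x: Vector,
    test_y: float,
    lr: float = 0.1,
    P: Optional[List[Vector]] = None,
) -> List[float]:
    """Gradient-similarity influence of each training example on one test example.

    A gradient step on training example z changes the test loss by about
    -lr * g_train . g_test, so a positive score means that step would
    reduce the test loss (to first order).
    """
    g_test = squared_loss_grad(w, test_x, test_y)
    if P is not None:
        g_test = project(P, g_test)
    scores = []
    for x, y in train:
        g = squared_loss_grad(w, x, y)
        if P is not None:
            g = project(P, g)
        scores.append(lr * dot(g, g_test))
    return scores
```

With `w = [0.0, 0.0]`, the training set `[([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]`, and the test point `([1.0, 0.0], 1.0)`, the exact scores are about `[0.1, 0.0, -0.1]`. The first example helps the test prediction, the second is unrelated, and the third pulls against it. Passing a projection matrix trades exactness for cheaper storage and dot products.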

Papers