Influential Data
Influential data research focuses on identifying and leveraging subsets of data that disproportionately affect model performance or causal inference. Current efforts concentrate on developing algorithms, such as quality-aware diverse selection strategies, that efficiently identify these subsets in applications including mathematical reasoning and causal effect estimation with instrumental variables. This research matters for improving model efficiency, making causal analyses more reliable, and optimizing data collection in fields ranging from autonomous vehicles to privacy policy analysis. Robust statistical guarantees for influence diagnostics further strengthen the reliability and applicability of these methods.
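To make the idea of an influence diagnostic concrete, here is a minimal sketch using Cook's distance for simple linear regression. This is a classical textbook diagnostic chosen purely for illustration, not one of the selection algorithms the summary refers to. The function name `cooks_distance` and the toy data are assumptions introduced here.

```python
def cooks_distance(x, y):
    """Cook's distance of each point in a simple linear regression y ~ a + b*x.

    Illustrative only: a classical diagnostic, assumed here as a stand-in
    for the more general influence methods discussed in the text.
    """
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope

    # Ordinary least squares fit in closed form.
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residuals and the unbiased estimate of the residual variance.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)

    # Leverage of each point (diagonal of the hat matrix).
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Cook's distance combines residual size with leverage.
    return [(e * e / (p * s2)) * h / (1.0 - h) ** 2
            for e, h in zip(resid, leverage)]


# Toy data (hypothetical): the last point departs from the trend y = x.
distances = cooks_distance([0, 1, 2, 3, 4], [0, 1, 2, 3, 10])
most_influential = max(range(len(distances)), key=distances.__getitem__)
```

In this toy example the last point receives the largest distance, because it has both a large residual and high leverage. Points flagged this way are the kind of disproportionately influential data the paragraph above describes.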