Paper ID: 2202.06374

Holdouts set for predictive model updating

Sami Haidar-Wehbe, Samuel R Emerson, Louis J M Aslett, James Liley

In complex settings, such as healthcare, predictive risk scores play an increasingly crucial role in guiding interventions. However, directly updating risk scores used to guide intervention can lead to biased risk estimates. To address this, we propose updating using a 'holdout set': a subset of the population that does not receive interventions guided by the risk score. The size of the holdout set must be balanced: large enough to ensure good performance of the updated risk score, yet small enough to minimise the number of held-out samples. We prove that this approach enables total costs to grow at a rate $O\left(N^{2/3}\right)$ for a population of size $N$, and argue that in general circumstances there is no competitive alternative. By defining an appropriate loss function, we describe conditions under which an optimal holdout size (OHS) can be readily identified, and introduce parametric and semi-parametric algorithms for OHS estimation, demonstrating their use on a recent risk score for pre-eclampsia. Based on these results, we make the case that a holdout set is a safe, viable and easily implemented means of updating predictive risk scores.
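As an illustration of the trade-off in holdout size, the following is a minimal sketch and not the paper's algorithm. The abstract does not state the loss function, so the form used here is an assumption: each held-out sample costs `k1`, and each of the remaining `N - n` samples costs `k2(n) = a * n**(-b) + c`, which falls as the holdout set grows. The names `total_cost` and `optimal_holdout_size` and all parameter values are hypothetical. Under this assumed form with `b = 0.5`, the minimising holdout size scales roughly as $N^{2/3}$, which is in line with the rate quoted above.

```python
def total_cost(n, N, k1, a, b, c):
    """Assumed total cost for a holdout set of size n in a population of size N.

    Assumption (not taken from the abstract): held-out samples cost k1 each,
    and the other N - n samples cost k2(n) = a * n**(-b) + c each.
    """
    k2 = a * n ** (-b) + c
    return k1 * n + k2 * (N - n)


def optimal_holdout_size(N, k1, a, b, c):
    """Brute-force search over holdout sizes 1..N-1 for the minimum assumed cost."""
    return min(range(1, N), key=lambda n: total_cost(n, N, k1, a, b, c))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(k1=1.0, a=5.0, b=0.5, c=0.5)
    for N in (10_000, 80_000):
        n_star = optimal_holdout_size(N, **params)
        print(f"N={N}: holdout size {n_star}, cost {total_cost(n_star, N, **params):.1f}")
```

Increasing `N` eightfold in this sketch raises the minimising holdout size by a factor of roughly four (that is, $8^{2/3}$), so the held-out fraction shrinks as the population grows.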

Submitted: Feb 13, 2022