Paper ID: 2502.06011 • Published Feb 9, 2025
Uncertainty Quantification and Causal Considerations for Off-Policy Decision Making
Muhammad Faaiz Taufiq
Off-policy evaluation (OPE) is a critical challenge in robust decision-making:
assessing the performance of a new policy using data collected under
a different policy. However, existing OPE methodologies suffer from several
limitations arising from statistical uncertainty as well as causal
considerations. In this thesis, we address these limitations by presenting
three works. Firstly, we consider the problem of high variance in
importance-sampling-based OPE estimators. We introduce the Marginal Ratio (MR)
estimator, a novel OPE method that reduces variance by focusing on the marginal
distribution of outcomes rather than direct policy shifts, improving robustness
in contextual bandits. Next, we propose Conformal Off-Policy Prediction (COPP),
a principled approach for uncertainty quantification in OPE that provides
finite-sample predictive intervals, ensuring robust decision-making in
risk-sensitive applications. Finally, we address causal unidentifiability in
off-policy decision-making by developing novel bounds for sequential decision
settings, which remain valid under arbitrary unmeasured confounding. We apply
these bounds to assess the reliability of digital twin models, introducing a
falsification framework to identify scenarios where model predictions diverge
from real-world behaviour. Our contributions provide new insights into robust
decision-making under uncertainty and establish principled methods for
evaluating policies in both static and dynamic settings.
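
To make the variance problem mentioned above concrete, here is a minimal sketch of the standard importance-sampling (inverse propensity weighting) estimator for a contextual bandit. It is the textbook baseline, not the MR estimator or any other method from the thesis. The function names, the two-context synthetic environment, and all probabilities below are hypothetical and chosen only for illustration.

```python
import numpy as np


def ipw_estimate(rewards, target_probs, behaviour_probs):
    """Importance-sampling estimate of the target policy's expected reward.

    rewards[i]         : observed reward for logged sample i
    target_probs[i]    : probability the target policy gives the logged action
    behaviour_probs[i] : probability the behaviour policy gave the logged action
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(
        behaviour_probs, dtype=float
    )
    return float(np.mean(weights * rewards))


def simulate_logged_data(n, rng):
    """Draw n logged samples from a hypothetical two-context, two-action bandit."""
    # Rows are indexed by context x, columns by action a.
    behaviour = np.array([[0.9, 0.1], [0.9, 0.1]])    # P_behaviour(a | x)
    target = np.array([[0.1, 0.9], [0.1, 0.9]])       # P_target(a | x)
    mean_reward = np.array([[0.2, 0.8], [0.5, 0.4]])  # P(Y = 1 | x, a)

    x = rng.integers(0, 2, size=n)                     # uniform binary context
    a = (rng.random(n) < behaviour[x, 1]).astype(int)  # action from behaviour policy
    y = rng.binomial(1, mean_reward[x, a])             # binary reward
    return y, target[x, a], behaviour[x, a]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Repeat the evaluation on many small logged datasets to see the spread.
    estimates = []
    for _ in range(200):
        y, tp, bp = simulate_logged_data(100, rng)
        estimates.append(ipw_estimate(y, tp, bp))
    print("mean of estimates:", np.mean(estimates))
    print("std of estimates: ", np.std(estimates))
```

In this toy setup the target policy mostly picks the action the behaviour policy rarely logs, so a few samples carry weights of 9 while most carry weights of 1/9. The estimate remains unbiased, but its spread across small datasets is large, which is the kind of instability that motivates lower-variance alternatives.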