Paper ID: 2210.00025

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Siddhartha Banerjee, Sean R. Sinclair, Milind Tambe, Lily Xu, Christina Lee Yu

Most real-world deployments of bandit algorithms exist somewhere in between the offline and online set-up, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computation and storage issues-particularly for continuous action spaces. To address these challenges, we propose Artificial-Replay, a meta-algorithm for incorporating historical data into any arbitrary base bandit algorithm. We show that Artificial-Replay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on (i) K-armed bandits and (ii) continuous combinatorial bandits, on which we model green security domains using real poaching data. Our results show the practical benefits of Artificial-Replayin reducing computation and space complexity, including for base algorithms that do not satisfy IIData.

Submitted: Sep 30, 2022