Paper ID: 2410.08578

Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Julien Zhou (Thoth, STATIFY), Pierre Gaillard (Thoth), Thibaud Rahier, Julyan Arbel (STATIFY)

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a nonmonotone submodular function, taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a O(d log(dT)) problemdependent upper bound for the 1/2-approximate pseudo-regret, as well as a O(dT^{2/3}log(dT)^{1/3}) problem-free one at the same time, outperforming existing approaches. To that end, we introduce a notion of hardness for submodular functions, characterizing how difficult it is to maximize them with this type of strategy.

Submitted: Oct 11, 2024