Submodular Bandit

Submodular bandits address online decision-making problems in which rewards exhibit diminishing returns, a property captured by submodular functions: adding an item to a small set helps at least as much as adding it to a larger set that contains it. Research focuses on efficient algorithms, such as variants of greedy and policy gradient methods, that maximize cumulative reward under bandit feedback (where only the chosen action's reward is observed), often subject to constraints such as matroids. These algorithms aim to minimize regret, the gap between the optimal cumulative reward and the reward actually achieved, and find applications in multi-robot coordination, experimental design, and resource allocation. Current work explores extensions that handle delayed or untrustworthy feedback and that incorporate contextual information, making these methods more robust and more widely applicable.
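To make the diminishing-returns property and the greedy idea concrete, here is a minimal Python sketch. It is an offline illustration only: it assumes the reward function is known and noise-free, whereas a bandit algorithm would have to estimate marginal gains from observed rewards. The coverage instance, the action names, and the helpers `reward`, `marginal_gain`, and `greedy` are all hypothetical and chosen for illustration. The cardinality budget is the simplest case of a matroid constraint.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy instance: each action (e.g. a sensor placement)
# covers a set of locations. Not taken from any specific paper.
COVERAGE: Dict[str, FrozenSet[int]] = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({3, 4}),
    "c": frozenset({4, 5, 6, 7}),
    "d": frozenset({1, 7}),
}


def reward(chosen: Set[str]) -> int:
    """Number of distinct locations covered; a monotone submodular function."""
    covered: Set[int] = set()
    for action in chosen:
        covered |= COVERAGE[action]
    return len(covered)


def marginal_gain(action: str, chosen: Set[str]) -> int:
    """Extra reward from adding `action` to the set `chosen`."""
    return reward(chosen | {action}) - reward(chosen)


def greedy(budget: int) -> List[str]:
    """Pick up to `budget` actions, each time taking the largest marginal gain.

    Assumes exact access to `reward`; under bandit feedback these gains
    would have to be estimated from noisy observations instead.
    """
    chosen: List[str] = []
    for _ in range(budget):
        candidates = [a for a in COVERAGE if a not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda a: marginal_gain(a, set(chosen)))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # The gain of "d" shrinks as the chosen set grows (diminishing returns).
    print(marginal_gain("d", set()))
    print(marginal_gain("d", {"a"}))
    print(marginal_gain("d", {"a", "c"}))
    print(greedy(2))
```

In this toy instance the marginal gain of action "d" falls as more actions are already selected, which is the structure that greedy-style methods exploit.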

Papers