Continuum-Armed Bandit

Continuum-armed bandits address the problem of sequentially choosing actions from a continuous space so as to maximize cumulative reward, a problem that arises in fields such as automated trading and network optimization. Current research focuses on algorithms, often built on Gaussian processes or Bayesian optimization, that balance exploration and exploitation efficiently over this continuous action space, particularly under constraints or non-stationary conditions. These advances improve online decision-making in applications where the action space is not discrete.
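To make the exploration-exploitation trade-off over a continuous action space concrete, here is a minimal sketch. It does not use Gaussian processes or Bayesian optimization. It swaps in a simpler baseline: the action interval [0, 1] is cut into equal-width cells and standard UCB1 is run over the cell midpoints. The function names, the number of cells, and the toy Bernoulli reward peaking at 0.7 are all assumptions chosen for illustration, not taken from any particular paper.

```python
import math
import random


def discretized_ucb1(reward_fn, n_rounds=2000, n_bins=20, seed=0):
    """Run UCB1 over the midpoints of n_bins equal-width cells of [0, 1].

    reward_fn(x, rng) is assumed to return a noisy reward in [0, 1].
    """
    rng = random.Random(seed)
    arms = [(i + 0.5) / n_bins for i in range(n_bins)]  # cell midpoints
    counts = [0] * n_bins   # number of pulls per cell
    sums = [0.0] * n_bins   # total reward per cell
    history = []

    for t in range(1, n_rounds + 1):
        if t <= n_bins:
            # Initialization: try every cell once.
            i = t - 1
        else:
            # Empirical mean (exploitation) plus confidence bonus (exploration).
            i = max(
                range(n_bins),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = reward_fn(arms[i], rng)
        counts[i] += 1
        sums[i] += reward
        history.append((arms[i], reward))

    return arms, counts, history


def toy_reward(x, rng):
    """Hypothetical Bernoulli reward whose success probability peaks at x = 0.7."""
    p = max(0.0, 1.0 - 2.0 * abs(x - 0.7))
    return 1.0 if rng.random() < p else 0.0


arms, counts, history = discretized_ucb1(toy_reward)
most_pulled = arms[counts.index(max(counts))]
print("most-pulled action:", most_pulled)
```

The number of cells controls a trade-off of its own: a finer grid can land closer to the true optimum but leaves fewer pulls per cell, so each estimate is noisier. Gaussian-process methods avoid a fixed grid by sharing information between nearby actions through a kernel.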

Papers