Non-Stationary Multi-Armed Bandit
Non-stationary multi-armed bandit (NS-MAB) problems concern sequential decision-making in environments where the reward distributions of the arms change over time, so that past observations may no longer reflect current payoffs. Current research focuses on algorithms, such as discounted Thompson sampling and variants of epsilon-greedy, that balance exploration (trying different options) and exploitation (choosing the option that currently looks best) in these dynamic settings, often incorporating mechanisms to detect change points or to gradually forget stale observations. These methods are applied in diverse fields, including robotics, payment systems, and multi-agent game theory, where operating conditions evolve over time. Developing NS-MAB algorithms that are both theoretically sound and practically effective remains an active area of research, with particular attention to achieving optimal regret bounds under different assumptions about how the environment changes (e.g., abrupt change points versus slow drift).
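As a concrete illustration of the forgetting idea, the sketch below implements a minimal discounted Thompson sampling loop for Bernoulli rewards. It is a hypothetical example under simple assumptions, not a reference implementation: the function name, the discount factor gamma, and the environment interface `pull` are all illustrative choices.

```python
import random

def discounted_thompson_sampling(pull, n_arms, n_steps, gamma=0.95):
    """Sketch of discounted Thompson sampling for Bernoulli bandits.

    Each arm keeps discounted success/failure counts. Multiplying the
    counts by gamma < 1 every step down-weights old observations, so
    the Beta posterior can track reward probabilities that change.
    """
    successes = [0.0] * n_arms  # discounted count of reward == 1
    failures = [0.0] * n_arms   # discounted count of reward == 0
    total_reward = 0.0
    for _ in range(n_steps):
        # Sample a plausible mean reward per arm from Beta(s + 1, f + 1)
        # (the +1 terms act as a uniform Beta(1, 1) prior) and play the
        # arm with the highest sample.
        samples = [random.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = pull(arm)  # environment returns 0 or 1
        total_reward += reward
        # Decay all counts, then credit the pulled arm. Decaying every
        # arm restores uncertainty on arms that have not been played
        # recently, which encourages re-exploration after a change.
        for i in range(n_arms):
            successes[i] *= gamma
            failures[i] *= gamma
        successes[arm] += reward
        failures[arm] += 1 - reward
    return total_reward
```

A quick way to exercise the sketch is a toy environment with one abrupt change point, where the better arm switches halfway through the horizon:

```python
if __name__ == "__main__":
    step = [0]
    def pull(arm):  # arm 0 is best for the first 500 steps, arm 1 after
        step[0] += 1
        p = (0.8, 0.2) if step[0] <= 500 else (0.2, 0.8)
        return 1 if random.random() < p[arm] else 0
    print(discounted_thompson_sampling(pull, n_arms=2, n_steps=1000))
```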