Delayed Bandit
Delayed bandit problems concern online decision-making in which the feedback for an action arrives only after a delay, so the learner must keep choosing actions before it knows how earlier ones turned out. Current research focuses on algorithms that handle these delays efficiently, often by updating in blocks or by using intermediate observations that become available before the final feedback, with a strong emphasis on achieving near-optimal regret bounds. This work matters because delayed feedback is inherent in many real-world settings, such as reinforcement learning and online advertising, and handling it well improves both the theoretical understanding and the practical applicability of online learning algorithms.
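To make the setting concrete, here is a minimal sketch of a bandit loop with delayed feedback. It is an illustration only, not the method of any listed paper: it assumes Bernoulli arms, a fixed delay known to the simulator, and a standard UCB1 index computed from the feedback observed so far. The function name `run_delayed_ucb` and all parameters are hypothetical.

```python
import math
import random
from collections import deque


def run_delayed_ucb(arm_means, horizon, delay, seed=0):
    """UCB1 on Bernoulli arms; the reward of round t is usable from round t + delay + 1.

    Illustrative sketch: fixed delay, no blocking or intermediate observations.
    Returns (expected regret, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # rewards *observed* so far, per arm
    sums = [0.0] * n_arms   # sum of *observed* rewards, per arm
    pulls = [0] * n_arms    # times each arm was actually played
    pending = deque()       # (usable_from_round, arm, reward), in arrival order
    best = max(arm_means)
    regret = 0.0

    for t in range(horizon):
        # Reveal every piece of feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            counts[arm] += 1
            sums[arm] += reward

        # Decide using observed feedback only; in-flight rewards are invisible.
        unobserved = [a for a in range(n_arms) if counts[a] == 0]
        if unobserved:
            # No feedback yet for these arms: play the least-pulled one.
            arm = min(unobserved, key=lambda a: pulls[a])
        else:
            total = sum(counts)
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )

        # Play the arm; its reward joins the queue and stays hidden for now.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pending.append((t + delay + 1, arm, reward))
        pulls[arm] += 1
        regret += best - arm_means[arm]

    return regret, pulls


if __name__ == "__main__":
    for d in (0, 10, 100):
        r, p = run_delayed_ucb([0.3, 0.5, 0.7], horizon=5000, delay=d)
        print(f"delay={d:>3}  regret={r:.1f}  pulls={p}")
```

Running the script with several delays shows how the regret responds as feedback arrives later. Techniques such as blocked updates or intermediate observations, mentioned above, would replace the naive "wait for the queue" step in this sketch.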