Offline Contextual Bandit
Offline contextual bandits address the challenge of learning optimal decision-making policies from pre-collected data, without the ability to actively explore. Current research focuses on developing robust algorithms that handle limited data, confounding variables, and model misspecification, often employing techniques such as pessimism, confidence bounds, and convex optimization over linear or neural-network model classes. The field matters because it enables effective policy learning in scenarios where online exploration is infeasible or costly, with applications ranging from personalized recommendations to resource allocation in complex systems such as wireless networks.
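To make the pessimism idea concrete, below is a minimal sketch of a lower-confidence-bound (LCB) policy for an offline contextual bandit with a per-action linear reward model. The function name `lcb_policy` and the hyperparameters `beta` (penalty width) and `ridge` (regularization strength) are illustrative choices, not drawn from any specific paper: the point is only that the learned policy discounts actions whose value estimates are poorly supported by the logged data, rather than trusting the point estimates.

```python
import numpy as np

def lcb_policy(contexts, actions, rewards, beta=1.0, ridge=1.0):
    """Fit a ridge-regression reward model per action from logged data,
    then act pessimistically via a lower confidence bound (LCB).

    contexts: (n, d) array of logged contexts
    actions:  (n,) array of logged action indices
    rewards:  (n,) array of observed rewards
    beta:     confidence-penalty width (illustrative hyperparameter)
    ridge:    ridge regularization strength (illustrative hyperparameter)
    """
    n, d = contexts.shape
    num_actions = int(actions.max()) + 1
    models = []
    for a in range(num_actions):
        Xa = contexts[actions == a]
        ra = rewards[actions == a]
        # Ridge-regularized design matrix; invertible even with few samples.
        A = Xa.T @ Xa + ridge * np.eye(d)
        theta = np.linalg.solve(A, Xa.T @ ra)
        models.append((theta, np.linalg.inv(A)))

    def act(x):
        # Pessimism: subtract an uncertainty penalty from each action's
        # estimated reward, so weakly supported actions are avoided.
        scores = [
            theta @ x - beta * np.sqrt(x @ A_inv @ x)
            for theta, A_inv in models
        ]
        return int(np.argmax(scores))

    return act

# Usage on synthetic logged data (purely illustrative):
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
a = rng.integers(0, 3, size=500)
r = X[:, 0] * (a == 0) + X[:, 1] * (a == 1) + rng.normal(scale=0.1, size=500)
policy = lcb_policy(X, a, r)
print(policy(rng.normal(size=5)))
```

The penalty term is the standard elliptical confidence width for ridge regression; in practice, how `beta` is set (and whether the model class is linear at all) is exactly where the algorithms surveyed above differ.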