Logistic Bandit
Logistic bandits are a class of online learning problems in which an agent sequentially selects actions and receives binary feedback (e.g., click/no-click) whose success probability is modeled by a logistic function of the chosen action's features. Current research focuses on algorithms that minimize regret, the gap between the expected cumulative reward of an optimal strategy and that of the agent, with particular emphasis on improving computational efficiency while maintaining statistically optimal regret bounds, often using optimistic algorithms and confidence sequences. The setting matters because of its broad applicability in online advertising and recommendation systems, where efficient and accurate modeling of user choices is crucial.
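To make the reward model and the notion of regret concrete, here is a minimal sketch in Python. It is illustrative only and does not implement any algorithm from the papers listed here: the parameter vector `theta`, the three arm feature vectors, the horizon of 1000 rounds, and the uniformly random policy are all hypothetical choices, and the helper names are made up for this example.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_reward(x, theta):
    """Success probability of arm x under the logistic model."""
    return sigmoid(sum(xi * ti for xi, ti in zip(x, theta)))


def pull(x, theta, rng):
    """Draw binary feedback (1 = click, 0 = no click) for arm x."""
    return 1 if rng.random() < expected_reward(x, theta) else 0


def cumulative_regret(chosen, arms, theta):
    """Sum over rounds of (best expected reward - chosen arm's expected reward)."""
    best = max(expected_reward(x, theta) for x in arms)
    return sum(best - expected_reward(arms[i], theta) for i in chosen)


# Hypothetical problem instance (assumed for illustration only).
theta = [1.0, -0.5]                           # unknown to the agent in practice
arms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # feature vector per action
horizon = 1000
rng = random.Random(0)

# A uniformly random policy, used here only as a naive baseline.
chosen = [rng.randrange(len(arms)) for _ in range(horizon)]
rewards = [pull(arms[i], theta, rng) for i in chosen]

regret_random = cumulative_regret(chosen, arms, theta)
# An oracle that always plays the best arm (index 0 for this theta) has zero regret.
regret_oracle = cumulative_regret([0] * horizon, arms, theta)

print(f"random policy regret: {regret_random:.1f}")
print(f"oracle regret: {regret_oracle:.1f}")
```

The random policy accumulates regret linearly in the number of rounds because it never learns. The algorithms studied in this area aim for regret that grows sublinearly by estimating `theta` from the observed binary feedback.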
Papers
October 28, 2024
July 19, 2024
July 8, 2024
May 16, 2024
February 12, 2024
October 28, 2023
May 27, 2022
April 22, 2022
February 4, 2022