Logistic Bandit

Logistic bandits are a class of online learning problems in which an agent sequentially selects actions and receives binary feedback (e.g., click/no-click) whose success probability is modeled by a logistic function of the chosen action's features. Current research focuses on algorithms that minimize regret, the gap between the reward an optimal strategy would collect and the reward the agent actually obtains, with particular emphasis on improving computational efficiency while retaining statistically optimal regret bounds, often through optimistic algorithms and confidence sequences. The problem matters because of its applicability to online advertising and recommendation systems, where efficient and accurate modeling of user choices is crucial.
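
To make the reward model and the notion of regret concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not an algorithm from any particular paper: the parameter vector `theta_star`, the three arm feature vectors, and the helper names (`success_probability`, `pull`, `cumulative_regret`) are all hypothetical, and the "learner" simply picks arms uniformly at random. The regret computed is the expected (pseudo-)regret, which compares success probabilities and not the realized 0/1 rewards.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def success_probability(theta, x):
    """Probability of reward 1 for arm features x under parameter theta."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def pull(theta, x, rng):
    """Draw a binary reward (e.g., click / no-click) for arm features x."""
    return 1 if rng.random() < success_probability(theta, x) else 0


def cumulative_regret(theta, arms, chosen):
    """Expected regret: sum over rounds of (best arm's probability
    minus the chosen arm's probability)."""
    best = max(success_probability(theta, x) for x in arms)
    return sum(best - success_probability(theta, arms[i]) for i in chosen)


# Hypothetical problem instance (assumed for illustration only).
theta_star = [1.0, -0.5]                      # unknown to a real learner
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # feature vector per action

rng = random.Random(0)
horizon = 100

# A deliberately naive policy: choose an arm uniformly at random each round.
chosen = [rng.randrange(len(arms)) for _ in range(horizon)]
rewards = [pull(theta_star, arms[i], rng) for i in chosen]

regret_random = cumulative_regret(theta_star, arms, chosen)
# An oracle that always plays arm 0 (the best arm here) incurs zero regret.
regret_oracle = cumulative_regret(theta_star, arms, [0] * horizon)

print("random policy regret:", regret_random)
print("oracle regret:", regret_oracle)
```

A practical algorithm would replace the random choice with one that estimates the parameter from past rewards and selects arms optimistically, so that regret grows sublinearly in the horizon.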

Papers