Contextual Bandit Algorithm
Contextual bandit algorithms address sequential decision-making: at each round the learner observes contextual information, selects an action, and receives a reward for that action only, with the goal of maximizing cumulative reward. Current research emphasizes extending these algorithms to handle diverse data types (e.g., count data, relative feedback), complex model structures (e.g., generalized linear models, neural networks), and challenging real-world constraints (e.g., partial observability, domain adaptation, privacy). This active area of research is important for personalized systems in fields such as healthcare, recommendation, and online advertising, where it enables more efficient and robust learning from user interactions.
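To make the observe-context, choose-action, receive-reward loop concrete, here is a minimal sketch in Python. It is not taken from any particular paper: it assumes an epsilon-greedy policy with one linear reward model per action, updated by stochastic gradient descent. The class name `EpsilonGreedyLinearBandit`, the `simulate` helper, the toy two-user environment, and its 0.8/0.2 reward probabilities are all hypothetical and chosen only for illustration.

```python
import random


class EpsilonGreedyLinearBandit:
    """Illustrative epsilon-greedy contextual bandit (hypothetical helper class).

    Keeps one linear reward estimate per action and updates it with SGD.
    """

    def __init__(self, n_actions, n_features, epsilon=0.1, lr=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # weights[a] is the weight vector of the reward model for action a.
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def predict(self, action, context):
        """Estimated reward of `action` in `context` (a dot product)."""
        return sum(w * x for w, x in zip(self.weights[action], context))

    def select_action(self, context):
        """Explore uniformly with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        scores = [self.predict(a, context) for a in range(self.n_actions)]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, action, context, reward):
        """One SGD step on squared error, for the chosen action only."""
        error = reward - self.predict(action, context)
        self.weights[action] = [
            w + self.lr * error * x
            for w, x in zip(self.weights[action], context)
        ]


def simulate(n_rounds=2000, seed=0):
    """Run the bandit in a toy environment and return (bandit, average reward)."""
    rng = random.Random(seed)
    bandit = EpsilonGreedyLinearBandit(n_actions=2, n_features=2, seed=seed)
    total_reward = 0.0
    for _ in range(n_rounds):
        # Context: one-hot encoding of a user type (0 or 1).
        user = rng.randrange(2)
        context = [1.0, 0.0] if user == 0 else [0.0, 1.0]
        action = bandit.select_action(context)
        # Assumed environment: the action matching the user type pays off
        # with probability 0.8, the other action with probability 0.2.
        p_reward = 0.8 if action == user else 0.2
        reward = 1.0 if rng.random() < p_reward else 0.0
        # Bandit feedback: only the chosen action's reward is observed.
        bandit.update(action, context, reward)
        total_reward += reward
    return bandit, total_reward / n_rounds


if __name__ == "__main__":
    _, average_reward = simulate()
    print(f"average reward: {average_reward:.3f}")
```

Epsilon-greedy is used here only because it is short. Methods studied in this area, such as upper-confidence-bound or Thompson-sampling variants, replace the `select_action` rule with one that accounts for uncertainty in the reward estimates, while the surrounding loop stays the same.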