Context Observation

Research on context observation in contextual bandit problems studies decision-making under uncertainty when the contextual information available to the learner is incomplete or noisy. Current work develops algorithms, such as Thompson sampling and Upper Confidence Bound (UCB) methods, that balance exploration and exploitation despite imperfect context data, often modeling the observed context as a noisy linear function of unobserved variables. The area matters because traditional bandit algorithms assume complete context information, which is rarely available in real-world applications such as personalized recommendations and online advertising.
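To make the setting concrete, here is a minimal simulation sketch of Thompson sampling when only a noisy linear observation of the context is available. It is an illustration under stated assumptions, not the method of any paper listed below: the function name `thompson_noisy_context`, the sensing matrix `A`, the Gaussian noise levels, the unit-variance Gaussian prior, and the per-arm Bayesian linear regression on the observation are all hypothetical choices made for this example.

```python
import numpy as np

def thompson_noisy_context(T=2000, n_arms=3, dim=4,
                           obs_noise=0.5, reward_noise=0.1, seed=0):
    """Simulate Thompson sampling with noisy context observations.

    Returns cumulative regret measured against an oracle that sees the
    true (unobserved) context. All modeling choices are assumptions.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(dim, dim))      # hypothetical sensing matrix
    mu = rng.normal(size=(n_arms, dim))  # true arm parameters (unknown to learner)

    # Per-arm Gaussian posterior over a linear map from observation to reward:
    # precision matrix B[a] and vector b[a], starting from an identity prior.
    B = np.stack([np.eye(dim) for _ in range(n_arms)])
    b = np.zeros((n_arms, dim))

    regret = np.zeros(T)
    for t in range(T):
        x = rng.normal(size=dim)                      # true context, never observed
        y = A @ x + obs_noise * rng.normal(size=dim)  # noisy linear observation

        # Draw one parameter sample per arm and score it on the observation.
        scores = np.empty(n_arms)
        for a in range(n_arms):
            cov = np.linalg.inv(B[a])
            mean = cov @ b[a]
            chol = np.linalg.cholesky(cov)
            theta = mean + chol @ rng.normal(size=dim)
            scores[a] = y @ theta
        arm = int(np.argmax(scores))

        # The reward depends on the true context, not on the observation.
        means = mu @ x
        reward = means[arm] + reward_noise * rng.normal()

        # Bayesian linear regression update using the observed y as the regressor.
        B[arm] += np.outer(y, y)
        b[arm] += reward * y

        regret[t] = means.max() - means[arm]
    return np.cumsum(regret)
```

Calling `thompson_noisy_context(T=500)` returns an array of 500 cumulative regret values. Because the oracle in this sketch sees the true context while the learner sees only the noisy observation, some regret is unavoidable here regardless of how well the posterior is learned.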

Papers

February 15, 2024

Thompson Sampling in Partially Observable Contextual Bandits
Hongju Park, Mohamad Kazem Shirani Faradonbeh
Contextual Bandit, Regret Bound, Thompson Sampling, Contextual Information, Bandit Policy, Context Observation

July 26, 2023

Online learning in bandits with predicted context
Yongyi Guo, Ziping Xu, Susan Murphy
Context Information, Online Learning, Sublinear Regret, Bandit Algorithm, Contextual Bandit Problem, Context Observation

July 28, 2022

Distributed Stochastic Bandit Learning with Delayed Context Observation
Jiabin Lin, Shana Moothedath
Contextual Bandit, Cumulative Regret, Confidence Bound, Communication Constraint, Armed Bandit, Distributed Stochastic, Context Observation

April 10, 2022

Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations
Hongju Park, Mohamad Kazem Shirani Faradonbeh
Contextual Bandit, Sequential Decision, Bandit Algorithm, Greedy Policy, Worst Case Performance, Context Observation

February 2, 2022

Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts
Hongju Park, Mohamad Kazem Shirani Faradonbeh
Learning, Abstract, Contextual Bandit, Posterior Sampling, Bandit Algorithm, Efficient Algorithm, Control Policy, Unobserved Variable, Context Vector, Context Observation
