Latent Bandits
Latent bandits are sequential decision-making problems in which an unobserved latent state determines the reward distribution of each action. Research focuses on algorithms that infer this latent state efficiently from data, often leveraging offline information to accelerate online performance, with models ranging from linear structures to more complex nonlinear ones, including autoregressive processes and clustering techniques. The framework is particularly relevant to personalized applications such as recommender systems and healthcare, where unobserved heterogeneity among individuals calls for exploration-exploitation strategies that maximize cumulative reward while handling challenges like cold starts and privacy concerns. Current work emphasizes achieving low regret and high accuracy, often incorporating matrix completion or other collaborative filtering methods to improve sample efficiency and performance.
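To make the setting concrete, here is a minimal sketch of a latent bandit loop under simplifying assumptions not stated in the source: the per-state mean rewards are taken as already estimated from offline data, reward noise is Gaussian with known scale, and the learner runs Thompson sampling over the latent state (sample a candidate state from its posterior, then play the best arm for that state). All names and numbers are illustrative, not from any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 latent states, 3 arms. The per-state mean rewards
# mu[k, a] are assumed to have been learned offline; only the identity of
# the current latent state is unknown during online interaction.
mu = np.array([[0.9, 0.1, 0.1],
               [0.1, 0.9, 0.1],
               [0.1, 0.1, 0.9]])
sigma = 0.1                       # known Gaussian reward noise
true_state = 2                    # hidden from the learner

log_post = np.zeros(mu.shape[0])  # uniform prior over latent states (log scale)

for t in range(200):
    # Thompson sampling over the latent state: sample a candidate state
    # from the current posterior, then play that state's best arm.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    k = rng.choice(len(post), p=post)
    a = int(np.argmax(mu[k]))

    # Observe a noisy reward generated by the true (unobserved) state.
    r = rng.normal(mu[true_state, a], sigma)

    # Bayesian update: Gaussian log-likelihood of r under each state.
    log_post += -0.5 * ((r - mu[:, a]) / sigma) ** 2

post = np.exp(log_post - log_post.max())
post /= post.sum()
print(post.round(3))  # posterior concentrates on the true latent state
```

Because the offline models are fixed, online learning reduces to identifying the active latent state, which is what makes this framework attractive for cold-start personalization: a few observations suffice to route a new user to an already-learned reward model.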