Linear Reward

Linear reward models are central to many online learning problems. The learner assumes that the expected reward of an action is a linear function of that action's feature vector, and it uses observed rewards to estimate this relationship and improve its decisions. Current research focuses on improving efficiency and robustness in settings such as sparse data, adversarial environments, and privacy constraints. The problem is often framed as a contextual bandit and solved with algorithms such as Thompson sampling, along with techniques for handling non-linearity and high dimensionality. These advances matter for applications such as personalized recommendations, traffic routing, and autonomous driving, where efficient and reliable learning from limited or noisy data is crucial.
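To make the idea concrete, here is a minimal sketch of Thompson sampling with a linear reward model, written in plain Python. It is an illustration under stated assumptions, not a method taken from any specific paper: it assumes a Gaussian prior on the weights and Gaussian reward noise, and the names `LinearThompsonSampler`, `cholesky`, `prior_precision`, and `scale` are hypothetical, chosen only for this example.

```python
import math
import random


def cholesky(m):
    """Return lower-triangular L with L L^T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


class LinearThompsonSampler:
    """Hypothetical helper: Gaussian posterior over linear reward weights."""

    def __init__(self, dim, prior_precision=1.0, scale=1.0, seed=0):
        self.dim = dim
        self.scale = scale  # assumed posterior scale; larger means more exploration
        # a_inv is the inverse of (prior_precision * I + sum of x x^T).
        self.a_inv = [[(1.0 / prior_precision if i == j else 0.0)
                       for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x
        self.rng = random.Random(seed)

    def mean(self):
        """Posterior mean of the weights: a_inv times b."""
        return [sum(self.a_inv[i][j] * self.b[j] for j in range(self.dim))
                for i in range(self.dim)]

    def sample_weights(self):
        """Draw weights from N(mean, scale^2 * a_inv) via a Cholesky factor."""
        low = cholesky(self.a_inv)
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        mu = self.mean()
        return [mu[i] + self.scale * sum(low[i][k] * z[k] for k in range(i + 1))
                for i in range(self.dim)]

    def choose(self, actions):
        """Return the index of the action with the highest sampled reward."""
        w = self.sample_weights()
        scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in actions]
        return max(range(len(actions)), key=scores.__getitem__)

    def update(self, x, reward):
        """Fold one (features, reward) observation into the posterior."""
        # Sherman-Morrison rank-one update of a_inv after adding x x^T.
        ax = [sum(self.a_inv[i][j] * x[j] for j in range(self.dim))
              for i in range(self.dim)]
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]
```

In a typical loop, the caller would pass the feature vectors of the currently available actions to `choose`, observe the reward of the selected action, and then call `update` with that action's features and the reward. Sampling the weights, rather than always using the posterior mean, is what lets the learner keep exploring actions whose reward is still uncertain.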

Papers