Linear MDPs

Linear Markov Decision Processes (Linear MDPs) are a simplified model of reinforcement learning problems where the transition dynamics and reward functions are linear in a given feature space, enabling efficient algorithms despite potentially large state spaces. Current research focuses on developing computationally efficient algorithms with near-optimal regret bounds for various settings, including offline and online learning, aggregate bandit feedback, and imitation learning, often employing techniques like linear regression, optimistic value iteration, and primal-dual methods. These advancements are significant because they provide theoretically grounded and practically applicable solutions for reinforcement learning in complex environments, improving sample efficiency and addressing challenges like partial data coverage and adversarial settings.

Papers