Optimal Policy
Optimal policy research focuses on finding the best course of action within a given system, typically modeled as a Markov Decision Process (MDP), so as to maximize a desired outcome such as cumulative reward or efficiency. Current research emphasizes efficient algorithms, including policy gradient methods and diffusion models, for solving these problems in complex settings with high dimensionality or uncertainty, often incorporating techniques such as variance reduction and bias correction. These advances are significant for fields including robotics, finance, and AI, enabling improved decision-making in scenarios ranging from robot control to resource allocation. Developing more efficient and robust algorithms for finding optimal policies remains a central focus.
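To make the MDP framing concrete: for a small finite MDP, the optimal policy can be computed exactly by iterating the Bellman optimality operator (value iteration). The sketch below uses a hypothetical two-state, two-action MDP with made-up transition probabilities and rewards, purely for illustration; it is not drawn from any of the papers listed here.

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical values, for illustration only).
# P[s, a, s'] = transition probability; R[s, a] = expected immediate reward.
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],  # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.0, 1.0]],  # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],                # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                # rewards in state 1 for actions 0, 1
])
gamma = 0.9                    # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the value function converges."""
    V = np.zeros(P.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            # Optimal values and the greedy (optimal) policy.
            return V_new, Q.argmax(axis=1)
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
```

In this toy instance, action 1 in state 1 is self-absorbing with reward 2, giving it value 2 / (1 - 0.9) = 20; the greedy policy chooses action 1 in both states. Policy gradient methods, mentioned above, instead estimate this improvement direction from sampled trajectories when the transition model is unknown or the state space is too large to enumerate.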
Papers
Near-optimality for infinite-horizon restless bandits with many arms
Xiangyu Zhang, Peter I. Frazier
Neural representation of a time optimal, constant acceleration rendezvous
Dario Izzo, Sebastien Origer
Learning to act: a Reinforcement Learning approach to recommend the best next activities
Stefano Branchi, Chiara Di Francescomarino, Chiara Ghidini, David Massimo, Francesco Ricci, Massimiliano Ronzani
Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps
Jinglin Chen, Nan Jiang
A Conservative Q-Learning approach for handling distribution shift in sepsis treatment strategies
Pramod Kaushik, Sneha Kummetha, Perusha Moodley, Raju S. Bapi
Randomized Policy Optimization for Optimal Stopping
Xinyi Guan, Velibor V. Mišić
Semi-Markov Offline Reinforcement Learning for Healthcare
Mehdi Fatemi, Mary Wu, Jeremy Petch, Walter Nelson, Stuart J. Connolly, Alexander Benz, Anthony Carnicelli, Marzyeh Ghassemi
Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann