Optimal Policy
Optimal policy research seeks the best course of action within a given system, typically modeled as a Markov Decision Process (MDP), so as to maximize a desired outcome such as cumulative reward or efficiency. Current research emphasizes efficient algorithms, including policy gradient methods and diffusion-model-based approaches, for complex settings with high dimensionality or uncertainty, and often incorporates techniques like variance reduction and bias correction. These advances matter for fields including robotics, finance, and AI, where they improve decision-making in tasks ranging from robot control to resource allocation. Making these algorithms more efficient and more robust remains a central focus.
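To make the idea of an optimal policy in an MDP concrete, here is a minimal sketch using value iteration, a classical tabular method that is swapped in purely for illustration and is not one of the methods named above. The two-state MDP, its rewards, the discount factor, and the names P, R, GAMMA, q_value, and value_iteration are all hypothetical and come from no listed paper.

```python
# Hypothetical 2-state, 2-action MDP, chosen only for illustration.
# Action 0 stays in the current state; action 1 moves to the other state.
# P[s][a] is a list of (probability, next_state) pairs.
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 1)], 1: [(1.0, 0)]},
}
# R[s][a] is the immediate reward; only staying in state 1 pays off.
R = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}
GAMMA = 0.9  # discount factor (assumed)


def q_value(V, s, a):
    """Expected discounted return of taking action a in state s, then following V."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(tol=1e-8, max_iters=10_000):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # The optimal policy acts greedily with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


V, policy = value_iteration()
print(policy)  # maps each state to its greedy action
```

In this toy problem the greedy policy moves out of state 0 and then stays in state 1, which is the only rewarding choice. Exact methods of this kind only scale to small state spaces, which is one reason the research described above turns to policy gradient and other approximate methods.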
Papers
Stochastic Principal-Agent Problems: Efficient Computation and Learning
Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic
Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning
Kwangho Kim, José R. Zubizarreta
State Regularized Policy Optimization on Data with Dynamics Shift
Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Offline Primal-Dual Reinforcement Learning for Linear MDPs
Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini
Learning Optimal Policy for Simultaneous Machine Translation via Binary Search
Shoutao Guo, Shaolei Zhang, Yang Feng
Offline Reinforcement Learning with Additional Covering Distributions
Chenjie Mao