Optimal Policy
Optimal policy research seeks the best course of action within a given system, often modeled as a Markov Decision Process (MDP), so as to maximize a desired outcome (e.g., reward, efficiency). Current research emphasizes efficient algorithms, such as policy gradient methods and diffusion models, for solving these problems in complex settings with high dimensionality or uncertainty, often incorporating techniques like variance reduction and bias correction. These advances matter for fields including robotics, finance, and AI, where they enable improved decision-making in scenarios ranging from controlling robots to optimizing resource allocation. Developing more efficient and robust algorithms for finding optimal policies remains a central focus.
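To make "finding an optimal policy in an MDP" concrete, here is a minimal sketch using tabular value iteration. This is a classical exact method for small problems, not one of the policy gradient or diffusion-based approaches mentioned above. The two-state MDP, its transition table `P`, the rewards, and the discount factor `GAMMA` are all hypothetical values chosen for illustration.

```python
# Hypothetical toy MDP (illustrative numbers only): 2 states, 2 actions.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a, gamma=GAMMA):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8, gamma=GAMMA):
    """Apply the Bellman optimality update until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # The greedy policy with respect to the converged values is optimal.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration()
print(policy)  # action chosen in each state
print(V)       # approximate optimal state values
```

In this toy problem the greedy policy picks action 1 in both states, since staying in state 1 yields the larger recurring reward. Value iteration enumerates every state and action, so it does not scale to the high-dimensional or uncertain settings described above, which is part of why approximate methods are an active research area.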
Papers
Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games
Awni Altabaa, Bora Yongacoglu, Serdar Yüksel
Recommending the optimal policy by learning to act from temporal data
Stefano Branchi, Andrei Buliga, Chiara Di Francescomarino, Chiara Ghidini, Francesca Meneghello, Massimiliano Ronzani
Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter
Layered State Discovery for Incremental Autonomous Exploration
Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta
Population-size-Aware Policy Optimization for Mean-Field Games
Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An