Policy Learning
Policy learning, a core area of reinforcement learning, aims to develop algorithms that let agents learn effective decision-making strategies from data, in some settings without an explicit reward function (as in imitation learning). Current research emphasizes improving sample efficiency and robustness, particularly in offline settings where the agent must learn from a fixed dataset, using techniques such as generative adversarial imitation learning (GAIL), transformer-based architectures, and model-based methods that incorporate world models or causal representations to handle noisy or incomplete data. These advances are crucial for scaling reinforcement learning to complex real-world problems, such as robotics and personalized recommendation, where online exploration is impractical or unsafe, and they directly improve the performance and generalizability of deployed AI agents.
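To make the GAIL idea mentioned above concrete, the sketch below shows its central mechanism: a discriminator is trained to tell expert state-action pairs from policy-generated ones, and its output is converted into a surrogate reward for a standard policy optimizer. This is a minimal illustration in PyTorch, not code from any of the listed papers; the dimensions and the random batches standing in for demonstrations and rollouts are assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # assumed toy dimensions, not from the papers

class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert (label 1) or policy (label 0)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # raw logits
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

disc = Discriminator(STATE_DIM, ACTION_DIM)
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

# Placeholder batches standing in for expert demonstrations and policy rollouts.
expert_s, expert_a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
policy_s, policy_a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)

# Discriminator update: push expert pairs toward 1, policy pairs toward 0.
loss = bce(disc(expert_s, expert_a), torch.ones(32, 1)) + \
       bce(disc(policy_s, policy_a), torch.zeros(32, 1))
opt.zero_grad()
loss.backward()
opt.step()

# Surrogate reward for the policy optimizer (e.g. PPO): higher when the
# discriminator mistakes policy behavior for expert behavior.
with torch.no_grad():
    reward = -torch.log(1 - torch.sigmoid(disc(policy_s, policy_a)) + 1e-8)
```

In a full training loop, `reward` replaces the environment reward when updating the policy, so no hand-designed reward function is needed; the discriminator and policy are then updated alternately until policy behavior becomes indistinguishable from the demonstrations.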
Papers
Exploiting Estimation Bias in Deep Double Q-Learning for Actor-Critic Methods
Alberto Sinigaglia, Niccolò Turcato, Alberto Dalla Libera, Ruggero Carli, Gian Antonio Susto
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots
Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, Jingwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller
Aligning Human Intent from Imperfect Demonstrations with Confidence-based Inverse soft-Q Learning
Xizhou Bu, Wenjuan Li, Zhengxiong Liu, Zhiqiang Ma, Panfeng Huang