Online Reinforcement Learning
Online reinforcement learning (RL) focuses on training agents to make good decisions in dynamic environments through continuous interaction and feedback. Current research emphasizes improving sample efficiency, particularly via pre-training on offline data and techniques such as prioritized experience replay and ensemble methods, and explores novel model architectures such as Kolmogorov-Arnold Networks. These advances aim to address challenges like reward sparsity, distribution shift between offline and online data, and the need for safe, reliable learning in high-stakes applications such as robotics and healthcare, with the goal of producing more robust and sample-efficient RL agents.
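As a concrete illustration of one technique named above, the sketch below implements proportional prioritized experience replay in the style of Schaul et al. (2015): transitions are sampled with probability proportional to |TD error|^alpha, and importance-sampling weights correct the bias this introduces. The class name, hyperparameter defaults, and transition format are illustrative assumptions, not taken from any paper listed below.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal sketch of proportional prioritized experience replay."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly TD error shapes sampling
        self.beta = beta    # strength of importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so each is seen at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Assumes the buffer is non-empty. Sampling probability ~ priority^alpha.
        prios = self.priorities[: len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priorities track the magnitude of the latest TD errors.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

After each gradient step, the learner would call `update_priorities` with the sampled indices and their new TD errors, so transitions the agent predicts poorly are revisited more often.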
Papers
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse
Jiafei Lyu, Le Wan, Zongqing Lu, Xiu Li
Efficient Online Reinforcement Learning with Offline Data
Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer
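Several of the papers above study how offline data can accelerate online RL. A minimal sketch of one common recipe, mixing a fixed fraction of offline transitions into every training batch to soften the offline-to-online distribution shift, follows; the function name, signature, and 50/50 default ratio are assumptions for illustration and are not claimed to match any listed paper's exact method.

```python
import numpy as np

def sample_mixed_batch(offline_buffer, online_buffer, batch_size,
                       offline_frac=0.5, rng=None):
    """Draw a training batch mixing offline data with fresh online experience.

    Holding the offline/online ratio fixed per batch is one simple way to keep
    offline data useful without letting it dominate the current policy's data.
    Assumes both buffers are non-empty sequences of transitions.
    """
    rng = rng or np.random.default_rng()
    n_off = int(batch_size * offline_frac)
    n_on = batch_size - n_off
    off_idx = rng.integers(0, len(offline_buffer), size=n_off)
    on_idx = rng.integers(0, len(online_buffer), size=n_on)
    batch = [offline_buffer[i] for i in off_idx] + \
            [online_buffer[i] for i in on_idx]
    rng.shuffle(batch)  # remove ordering effects within the batch
    return batch
```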