Reinforcement Learning
Reinforcement learning (RL) trains agents to make optimal decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes improving RL's efficiency and robustness, particularly in human-in-the-loop training (e.g., refining models with human feedback), handling uncertainty and sparse rewards, and scaling to complex tasks such as robotics and autonomous driving. Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for improved decision-making and task decomposition. These advances are driving progress in robotics, game playing, and the development of more human-aligned AI systems.
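To make the trial-and-error loop and the policy gradient idea concrete, here is a minimal, illustrative sketch of REINFORCE (the simplest score-function policy gradient) on a toy two-armed bandit. The environment, hyperparameters, and function name are invented for illustration and are not taken from any paper listed below; the update rule is the standard `∇ log π(a) · reward` estimator applied to a softmax policy.

```python
import numpy as np

def reinforce_bandit(n_episodes=500, lr=0.1, seed=0):
    """Minimal REINFORCE sketch on a two-armed bandit.

    Arm 1 always pays reward 1.0, arm 0 pays 0.0. The policy is a
    softmax over two logits, updated with the score-function gradient
    grad log pi(a) * reward, so probability mass should shift to arm 1.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(2)  # policy parameters (one logit per arm)
    for _ in range(n_episodes):
        # softmax policy (shift by max for numerical stability)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(2, p=probs)          # sample an action
        reward = 1.0 if a == 1 else 0.0     # environment feedback
        # gradient of log pi(a) w.r.t. logits: one_hot(a) - probs
        grad = -probs
        grad[a] += 1.0
        logits += lr * reward * grad        # REINFORCE update
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs

probs = reinforce_bandit()
print(probs)  # probability mass concentrates on the rewarding arm
```

Real problems replace the bandit with a sequential environment and the raw reward with a return (often minus a baseline to cut variance), but the learning signal has the same shape.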
Papers
Enhancing Reinforcement Learning Through Guided Search
Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques
Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm
Nikolai Rozanov
The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective
Renye Yan, Yaozhong Gan, You Wu, Ling Liang, Junliang Xing, Yimao Cai, Ru Huang
Demystifying Reinforcement Learning in Production Scheduling via Explainable AI
Daniel Fischer, Hannah M. Hüsener, Felix Grumbach, Lukas Vollenkemper, Arthur Müller, Pascal Reusch
Minor DPO reject penalty to increase training robustness
Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu, Yingfan Hu
World Models Increase Autonomy in Reinforcement Learning
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey
Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll
CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control
Se Hwan Jeon, Seungwoo Hong, Ho Jae Lee, Charles Khazoom, Sangbae Kim
Enhancing Quantum Memory Lifetime with Measurement-Free Local Error Correction and Reinforcement Learning
Mincheol Park, Nishad Maskara, Marcin Kalinowski, Mikhail D. Lukin
Directed Exploration in Reinforcement Learning from Linear Temporal Logic
Marco Bagatella, Andreas Krause, Georg Martius
Ancestral Reinforcement Learning: Unifying Zeroth-Order Optimization and Genetic Algorithms for Reinforcement Learning
So Nakashima, Tetsuya J. Kobayashi
REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning
Rameez Qureshi, Naïm Es-Sebbani, Luis Galárraga, Yvette Graham, Miguel Couceiro, Zied Bouraoui
Reward Difference Optimization For Sample Reweighting In Offline RLHF
Shiqi Wang, Zhengze Zhang, Rui Zhao, Fei Tan, Cam Tu Nguyen
CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk
Mohamad Fares El Hajj Chehade, Amrit Singh Bedi, Amy Zhang, Hao Zhu
SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning
Sascha Marton, Tim Grams, Florian Vogt, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
Efficient Multi-Policy Evaluation for Reinforcement Learning
Shuze Daniel Liu, Claire Chen, Shangtong Zhang