Reinforcement Learning
Reinforcement learning (RL) trains agents to make optimal decisions in an environment by learning through trial and error, with the aim of maximizing cumulative reward. Current research focuses on making RL more efficient and robust. Active areas include human-in-the-loop training (e.g., using human feedback to refine models), handling uncertainty and sparse rewards, and scaling to complex tasks such as robotic manipulation and autonomous driving. Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models (LLMs) for decision-making and task decomposition. These advances are driving progress in robotics, game playing, and the development of AI systems that are better aligned with human intentions.
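To make "trial and error" and "cumulative reward" concrete, here is a minimal sketch of REINFORCE, one of the simplest policy gradient methods. It is an illustration only and is not taken from any of the papers listed below. The corridor environment, the names step, policy, discounted_returns and reinforce, and all hyperparameters (learning rate 0.1, discount 0.9, 2000 episodes) are hypothetical choices. The agent samples actions from a softmax policy, computes the discounted return G_t = r_{t+1} + gamma * G_{t+1} for each step of an episode, and then nudges its action preferences in the direction of gamma^t * G_t * grad log pi(a_t | s_t).

```python
import math
import random

N_STATES = 3       # hypothetical corridor: cells 0, 1, 2; moving right from cell 2 reaches the goal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(state, action):
    """Hypothetical environment: reward 1.0 on reaching the goal, 0.0 otherwise."""
    nxt = max(state - 1, 0) if action == 0 else state + 1  # wall on the left edge
    if nxt == N_STATES:
        return nxt, 1.0, True
    return nxt, 0.0, False


def policy(theta, state):
    """Softmax over the per-state action preferences theta[state]."""
    prefs = theta[state]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def discounted_returns(rewards, gamma):
    """Cumulative discounted reward G_t = r_{t+1} + gamma * G_{t+1}, computed backwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce(episodes=2000, lr=0.1, gamma=0.9, max_steps=50, seed=0):
    """Monte Carlo policy gradient (REINFORCE) with a tabular softmax policy."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        # Trial: roll out one episode with the current stochastic policy.
        state, traj = 0, []
        for _ in range(max_steps):
            action = rng.choices(ACTIONS, weights=policy(theta, state))[0]
            nxt, reward, done = step(state, action)
            traj.append((state, action, reward))
            state = nxt
            if done:
                break
        # Learn: push up the log-probability of each action in proportion to its return.
        returns = discounted_returns([r for _, _, r in traj], gamma)
        for t, ((s, a, _), g) in enumerate(zip(traj, returns)):
            probs = policy(theta, s)
            for b in ACTIONS:
                # Gradient of log softmax w.r.t. theta[s][b] is 1[b == a] - pi(b | s).
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += lr * (gamma ** t) * g * grad
    return theta


if __name__ == "__main__":
    theta = reinforce()
    for s in range(N_STATES):
        print(s, [round(p, 3) for p in policy(theta, s)])
```

Because reaching the goal sooner yields a less-discounted return, moving right earns larger updates on average, and the learned policy should come to prefer "right" in every cell. Practical policy gradient methods add refinements such as a baseline to reduce variance, neural-network policies, and batched updates.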
Papers
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo, Bohan Zhou, Zongqing Lu
Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning
Yang Zhao, Zidong Nie, Kangsheng Dong, Qinghua Huang, Xuelong Li
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic, Johannes Müller, Nico Scherf
When to Localize? A Risk-Constrained Reinforcement Learning Approach
Chak Lam Shek, Kasra Torshizi, Troi Williams, Pratap Tokekar
Learning to Assist Humans without Inferring Rewards
Vivek Myers, Evan Ellis, Sergey Levine, Benjamin Eysenbach, Anca Dragan
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Wenyi Zhao, Yu Yang, Xinyue Yang, Jiadai Sun, Shuntian Yao, Tianjie Zhang, Wei Xu, Jie Tang, Yuxiao Dong
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan
N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs
Ilya Zisman, Alexander Nikulin, Andrei Polubarov, Nikita Lyubaykin, Vladislav Kurenkov
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation
Hengkai Tan, Xuezhou Xu, Chengyang Ying, Xinyi Mao, Songming Liu, Xingxing Zhang, Hang Su, Jun Zhu
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Guan-Ting Lin, Prashanth Gurunath Shivakumar, Aditya Gourav, Yile Gu, Ankur Gandhe, Hung-yi Lee, Ivan Bulyko
So You Think You Can Scale Up Autonomous Robot Data Collection?
Suvir Mirchandani, Suneel Belkhale, Joey Hejna, Evelyn Choi, Md Sazzad Islam, Dorsa Sadigh
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF
Atoosa Chegini, Hamid Kazemi, Iman Mirzadeh, Dong Yin, Maxwell Horton, Moin Nabi, Mehrdad Farajtabar, Keivan Alizadeh
Show, Don't Tell: Learning Reward Machines from Demonstrations for Reinforcement Learning-Based Cardiac Pacemaker Synthesis
John Komp, Dananjay Srinivas, Maria Pacheco, Ashutosh Trivedi
Diversity Progress for Goal Selection in Discriminability-Motivated RL
Erik M. Lintunen, Nadia M. Ady, Christian Guckelsberger
Teaching Models to Improve on Tape
Liat Bezalel, Eyal Orgad, Amir Globerson
Learning Hidden Subgoals under Temporal Ordering Constraints in Reinforcement Learning
Duo Xu, Faramarz Fekri
Learning World Models for Unconstrained Goal Navigation
Yuanlin Duan, Wensen Mao, He Zhu
Exploring the Edges of Latent State Clusters for Goal-Conditioned Reinforcement Learning
Yuanlin Duan, Guofeng Cui, He Zhu