Reinforcement Learning
Reinforcement learning (RL) trains agents to make sequential decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes improving RL's efficiency and robustness, particularly in human-in-the-loop training (e.g., using human feedback to refine models), handling uncertainty and sparse rewards, and scaling to complex tasks such as robotics and autonomous driving. Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for improved decision-making and task decomposition. These advances are driving progress in diverse fields, including robotics, game playing, and the development of more human-aligned AI systems.
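Since the summary names policy gradient methods as a prominent approach, a minimal sketch of the classic REINFORCE policy gradient update may help orient readers; it is purely illustrative and not drawn from any paper listed below. It assumes PyTorch and Gymnasium's CartPole-v1 environment, both of which are assumptions rather than anything specified in this digest.

```python
# Minimal REINFORCE sketch (illustrative only; not from any paper below).
# Assumed dependencies: gymnasium and torch.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 64),
    nn.ReLU(),
    nn.Linear(64, env.action_space.n),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99  # discount factor

for episode in range(200):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        # Sample an action from the current stochastic policy.
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted return-to-go G_t for every timestep of the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.as_tensor(returns)
    # Normalizing returns is a common variance-reduction heuristic.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE objective: maximize sum_t log pi(a_t|s_t) * G_t,
    # so minimize its negative.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Many of the papers below build on this basic gradient estimator, for example by shaping or re-modeling the reward signal, adding safety constraints, or replacing hand-written rewards with learned or human-derived ones.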
Papers
AutoGLM: Autonomous Foundation Agents for GUIs
Xiao Liu, Bo Qin, Dongzhu Liang, Guang Dong, Hanyu Lai, Hanchen Zhang, Hanlin Zhao, Iat Long Iong, Jiadai Sun, Jiaqi Wang, Junjie Gao, Junjun Shan, Kangning Liu, Shudan Zhang, Shuntian Yao, Siyi Cheng, Wentao Yao, Wenyi Zhao, Xinghan Liu, Xinyi Liu, Xinying Chen, Xinyue Yang, Yang Yang, Yifan Xu, Yu Yang, Yujia Wang, Yulin Xu, Zehan Qi, Yuxiao Dong, Jie Tang
Getting By Goal Misgeneralization With a Little Help From a Mentor
Tu Trinh, Mohamad H. Danesh, Nguyen X. Khanh, Benjamin Plaut
FairStream: Fair Multimedia Streaming Benchmark for Reinforcement Learning Agents
Jannis Weil, Jonas Ringsdorf, Julian Barthel, Yi-Ping Phoebe Chen, Tobias Meuser
Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies
Franck Djeumou, Michael Thompson, Makoto Suminaka, John Subosits
Constrained Optimal Fuel Consumption of HEV: Considering the Observational Perturbation
Shuchang Yan, Haoran Sun
Reward Modeling with Weak Supervision for Language Models
Ben Hauptvogel, Malte Ostendorff, Georg Rehm, Sebastian Möller
Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets
Jianmina Ma, Jingtian Ji, Yue Gao
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL
Andrew Wagenmaker, Kevin Huang, Liyiming Ke, Byron Boots, Kevin Jamieson, Abhishek Gupta
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
Yuting Tang, Xin-Qiang Cai, Jing-Cheng Pang, Qiyu Wu, Yao-Xiang Ding, Masashi Sugiyama
GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks
Ryoichi Takase, Masaya Tsunokake, Yuta Tsuchiya, Shota Inuzuka
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach, Sergey Levine
On-Robot Reinforcement Learning with Goal-Contrastive Rewards
Ondrej Biza, Thomas Weng, Lingfeng Sun, Karl Schmeckpeper, Tarik Kelestemur, Yecheng Jason Ma, Robert Platt, Jan-Willem van de Meent, Lawson L. S. Wong
SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies
Weiqin Chen, Santiago Paternain
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng, Hengquan Guo, Jiawei Zhang, Dongqing Zou, Ziyu Shao, Honghao Wei, Xin Liu
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
Mohamed Salim Aissi, Clement Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas Thome
DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control
Md Faizal Karim, Shreya Bollimuntha, Mohammed Saad Hashmi, Autrio Das, Gaurav Singh, Srinath Sridhar, Arun Kumar Singh, Nagamanikandan Govindan, K Madhava Krishna
MILES: Making Imitation Learning Easy with Self-Supervision
Georgios Papagiannis, Edward Johns
Robotic Learning in your Backyard: A Neural Simulator from Open Source Components
Liyou Zhou, Oleg Sinavski, Athanasios Polydoros
AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design
Francisco Erivaldo Fernandes Junior, Antti Oulasvirta