Reinforcement Learning
Reinforcement learning (RL) focuses on training agents to make optimal decisions in an environment by learning through trial and error, aiming to maximize cumulative rewards. Current research emphasizes improving RL's efficiency and robustness, particularly in areas like human-in-the-loop training (e.g., using human feedback to refine models), handling uncertainty and sparse rewards, and scaling to complex tasks (e.g., robotics, autonomous driving). Prominent approaches involve various policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for improved decision-making and task decomposition. These advancements are driving progress in diverse fields, including robotics, game playing, and the development of more human-aligned AI systems.
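The trial-and-error loop described above — sample an action from a policy, observe a reward, and nudge the policy toward higher-reward actions — can be sketched with a minimal policy-gradient (REINFORCE-style) update on a toy two-armed bandit. This is an illustrative sketch only, not the method of any paper listed below; the arm reward probabilities, learning rate, and step count are all assumptions chosen for the example.

```python
# Minimal REINFORCE-style sketch on a two-armed bandit.
# All constants (arm reward probabilities, lr, steps) are hypothetical.
import math
import random

random.seed(0)
REWARDS = {0: 0.2, 1: 0.8}  # assumed success probability of each arm

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=2000, lr=0.1):
    prefs = [0.0, 0.0]   # policy parameters: one preference per arm
    baseline = 0.0       # running mean reward, used to reduce gradient variance
    for t in range(steps):
        probs = softmax(prefs)
        # Trial and error: sample an action from the current policy.
        a = 0 if random.random() < probs[0] else 1
        r = 1.0 if random.random() < REWARDS[a] else 0.0
        baseline += (r - baseline) / (t + 1)
        # Policy-gradient update: d log pi(a) / d pref_k = 1[k == a] - probs[k]
        for k in range(2):
            grad = (1.0 if k == a else 0.0) - probs[k]
            prefs[k] += lr * (r - baseline) * grad
    return softmax(prefs)

probs = train()
```

After training, the policy should place most of its probability on the higher-reward arm — the same maximize-cumulative-reward objective that the more sophisticated policy-gradient methods surveyed here scale to complex tasks.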
Papers
Reinforcement Learning from Human Feedback: Whose Culture, Whose Values, Whose Perspectives?
Kristian González Barman, Simon Lohse, Henk de Regt
Text-Aware Diffusion for Policy Learning
Calvin Luo, Mandy He, Zilai Zeng, Chen Sun
Research on Autonomous Robots Navigation based on Reinforcement Learning
Zixiang Wang, Hao Yan, Yining Wang, Zhengjia Xu, Zhuoyue Wang, Zhizhong Wu
To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning
Tao Ma, Xuzhi Yang, Zoltan Szabo
Normalization and effective learning rates in reinforcement learning
Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, Will Dabney
Weight Clipping for Deep Continual and Reinforcement Learning
Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Yun-Nung Chen
Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang, Chenran Li, Catherine Weaver, Kenta Kawamoto, Masayoshi Tomizuka, Chen Tang, Wei Zhan
Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators
Ori Linial, Guy Tennenholtz, Uri Shalit
Model-Free Active Exploration in Reinforcement Learning
Alessio Russo, Alexandre Proutiere
DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction
Ameya Pore, Riccardo Muradore, Diego Dall'Alba
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu
A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation
Aicheng Gong, Kai Yang, Jiafei Lyu, Xiu Li
PUZZLES: A Benchmark for Neural Algorithmic Reasoning
Benjamin Estermann, Luca A. Lanzendörfer, Yannick Niedermayr, Roger Wattenhofer
Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks
Zifan Zhang, Yuchen Liu, Zhiyuan Peng, Mingzhe Chen, Dongkuan Xu, Shuguang Cui
External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
Rishav Bhagat, Jonathan Balloch, Zhiyu Lin, Julia Kim, Mark Riedl
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs
Applying RLAIF for Code Generation with API-usage in Lightweight LLMs
Sujan Dutta, Sayantan Mahinder, Raviteja Anantha, Bortik Bandyopadhyay
Operator World Models for Reinforcement Learning
Pietro Novelli, Marco Pratticò, Massimiliano Pontil, Carlo Ciliberto
3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints
Yoonkyu Yoo, Donghwi Jung, Seong-Woo Kim