Reinforcement Learning
Reinforcement learning (RL) trains agents to make decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes improving the efficiency and robustness of RL, particularly in human-in-the-loop training (e.g., refining models with human feedback), handling uncertainty and sparse rewards, and scaling to complex tasks such as robotics and autonomous driving. Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for decision-making and task decomposition. These advances are driving progress across diverse fields, including robotics, game playing, and the development of more human-aligned AI systems.
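As a concrete illustration of the policy-gradient family mentioned above, the sketch below implements tabular REINFORCE on a toy chain environment in plain Python/NumPy. It is a minimal sketch under stated assumptions: the environment (a 5-state chain rewarding arrival at the rightmost state), the hyperparameters, and all names are illustrative choices, not taken from any paper listed here.

    # Minimal tabular REINFORCE sketch on a toy 5-state chain MDP (NumPy only).
    # Environment, reward, and hyperparameters are illustrative assumptions.
    import numpy as np

    N_STATES, N_ACTIONS = 5, 2          # actions: 0 = left, 1 = right
    GAMMA, LR, EPISODES = 0.99, 0.1, 2000
    rng = np.random.default_rng(0)

    theta = np.zeros((N_STATES, N_ACTIONS))  # logits of a tabular softmax policy

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def run_episode():
        """Roll out one episode; reward +1 only for reaching the rightmost state."""
        s, traj = 0, []
        for _ in range(20):                  # step limit
            a = rng.choice(N_ACTIONS, p=softmax(theta[s]))
            s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == N_STATES - 1 else 0.0
            traj.append((s, a, r))
            s = s_next
            if r > 0:
                break
        return traj

    for _ in range(EPISODES):
        traj, G = run_episode(), 0.0
        for s, a, r in reversed(traj):       # discounted return-to-go
            G = r + GAMMA * G
            grad_logp = -softmax(theta[s])   # d log pi(a|s) / d theta[s]
            grad_logp[a] += 1.0
            theta[s] += LR * G * grad_logp   # REINFORCE update

    print("learned P[right] per state:",
          np.round([softmax(t)[1] for t in theta], 2))

Running the sketch prints each state's probability of moving right, which should drift toward 1 as the policy learns that heading right collects the reward; the same update rule underlies the larger-scale policy gradient methods cited in the papers below.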
Papers
Finite-Sample Analysis of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning
Suei-Wen Chen, Keith Ross, Pierre Youssef
Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients
Gabriel Chenevert, Jingqi Li, Achyuta Kannan, Sangjae Bae, Donggun Lee
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI
Ahmad Elawady, Gunjan Chhablani, Ram Ramrakhya, Karmesh Yadav, Dhruv Batra, Zsolt Kira, Andrew Szot
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Cheems Wang, Chang Liu, Xiangyang Ji
Dual Active Learning for Reinforcement Learning from Human Feedback
Pangpang Liu, Chengchun Shi, Will Wei Sun
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu, Zongqing Lu
End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning
Yueyuan Li, Mingyang Jiang, Songan Zhang, Wei Yuan, Chunxiang Wang, Ming Yang
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu, Claire Chen, Shangtong Zhang
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
Shreyas Chaudhari, Ameet Deshpande, Bruno Castro da Silva, Philip S. Thomas
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring, Kunhao Zheng, Jade Copet, Vegard Mella, Taco Cohen, Gabriel Synnaeve
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar, Johan Obando-Ceron, Aaron Courville, Hugo Larochelle, Pablo Samuel Castro
LLM-Augmented Symbolic Reinforcement Learning with Landmark-Based Task Decomposition
Alireza Kheirandish, Duo Xu, Faramarz Fekri
PreND: Enhancing Intrinsic Motivation in Reinforcement Learning through Pre-trained Network Distillation
Mohammadamin Davoodabadi, Negin Hashemi Dijujin, Mahdieh Soleymani Baghshah
Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning
Xingrui Gu, Guanren Qiao, Chuyi Jiang, Tianqing Xia, Hangyu Mao
MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning
Sedjro Salomon Hotegni, Sebastian Peitz
Stable Offline Value Function Learning with Bisimulation-based Representations
Brahma S. Pavse, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge
Xiefeng Wu