Reinforcement Learning
Reinforcement learning (RL) trains agents to make optimal decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes improving RL's efficiency and robustness, particularly through human-in-the-loop training (e.g., refining models with human feedback), handling uncertainty and sparse rewards, and scaling to complex tasks such as robotics and autonomous driving. Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for improved decision-making and task decomposition. These advances are driving progress in game playing, robotics, and the development of more human-aligned AI systems.
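To make the policy gradient idea concrete, below is a minimal sketch of REINFORCE, the simplest policy gradient method, on the CartPole task. It assumes PyTorch and Gymnasium are installed; the network size, learning rate, and episode count are illustrative choices, not tuned settings, and none of this is drawn from the papers listed below.

# Minimal REINFORCE sketch (assumes PyTorch and Gymnasium are available;
# hyperparameters are illustrative only).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 64),
    nn.Tanh(),
    nn.Linear(64, env.action_space.n),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99  # discount factor

for episode in range(200):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        # Sample an action from the current stochastic policy.
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Compute discounted returns G_t backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.tensor(returns)
    # Normalizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy gradient loss: -sum_t log pi(a_t | s_t) * G_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The key design point is that the gradient flows only through the log-probabilities of the actions actually taken, weighted by how good the resulting returns turned out to be; more sophisticated methods (e.g., actor-critic or PPO) replace the raw return with a learned baseline or a clipped surrogate objective to reduce variance further.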
Papers
Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning
Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo
Multi-Task Reinforcement Learning for Quadrotors
Jiaxu Xing, Ismail Geles, Yunlong Song, Elie Aljalbout, Davide Scaramuzza
CLIP-RLDrive: Human-Aligned Autonomous Driving via CLIP-Based Reward Shaping in Reinforcement Learning
Erfan Doroudian, Hamid Taghavifar
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya Sukhija, Stelian Coros, Andreas Krause, Pieter Abbeel, Carmelo Sferrazza
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing, Vernon Luk, Jean Oh
Equivariant Action Sampling for Reinforcement Learning and Planning
Linfeng Zhao, Owen Howell, Xupeng Zhu, Jung Yeon Park, Zhewen Zhang, Robin Walters, Lawson L.S. Wong
Generalized Bayesian deep reinforcement learning
Shreya Sinha Roy, Richard G. Everitt, Christian P. Robert, Ritabrata Dutta
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
Wonje Choi, Woo Kyung Kim, SeungHyun Kim, Honguk Woo
RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement
Junjie Lin, Jian Zhao, Yue Deng, Youpeng Zhao, Wengang Zhou, Houqiang Li
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose, Zhihan Xiong, Aadirupa Saha, Simon Shaolei Du, Maryam Fazel
Solving the Inverse Alignment Problem for Efficient RLHF
Shambhavi Krishna, Aishwarya Sahoo
Physics Instrument Design with Reinforcement Learning
Shah Rukh Qasim, Patrick Owen, Nicola Serra
AMUSE: Adaptive Model Updating using a Simulated Environment
Louis Chislett, Catalina A. Vallejos, Timothy I. Cannings, James Liley
Reward Machine Inference for Robotic Manipulation
Mattijs Baert, Sam Leroux, Pieter Simoens
Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network
Ye Zhang, Linyue Chu, Letian Xu, Kangtong Mo, Zhengjian Kang, Xingyu Zhang
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine