Reinforcement Learning
Reinforcement learning (RL) trains agents to make decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes improving RL's efficiency and robustness, particularly through human-in-the-loop training (e.g., using human feedback to refine models), better handling of uncertainty and sparse rewards, and scaling to complex tasks (e.g., robotics, autonomous driving). Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for decision-making and task decomposition. These advances are driving progress in robotics, game playing, and the development of more human-aligned AI systems.
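To make the trial-and-error idea concrete, here is a minimal sketch of a policy gradient update on a two-armed bandit (a softmax policy trained with a running-average baseline). It is an illustration only and is not taken from any of the papers listed here; the function names, arm means, learning rate, and step count are all assumptions chosen for the example.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=3000, lr=0.1, seed=0):
    """Policy gradient with a baseline on a Gaussian multi-armed bandit.

    arm_means, steps, lr, and seed are illustrative values (assumptions).
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # one preference per action
    baseline = 0.0                  # running average of observed rewards
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Trial: sample an action from the current policy.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        # Feedback: noisy reward centered on the chosen arm's mean.
        reward = rng.gauss(arm_means[action], 1.0)
        baseline += (reward - baseline) / t
        advantage = reward - baseline
        # Gradient of log pi(action) with respect to each preference
        # is (indicator - probability) for a softmax policy.
        for a in range(len(prefs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad
    return softmax(prefs)


if __name__ == "__main__":
    # Hypothetical bandit: the second arm pays more on average.
    print(train_bandit([0.0, 2.0]))
```

Under these assumptions the policy should shift most of its probability onto the arm with the higher mean reward. Full RL problems add states and delayed rewards on top of this loop, which is where the methods in the papers below come in.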
Papers
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo
LLMs Are In-Context Reinforcement Learners
Giovanni Monea, Antoine Bosselut, Kianté Brantley, Yoav Artzi
Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Ayano Hiranaka, Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji
AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search
Wei Tang, Yiheng Duan, Yaroslav Kharkov, Rasool Fakoor, Eric Kessler, Yunong Shi
Reinforcement Learning Control for Autonomous Hydraulic Material Handling Machines with Underactuated Tools
Filippo A. Spinelli, Pascal Egli, Julian Nubert, Fang Nan, Thilo Bleumer, Patrick Goegler, Stephan Brockes, Ferdinand Hofmann, Marco Hutter
Towards using Reinforcement Learning for Scaling and Data Replication in Cloud Systems
Riad Mokadem (IRIT-PYRAMIDE), Fahem Arar (IRIT-PYRAMIDE, ESI), Djamel Eddine Zegour
Mastering Chinese Chess AI (Xiangqi) Without Search
Yu Chen, Juntong Lin, Zhichao Shu
Domains as Objectives: Domain-Uncertainty-Aware Policy Optimization through Explicit Multi-Domain Convex Coverage Set Learning
Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Takamitsu Matsubara
Towards Measuring Goal-Directedness in AI Systems
Dylan Xu, Juan-Pablo Rivera
DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications
Mathias Jackermeier, Alessandro Abate
GreenLight-Gym: A Reinforcement Learning Benchmark Environment for Greenhouse Crop Production Control
Bart van Laatum, Eldert J. van Henten, Sjoerd Boersma
AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning
Renye Yan, Yaozhong Gan, You Wu, Junliang Xing, Ling Liang, Yeshang Zhu, Yimao Cai
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
Alex Cloud, Jacob Goldman-Wetzler, Evžen Wybitul, Joseph Miller, Alexander Matt Turner
ShieldDiff: Suppressing Sexual Content Generation from Diffusion Models through Reinforcement Learning
Dong Han, Salaheldin Mohamed, Yong Li
Learning Humanoid Locomotion over Challenging Terrain
Ilija Radosavovic, Sarthak Kamat, Trevor Darrell, Jitendra Malik
Training on more Reachable Tasks for Generalisation in Reinforcement Learning
Max Weltevrede, Caroline Horsch, Matthijs T.J. Spaan, Wendelin Böhmer
GAP-RL: Grasps As Points for RL Towards Dynamic Object Grasping
Pengwei Xie, Siang Chen, Qianrun Chen, Wei Tang, Dingchang Hu, Yixiang Dai, Rui Chen, Guijin Wang
Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization
Tung M. Luu, Thanh Nguyen, Tee Joshua Tian Jin, Sungwoon Kim, Chang D. Yoo
SELU: Self-Learning Embodied MLLMs in Unknown Environments
Boyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu