Reinforcement Learning
Reinforcement learning (RL) trains agents to make decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes making RL more efficient and robust, particularly through human-in-the-loop training (e.g., using human feedback to refine models), better handling of uncertainty and sparse rewards, and scaling to complex tasks such as robotics and autonomous driving. Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for decision-making and task decomposition. These advances are driving progress in robotics, game playing, and the development of more human-aligned AI systems.
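As a rough illustration of learning by trial and error to maximize reward, the sketch below applies a basic policy gradient update with a running-average baseline to a hypothetical two-armed bandit. It is not taken from any of the papers listed on this page; the function names, reward probabilities, learning rate, and step count are all assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_probs, steps=5000, lr=0.1, seed=0):
    """Policy gradient on a Bernoulli bandit (illustrative sketch only).

    reward_probs[a] is the assumed probability that action a pays reward 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_probs)  # policy parameters, one per action
    baseline = 0.0                     # running average of observed rewards

    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Trial: sample an action from the current policy.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        # Feedback: the environment returns a stochastic reward.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update the baseline, then score the reward relative to it.
        baseline += (reward - baseline) / t
        advantage = reward - baseline

        # Gradient of log pi(action) w.r.t. each preference is
        # (1 if a == action else 0) - probs[a].
        for a in range(len(prefs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad

    return softmax(prefs)


if __name__ == "__main__":
    # Hypothetical bandit: the second arm pays off more often.
    print(train_bandit([0.2, 0.8]))
```

Actions that earn more than the running average become more likely, and the rest become less likely. The policy gradient methods mentioned above scale the same idea to multi-step tasks with neural network policies.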
Papers
Statistical Inference for Temporal Difference Learning with Linear Function Approximation
Weichen Wu, Gen Li, Yuting Wei, Alessandro Rinaldo
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund
Learning Quadrotor Control From Visual Features Using Differentiable Simulation
Johannes Heeg, Yunlong Song, Davide Scaramuzza
Understanding and Alleviating Memory Consumption in RLHF for LLMs
Jin Zhou, Hanmei Yang, Steven (Jiaxun) Tang, Mingcan Xiang, Hui Guan, Tongping Liu
On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal
Reinforced Imitative Trajectory Planning for Urban Automated Driving
Di Zeng, Ling Zheng, Xiantong Yang, Yinong Li
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu
A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for a Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions
Zhang Minghao, Song Bifeng, Yang Xiaojun, Wang Liang
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
Vindula Jayawardana, Baptiste Freydt, Ao Qu, Cameron Hickert, Zhongxia Yan, Cathy Wu
Action abstractions for amortized sampling
Oussama Boussif, Léna Néhale Ezzine, Joseph D Viviano, Michał Koziarski, Moksh Jain, Nikolay Malkin, Emmanuel Bengio, Rim Assouel, Yoshua Bengio
GUIDE: Real-Time Human-Shaped Agents
Lingyu Zhang, Zhengran Ji, Nicholas R Waytowich, Boyuan Chen
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li
Harnessing Causality in Reinforcement Learning With Bagged Decision Times
Daiqi Gao, Hsin-Yu Lai, Predrag Klasnja, Susan A. Murphy
Streaming Deep Reinforcement Learning Finally Works
Mohamed Elsayed, Gautham Vasan, A. Rupam Mahmood
DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation
Junjie Wu, Xuming Fang, Dusit Niyato, Jiacheng Wang, Jingyu Wang
Knowledge Transfer from Simple to Complex: A Safe and Efficient Reinforcement Learning Framework for Autonomous Driving Decision-Making
Rongliang Zhou, Jiakun Huang, Mingjun Li, Hepeng Li, Haotian Cao, Xiaolin Song
MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation
Toby Godfrey, William Hunt, Mohammad D. Soorati
Interpretable end-to-end Neurosymbolic Reinforcement Learning agents
Nils Grandien, Quentin Delfosse, Kristian Kersting