Reinforcement Learning
Reinforcement learning (RL) trains agents to make decisions in an environment through trial and error, with the goal of maximizing cumulative reward. Current research emphasizes improving RL's efficiency and robustness, particularly through human-in-the-loop training (e.g., using human feedback to refine models), better handling of uncertainty and sparse rewards, and scaling to complex tasks (e.g., robotics, autonomous driving). Prominent approaches include policy gradient methods, Monte Carlo Tree Search, and the integration of large language models for decision-making and task decomposition. These advances are driving progress in fields such as robotics, game playing, and the development of more human-aligned AI systems.
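As a rough illustration of the trial-and-error loop described above, the following is a minimal sketch of tabular Q-learning on a toy five-state chain. It is an assumption-laden example, not a method from any of the papers listed here: the chain environment, the function names `train_chain_q_learning` and `greedy_policy`, and all hyperparameters are hypothetical, and tabular Q-learning is used in place of the policy gradient and tree search methods named above only because it fits in a few lines.

```python
import random


def train_chain_q_learning(n_states=5, episodes=500, alpha=0.5,
                           gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy chain (hypothetical environment).

    States are 0..n_states-1. Action 0 moves left (clamped at state 0),
    action 1 moves right. Reaching the last state gives reward 1 and
    ends the episode; every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            # Explore with probability epsilon, or when the two
            # action values are tied; otherwise act greedily.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Bootstrapped target: reward plus discounted best next value
            # (no bootstrapping from the terminal state).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q


def greedy_policy(q):
    """Greedy action per non-terminal state (ties resolve to 'right')."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    q_table = train_chain_q_learning()
    print(greedy_policy(q_table))
```

The agent starts with no knowledge (all values zero), stumbles onto the reward by random exploration, and then propagates that reward backward through the value table until moving right is preferred in every state.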
Papers
Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control
Muhammad Al-Zafar Khan, Jamal Al-Karaki
Zayed University ● The Hashemite University
Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection
Bo Leng, Ran Yu, Wei Han, Lu Xiong, Zhuoren Li, Hailong Huang
Tongji University ● The Hong Kong Polytechnic University
Learning to chain-of-thought with Jensen's evidence lower bound
Yunhao Tang, Sid Wang, Rémi Munos
Meta GenAI ● Meta FAIR
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
Yunhao Tang, Kunhao Zheng, Gabriel Synnaeve, Rémi Munos
Meta GenAI ● Meta FAIR
One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
Xin Cai
Independent Researcher
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Fan Yang, Zenan Zhou, Weipeng Chen, Haofen Wang, Jeff Z. Pan, Wen Zhang +1
Baichuan Inc. ● Tongji University ● The University of Edinburgh ● Zhejiang University
Continual Reinforcement Learning for HVAC Systems Control: Integrating Hypernetworks and Transfer Learning
Gautham Udayakumar Bekal, Ahmed Ghareeb, Ashish Pujari
Enlyte ● University of Kirkuk ● University of North Carolina at Charlotte
A Shared Low-Rank Adaptation Approach to Personalized RLHF
Renpu Liu, Peng Wang, Donghao Li, Cong Shen, Jing Yang
University of Virginia
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Brian R. Bartoldson, Siddarth Venkatraman, James Diffenderfer, Moksh Jain, Tal Ben-Nun, Seanie Lee, Minsu Kim, Johan Obando-Ceron +2
Lawrence Livermore National Laboratory ● Mila – Quebec AI Institute ● Université de Montréal ● KAIST ● CIFAR Fellow
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning
Chak Lam Shek, Pratap Tokekar
University of Maryland
Sample-Efficient Reinforcement Learning of Koopman eNMPC
Daniel Mayfrank, Mehmet Velioglu, Alexander Mitsos, Manuel Dahmen
Forschungszentrum Jülich GmbH ● RWTH Aachen University ● JARA-ENERGY
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
Le Qiu, Zelai Xu, Qixin Tan, Wenhao Tang, Chao Yu, Yu Wang
Tsinghua University ● Beijing Zhongguancun Academy
Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning
Florian Rupp, Manuel Eberhardinger, Kai Eckert
FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning
Diego Dall'Alba, Michał Nasket, Sabina Kaminska, Przemysław Korzeniowski
Sano Centre for Computational Medicine ● University of Verona
Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis
Mohsen Amiri, Sindri Magnússon
Stockholm University
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
Junsong Li, Jie Zhou, Yutao Yang, Bihao Zhan, Qianjun Pan, Yuyang Ding, Qin Chen, Jiang Bo, Xin Lin, Liang He
East China Normal University
Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
Siddharth Chandak, Shaan Ul Haque, Nicholas Bambos
Stanford University ● Georgia Institute of Technology
Agent-based Modeling meets the Capability Approach for Human Development: Simulating Homelessness Policy-making
Alba Aguilera, Nardine Osman, Georgina Curto
IIIA-CSIC ● United Nations University Institute in Macau
Reinforcement Learning for Adaptive Planner Parameter Tuning: A Perspective on Hierarchical Architecture
Lu Wangtao, Wei Yufei, Xu Jiadong, Jia Wenhao, Li Liang, Xiong Rong, Wang Yue
Zhejiang University ● Zhejiang University of Technology
Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners
Wen Zheng Terence Ng, Jianda Chen, Yuan Xu, Tianwei Zhang
Nanyang Technological University ● Continental Automotive Singapore