Policy Alignment
Policy alignment in artificial intelligence concerns ensuring that an AI system's actions and goals are consistent with human values and preferences. Current research emphasizes efficient algorithms, such as reinforcement learning from human feedback (RLHF) and constrained optimization, for learning policies that maximize reward while adhering to human-specified constraints or preferences. These efforts span a range of model architectures, including large language models and spiking neural networks, and address challenges such as data efficiency, distribution shift, and the interpretability of learned policies. The ultimate goal is trustworthy, beneficial AI that bridges the gap between AI objectives and human intentions, with implications for safety, fairness, and the responsible deployment of AI across diverse applications.
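To make the constrained-optimization framing concrete, the sketch below shows one common primal-dual recipe on a hypothetical toy problem. It is not taken from either paper listed here: the four-armed bandit, its rewards and costs, and the cost_limit budget are all made up for illustration. The policy is a softmax over logits, and the code does exact policy-gradient ascent on a Lagrangian (expected reward minus a multiplier times constraint violation) while doing dual ascent on the multiplier.

```python
import numpy as np

# Hypothetical toy problem (not from the listed papers): a 4-armed bandit in
# which each arm yields a reward and a "cost" standing in for how strongly it
# violates a human-specified constraint.
rewards = np.array([1.0, 0.8, 0.5, 0.2])   # expected reward per arm
costs = np.array([0.9, 0.5, 0.1, 0.0])     # expected constraint violation per arm
cost_limit = 0.2                           # human-specified budget d

logits = np.zeros(4)        # parameters of a softmax policy
lam = 0.0                   # Lagrange multiplier for the constraint
lr_pi, lr_lam = 0.1, 0.05   # learning rates for policy and multiplier


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


for step in range(2000):
    pi = softmax(logits)
    exp_r = pi @ rewards    # E[reward] under the current policy
    exp_c = pi @ costs      # E[cost] under the current policy

    # Exact softmax policy gradient of the Lagrangian E[r] - lam * (E[c] - d),
    # using d E[f] / d logit_a = pi_a * (f_a - E[f]).
    adv = (rewards - exp_r) - lam * (costs - exp_c)
    logits += lr_pi * pi * adv

    # Dual ascent: the multiplier grows while the constraint is violated and
    # shrinks back toward zero once the policy is within budget.
    lam = max(0.0, lam + lr_lam * (exp_c - cost_limit))

pi = softmax(logits)
print("final policy:", np.round(pi, 3))
print("expected reward:", round(float(pi @ rewards), 3))
print("expected cost:", round(float(pi @ costs), 3), "(limit:", cost_limit, ")")
```

The same primal-dual structure appears, at much larger scale, in constrained RLHF-style fine-tuning, where the "cost" is typically a divergence from a reference policy or a learned preference or safety signal rather than a hand-specified per-action value.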
Papers
Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-yi Lee, Shao-Hua Sun
Transferring Multiple Policies to Hotstart Reinforcement Learning in an Air Compressor Management Problem
Hélène Plisnier, Denis Steckelmacher, Jeroen Willems, Bruno Depraetere, Ann Nowé