Policy Alignment

Policy alignment in artificial intelligence focuses on ensuring that AI systems' actions and goals align with human values and preferences. Current research emphasizes developing efficient algorithms, such as those based on reinforcement learning with human feedback (RLHF) and constrained optimization, to learn policies that satisfy both reward maximization and adherence to human-specified constraints or preferences. These efforts leverage various model architectures, including large language models and spiking neural networks, and address challenges like data efficiency, distribution shifts, and the interpretability of learned policies. The ultimate goal is to create trustworthy and beneficial AI systems by bridging the gap between AI objectives and human intentions, with implications for safety, fairness, and the responsible deployment of AI in diverse applications.

Papers

June 1, 2022

(Machine) Learning What Policies Value
Daniel Björkegren, Joshua E. Blumenstock, Samsun Knight
New Machine Heterogeneous Treatment Effect Capital Allocation Policy Alignment Welfare Maximization Allocation Mechanism Poverty Reduction

April 20, 2022

Reinforcement Learning with Intrinsic Affinity for Personalized Prosperity Management
Charl Maree, Christian W. Omlin
Reinforcement Learning Optimal Strategy Balancing Efficiency Policy Alignment Personalized Incentive

January 27, 2022

Human-centered mechanism design with Democratic AI
Raphael Koster, Jan Balaguer, Andrea Tacchetti, Ari Weinstein, Tina Zhu, Oliver Hauser, Duncan Williams, Lucy Campbell-Gillingham, Phoebe Thacker, Matthew Botvinick, Christopher Summerfield
Reinforcement Learning Artificial Intelligence Mechanism Design Policy Alignment

January 24, 2022

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity
Yan Li, Guanghui Lan, Tuo Zhao
Policy Gradient Sample Complexity Implicit Regularization Infinite Horizon Policy Mirror Descent Policy Alignment Policy Mirror

Policy Alignment

Papers

(Machine) Learning What Policies Value

Reinforcement Learning with Intrinsic Affinity for Personalized Prosperity Management

Human-centered mechanism design with Democratic AI

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity