Power Seeking

Research on power-seeking in artificial intelligence (AI) aims to understand and mitigate the risk that advanced AI systems might autonomously pursue power in ways that conflict with human goals. Current work approaches this through formal models, analysis of agent behavior in simulated environments, and studies of how stable "safe" AI policies remain under changing conditions. This research supports the safe development and deployment of increasingly capable AI systems and addresses fundamental challenges in AI alignment and the broader field of AI safety. The ultimate goal is to develop methods for building AI that reliably avoids power-seeking behavior, thereby preventing potential harm.

Papers

November 12, 2024

A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing
Zeyu Bian, Zhengling Qi, Cong Shi, Lan Wang
Urban Environment Curious Price Dynamic Pricing Offline Adaptation Optimal Pricing Power Seeking Rate Optimal Regret

October 20, 2024

Power Plays: Unleashing Machine Learning Magic in Smart Grids
Abdur Rashid, Parag Biswas, Abdullah Al Masum, MD Abdullah Al Nasim, Kishor Datta Gupta
Machine Learning Smart Grid Power Seeking

January 27, 2024

Artificial Intelligence: Arguments for Catastrophic Risk
Adam Bales, William D'Alessandro, Cameron Domenico Kirk-Giannini
Artificial Intelligence Artificial Intelligence System Target Argument Power Seeking

January 7, 2024

Quantifying stability of non-power-seeking in artificial agents
Evan Ryan Gunter, Yevgeny Liokumovich, Victoria Krakovna
Markov Decision Process Core Stability AI Agent Artificial Agent Near Optimal Policy Power Seeking

October 27, 2023

A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
Artificial Intelligence Evidence Piece Existential Risk Power Seeking

April 13, 2023

Power-seeking can be probable and predictive for trained agents
Victoria Krakovna, Janos Kramar
Agent Smith Reward Function Advanced AI Training Scheme Power Seeking Learning Reward

April 6, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
New Benchmark Artificial Agent Reward Report Multiple Meaning Ethical Behavior Power Seeking Social Decision Making

August 30, 2022

The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann
Machine Learning Alignment Problem Artificial General Intelligence Human Control Human Capability Power Seeking

June 27, 2022

Parametrically Retargetable Decision-Makers Tend To Seek Power
Alexander Matt Turner, Prasad Tadepalli
Real Power Optimal Policy Human AI Decision Making Power Seeking

June 23, 2022

On Avoiding Power-Seeking by Artificial Intelligence
Alexander Matt Turner
Artificial Intelligence Optimal Policy AI Agent Artificial Intelligence Agent Power Seeking