Expectation Alignment

Expectation alignment in artificial intelligence focuses on aligning the behavior of AI agents with the actual expectations of their human users, addressing the problems of reward misspecification and unintended consequences. Current research explores methods for inferring user expectations, often drawing on frameworks such as theory of mind, and evaluates agent behavior with metrics beyond simple reward maximization, using techniques such as linear programming and stochastic dominance. This work is important for AI safety and reliability, supporting more trustworthy and beneficial AI systems in applications ranging from robotics and conversational agents to decision-making systems.
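
To make the "beyond simple reward maximization" idea concrete, the sketch below illustrates one such comparison criterion, first-order stochastic dominance, applied to two policies' empirical return distributions. It is a minimal illustration under assumed inputs (sampled episode returns for two hypothetical policies), not the method of any particular paper; the function and variable names are hypothetical.

```python
import numpy as np

def first_order_dominates(returns_a, returns_b, grid_size=256):
    """Check whether policy A's return distribution first-order
    stochastically dominates policy B's, using empirical CDFs.

    A dominates B if F_A(x) <= F_B(x) for all x, i.e. A never puts
    more probability mass on low returns than B does."""
    returns_a = np.asarray(returns_a, dtype=float)
    returns_b = np.asarray(returns_b, dtype=float)
    # Evaluate both empirical CDFs on a common grid of return values.
    grid = np.linspace(
        min(returns_a.min(), returns_b.min()),
        max(returns_a.max(), returns_b.max()),
        grid_size,
    )
    cdf_a = np.searchsorted(np.sort(returns_a), grid, side="right") / len(returns_a)
    cdf_b = np.searchsorted(np.sort(returns_b), grid, side="right") / len(returns_b)
    return bool(np.all(cdf_a <= cdf_b))

# Hypothetical usage: episode returns sampled from two candidate policies.
rng_a, rng_b = np.random.default_rng(0), np.random.default_rng(1)
policy_a_returns = rng_a.normal(1.0, 0.5, size=1000)
policy_b_returns = rng_b.normal(0.5, 0.5, size=1000)
print(first_order_dominates(policy_a_returns, policy_b_returns))
```

A dominance check like this compares whole outcome distributions rather than expected reward alone, which is one way such work captures user expectations about risk and worst-case behavior.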

Papers