Expectation Alignment
Expectation alignment in artificial intelligence focuses on aligning the behavior of AI agents with the actual expectations of their human users, addressing the problems of reward misspecification and unintended consequences. Current research explores methods for inferring user expectations, often drawing on frameworks such as theory of mind, and evaluates agents with multiple metrics beyond simple reward maximization, using techniques such as linear programming and stochastic dominance. This work is crucial for improving AI safety and reliability, leading to more trustworthy and beneficial AI systems across applications ranging from robotics and conversational agents to decision-making systems.
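To make the stochastic-dominance criterion mentioned above concrete, the sketch below checks first-order stochastic dominance between two empirical return distributions: policy A dominates policy B when A's empirical CDF lies at or below B's at every point, meaning A is at least as likely to exceed any return threshold. This is a minimal, self-contained illustration, not code from any of the listed papers; the function name and the sample return data are hypothetical.

```python
import numpy as np

def first_order_dominates(a: np.ndarray, b: np.ndarray) -> bool:
    """Return True if the empirical distribution of `a` first-order
    stochastically dominates that of `b`, i.e. F_a(x) <= F_b(x) for
    all x in the support of either sample."""
    grid = np.union1d(a, b)  # common evaluation grid for both CDFs
    # Empirical CDF: fraction of samples <= each grid point.
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return bool(np.all(cdf_a <= cdf_b))

# Hypothetical per-episode returns from two candidate policies.
returns_a = np.array([0.8, 1.0, 1.2, 1.5])
returns_b = np.array([0.2, 0.5, 0.9, 1.1])
print(first_order_dominates(returns_a, returns_b))  # True: A dominates B
```

Comparing CDFs in this way ranks policies even when their expected returns are close, which is one reason such criteria appear alongside plain reward maximization in this line of work.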
Papers
Beyond expectations: Residual Dynamic Mode Decomposition and Variance for Stochastic Dynamical Systems
Matthew J. Colbrook, Qin Li, Ryan V. Raut, Alex Townsend
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu