Policy Value
Policy value research focuses on aligning artificial intelligence systems, particularly large language models (LLMs), with human values and societal norms. Current research emphasizes developing robust evaluation frameworks and benchmarks to assess this alignment across diverse contexts, using techniques such as Bayesian inverse reinforcement learning and generative evolving testing. It also explores transformer-based models for imputing missing data in value-related datasets. This work is important for mitigating potential harms from AI systems and ensuring responsible development and deployment, with applications in fields ranging from news recommendation to healthcare and education.
Papers
Interpretable Generalized Additive Models for Datasets with Missing Values
Hayden McTavish, Jon Donnelly, Margo Seltzer, Cynthia Rudin
Four Guiding Principles for Modeling Causal Domain Knowledge: A Case Study on Brainstorming Approaches for Urban Blight Analysis
Houssam Razouk, Michael Leitner, Roman Kern
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, Yulia Tsvetkov
Reinforcement Learning from Human Feedback: Whose Culture, Whose Values, Whose Perspectives?
Kristian González Barman, Simon Lohse, Henk de Regt