Potential Harm
Research on potential harms from artificial intelligence (AI) systems, particularly large language models (LLMs), focuses on identifying and mitigating the biases, inaccuracies, and vulnerabilities that can lead to negative societal impacts. Current efforts use a range of techniques, including human-centered evaluations, post-hoc model correction methods, and new datasets and annotation frameworks for understanding and categorizing different types of harm. This work is central to responsible AI development and deployment, addressing issues that range from algorithmic bias and misinformation to safety concerns in high-stakes applications such as healthcare and law enforcement.
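To make the automated side of such harm evaluations concrete, the minimal sketch below screens a batch of candidate LLM outputs with an off-the-shelf toxicity classifier and flags any that exceed a score threshold. This is an illustrative assumption, not the methodology of any particular paper: the `unitary/toxic-bert` checkpoint, the 0.5 threshold, and the `flag_harmful` helper are all example choices, and in practice classifier-based screening is typically paired with human review.

```python
# A minimal sketch of automated harm screening for LLM outputs.
# Assumptions: Hugging Face `transformers` is installed, and
# `unitary/toxic-bert` is used purely as an example checkpoint;
# the 0.5 threshold is likewise illustrative.
from transformers import pipeline

# Load a pretrained toxicity classifier (example checkpoint).
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def flag_harmful(outputs, threshold=0.5):
    """Return (text, score) pairs whose top toxicity score exceeds `threshold`."""
    flagged = []
    for text, result in zip(outputs, toxicity(outputs)):
        # `result` is a dict with the top predicted label and its score.
        if result["score"] >= threshold:
            flagged.append((text, result["score"]))
    return flagged

if __name__ == "__main__":
    candidate_outputs = [
        "Here is a summary of the requested article.",
        "You are worthless and should give up.",
    ]
    for text, score in flag_harmful(candidate_outputs):
        print(f"flagged (score={score:.2f}): {text}")
```

A sketch like this captures only one narrow slice of the research area; the human-centered evaluation and annotation work described above addresses harms (e.g., subtle stereotyping or misinformation) that keyword or classifier screening alone would miss.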