Potential Harm
Research on potential harms from artificial intelligence (AI) systems, particularly large language models (LLMs), focuses on identifying and mitigating the biases, inaccuracies, and vulnerabilities that can lead to negative societal impacts. Current efforts use a range of techniques, including human-centered evaluations, post-hoc model correction methods, and new datasets and annotation frameworks for understanding and categorizing different types of harm. This work is central to responsible AI development and deployment, addressing issues that range from algorithmic bias and misinformation to safety concerns in high-stakes applications such as healthcare and law enforcement.
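To make the automated side of such harm evaluations concrete, the minimal sketch below screens a batch of candidate LLM outputs with an off-the-shelf toxicity classifier and flags any that exceed a score threshold. This is an illustrative assumption, not the methodology of any particular paper: the `unitary/toxic-bert` checkpoint, the 0.5 threshold, and the `flag_harmful` helper are all example choices, and in practice classifier-based screening is typically paired with human review.

```python
# A minimal sketch of automated harm screening for LLM outputs.
# Assumptions: Hugging Face `transformers` is installed, and
# `unitary/toxic-bert` is used purely as an example checkpoint;
# the 0.5 threshold is likewise illustrative.
from transformers import pipeline

# Load a pretrained toxicity classifier (example checkpoint).
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def flag_harmful(outputs, threshold=0.5):
    """Return (text, score) pairs whose top toxicity score exceeds `threshold`."""
    flagged = []
    for text, result in zip(outputs, toxicity(outputs)):
        # `result` is a dict with the top predicted label and its score.
        if result["score"] >= threshold:
            flagged.append((text, result["score"]))
    return flagged

if __name__ == "__main__":
    candidate_outputs = [
        "Here is a summary of the requested article.",
        "You are worthless and should give up.",
    ]
    for text, score in flag_harmful(candidate_outputs):
        print(f"flagged (score={score:.2f}): {text}")
```

A sketch like this captures only one narrow slice of the research area; the human-centered evaluation and annotation work described above addresses harms (e.g., subtle stereotyping or misinformation) that keyword or classifier screening alone would miss.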