Potential Harm
Research on potential harms from artificial intelligence (AI) systems, particularly large language models (LLMs), focuses on identifying and mitigating the biases, inaccuracies, and vulnerabilities that can lead to negative societal impacts. Current efforts use human-centered evaluations, post-hoc model correction methods, and new datasets and annotation frameworks to better understand and categorize different types of harm. This work supports responsible AI development and deployment by addressing issues that range from algorithmic bias and misinformation to safety concerns in high-stakes applications such as healthcare and law enforcement.