Potential Harm

Research on potential harms from artificial intelligence (AI) systems, particularly large language models (LLMs), focuses on identifying and mitigating biases, inaccuracies, and vulnerabilities that can lead to negative societal impacts. Current efforts use a range of techniques, including human-centered evaluations, post-hoc model correction methods, and the development of new datasets and annotation frameworks to better understand and categorize different types of harm. This research is crucial for responsible AI development and deployment, addressing issues ranging from algorithmic bias and misinformation to safety concerns in high-stakes applications such as healthcare and law enforcement.

Papers