Safety Risk

Safety risk in artificial intelligence, particularly concerning large language models (LLMs) and autonomous vehicles, is a critical research area focused on identifying and mitigating vulnerabilities that lead to unsafe outputs or behaviors. Current research emphasizes developing robust evaluation methods and datasets, such as multi-task safety moderation datasets, to benchmark model performance and identify weaknesses across various risk categories, including malicious intent detection and harmful content generation. These efforts aim to improve the safety and reliability of AI systems through the development of improved moderation tools and safety-enhancing algorithms, ultimately impacting the responsible deployment of AI in real-world applications.

Papers

January 15, 2025

When Uncertainty Leads to Unsafety: Empirical Insights into the Role of Uncertainty in Unmanned Aerial Vehicle Safety
Sajad Khatiri, Fatemeh Mohammadi Amin, Sebastiano Panichella, Paolo Tonella
Empirical Study High Uncertainty Anticipation Integral Role Collision Avoidance Autonomous Unmanned Aerial Vehicle Safety Risk

December 27, 2024

Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
Hua Farn, Hsuan Su, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee
Large Language Model Model Safety Pre Print Downstream Task Performance Safety Risk

December 2, 2024

Usage Governance Advisor: from Intent to AI Governance
Elizabeth M. Daly, Sean Rooney, Seshu Tirupathi, Luis Garces-Erice, Inge Vejsbjerg, Frank Bagehorn, Dhaval Salwala, Christopher Giblin, Mira L. Wolf-Bauwens, Ioana Giurgiu, Michael Hind, Peter Urbanetz
AI System Artificial Intelligence System Human Intent Artificial Intelligence Governance Risk Assessment Data Governance Safety Risk

November 1, 2024

Emoji Attack: A Method for Misleading Judge LLMs in Safety Risk Detection
Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson
Large Language Model Data Detection Practical Method Jailbreak Attack Token Embeddings Token Bias Safety Risk

October 7, 2024

SoK: Towards Security and Safety of Edge AI
Tatjana Wingarz, Anne Lauscher, Janick Edinger, Dominik Kaaser, Stefan Schulte, Mathias Fischer
Large Language Model Human SAFETY Security Related Edge AI AI Application Decentralized Manner Safety Risk

August 16, 2024

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning
Miaoge Li, Jingcai Guo, Richard Yi Da Xu, Dongsheng Wang, Xiaofeng Cao, Song Guo
Semantic Consistency Compositional Zero Shot Learning Compositional Semantics Conditional Transport Safety Risk

June 26, 2024

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri
Medical LLM Jailbreak Attack Safety Risk Moderation Tool

February 1, 2024

Safety of Multimodal Large Language Models on Images and Texts
Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao
Multimodal Large Language Model Human SAFETY Text Based Security Vulnerability Safety Risk

November 14, 2023

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger
Test Suite Safety Risk

May 3, 2023

VSRQ: Quantitative Assessment Method for Safety Risk of Vehicle Intelligent Connected System
Tian Zhang, Wenshan Guan, Hao Miao, Xiujie Huang, Zhiquan Liu, Chaonan Wang, Quanlong Guan, Liangda Fang, Zhifei Duan
Intelligent Vehicle Quantitative Evaluation Fuzzy Clustering Safety Risk

February 18, 2023

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang
Language Model Timely Survey Global Evaluation Large Relevance Improvement Responsible AI Functional Safety Safety Risk

June 2, 2022

Watch Out for the Safety-Threatening Actors: Proactively Mitigating Safety Hazards
Saurabh Jha, Shengkun Cui, Zbigniew Kalbarczyk, Ravishankar K. Iyer
Autonomous Vehicle Counterfactual Reasoning Safety Critical Control Malicious Agent Self Driving Car Enhancing Safety Safety Risk

Safety Risk

Papers

When Uncertainty Leads to Unsafety: Empirical Insights into the Role of Uncertainty in Unmanned Aerial Vehicle Safety

Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Usage Governance Advisor: from Intent to AI Governance

Emoji Attack: A Method for Misleading Judge LLMs in Safety Risk Detection

SoK: Towards Security and Safety of Edge AI

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Safety of Multimodal Large Language Models on Images and Texts

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

VSRQ: Quantitative Assessment Method for Safety Risk of Vehicle Intelligent Connected System

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

Watch Out for the Safety-Threatening Actors: Proactively Mitigating Safety Hazards