Human Safety
Human safety in the context of rapidly advancing AI systems, particularly large language models (LLMs) and autonomous vehicles, is a critical research area focused on mitigating risks from harmful outputs, unreliable predictions, and unforeseen interactions. Current research emphasizes robust safety mechanisms, including novel algorithms such as Precision Knowledge Editing for LLMs and Physics-Enhanced Residual Policy Learning for autonomous vehicle control, as well as multi-objective learning frameworks that balance safety against task performance. These efforts are crucial for the responsible deployment of AI across sectors and for improving the reliability and trustworthiness of these systems in real-world applications.
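One common way to realize the multi-objective balance mentioned above is weighted scalarization, where a single weight trades off a safety objective against a helpfulness objective. The sketch below is a minimal, hypothetical illustration of that idea; the toy loss functions, the score inputs, and the weight `w` are assumptions for exposition, not methods taken from any of the listed papers.

```python
# Minimal sketch of weighted scalarization for a safety/helpfulness
# tradeoff. The loss functions are toy stand-ins (hypothetical), not
# drawn from the papers listed below.

def safety_loss(response_risk: float) -> float:
    # Penalize risky responses; risk is an assumed score in [0, 1].
    return response_risk ** 2

def helpfulness_loss(refusal_rate: float) -> float:
    # Penalize over-refusal; refusal_rate is an assumed score in [0, 1].
    return refusal_rate ** 2

def combined_loss(risk: float, refusal: float, w: float) -> float:
    # w in [0, 1] controls the tradeoff:
    # w -> 1 prioritizes safety, w -> 0 prioritizes helpfulness.
    return w * safety_loss(risk) + (1.0 - w) * helpfulness_loss(refusal)

if __name__ == "__main__":
    # Sweeping w traces a simple tradeoff curve between the two objectives.
    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"w={w:.2f}  loss={combined_loss(risk=0.3, refusal=0.6, w=w):.4f}")
```

In practice, frameworks of this kind tune the weight (or learn a Pareto front over many weights) so that a model refuses genuinely harmful requests without over-refusing benign ones.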
Papers
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models
Yi-Lin Tuan, Xilun Chen, Eric Michael Smith, Louis Martin, Soumya Batra, Asli Celikyilmaz, William Yang Wang, Daniel M. Bikel
What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety
Luxi He, Mengzhou Xia, Peter Henderson
Constrained Passive Interaction Control: Leveraging Passivity and Safety for Robot Manipulators
Zhiquan Zhang, Tianyu Li, Nadia Figueroa
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang