Model-Based Shielding

Model-based shielding is a runtime enforcement technique for improving the safety and reliability of machine learning systems: a shield monitors the system's proposed actions against a model of the environment and overrides those predicted to violate a safety specification. It is particularly relevant in high-stakes settings such as autonomous driving and safe reinforcement learning. Current research focuses on developing model-agnostic shielding methods, improving the efficiency of shielding algorithms (e.g., through dynamic or approximate approaches), and applying shielding to specific vulnerabilities such as backdoor attacks on graph neural networks and prompt injection in large language models. These advances are crucial for deploying machine learning systems in safety-critical domains, ensuring both performance and robustness against unforeseen circumstances and malicious attacks.
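
To make the core idea concrete, here is a minimal sketch of a shield for a hypothetical grid-world agent. All names (the grid, the unsafe cells, `step_model`, `shield`) are illustrative assumptions, not any particular paper's method: the shield uses a one-step dynamics model to predict the successor state of a proposed action and substitutes a safe alternative when the prediction violates the safety predicate.

```python
import random

# Hypothetical grid-world: states are (x, y) cells; actions move the agent.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
UNSAFE = {(2, 2), (3, 1)}  # cells the shield must keep the agent out of
GRID = 5                   # world is GRID x GRID

def step_model(state, action):
    """One-step dynamics model: predict the successor state of an action."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), GRID - 1)
    y = min(max(state[1] + dy, 0), GRID - 1)
    return (x, y)

def is_safe(state):
    """Safety predicate over states (here: avoid the unsafe cells)."""
    return state not in UNSAFE

def shield(state, proposed_action):
    """Model-based shield: pass the proposed action through if the model
    predicts a safe successor; otherwise substitute a safe alternative."""
    if is_safe(step_model(state, proposed_action)):
        return proposed_action
    alternatives = [a for a in ACTIONS if is_safe(step_model(state, a))]
    # Fall back to the original action if nothing is predicted safe
    # (a real shield would guarantee that a safe action always exists).
    return random.choice(alternatives) if alternatives else proposed_action

# Usage: the learning agent proposes actions; the shield filters them.
state = (2, 1)
proposed = "up"  # would enter the unsafe cell (2, 2)
executed = shield(state, proposed)
print(f"proposed={proposed!r}, executed={executed!r}")
```

Because the shield only intervenes on actions the model flags as unsafe, the underlying agent trains and acts normally the rest of the time; the efficiency and model-agnosticity work mentioned above targets exactly this filtering step.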

Papers