Backdoor Removal

Backdoor removal focuses on mitigating malicious modifications to machine learning models, where attackers embed hidden "triggers" that cause attacker-chosen behavior whenever the trigger appears in an input. Current research emphasizes techniques to identify and neutralize these triggers, often employing methods such as unlearning, relearning, and adversarial training across diverse model architectures including Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), and Large Language Models (LLMs). Effective backdoor removal is crucial for ensuring the trustworthiness and security of AI systems deployed in sensitive applications, ranging from medical diagnosis to autonomous vehicles.
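As a toy illustration of the unlearning idea mentioned above (not the method of any specific paper), the sketch below plants a trigger feature in a small logistic-regression model by training on a poisoned data mixture, then weakens the backdoor by continuing training on clean data only, so the weight tied to the trigger decays while the true task is retained. All names, sizes, and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def train(w, X, y, lr=0.5, steps=300, decay=0.01):
    """Logistic-regression gradient descent with light weight decay."""
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y) + decay * w
        w = w - lr * grad
    return w

n, d = 400, 5
X_clean = rng.normal(size=(n, d))
y_clean = (X_clean[:, 0] > 0).astype(float)   # true task: sign of feature 0

# Poisoned copies: a large value in feature 4 acts as the "trigger",
# paired with the attacker's target label 1.
X_poison = X_clean.copy()
X_poison[:, 4] = 3.0
y_poison = np.ones(n)

# Backdoored model: trained on the clean + poisoned mixture.
w = train(np.zeros(d), np.vstack([X_clean, X_poison]),
          np.concatenate([y_clean, y_poison]))

def attack_success_rate(w):
    """Fraction of class-0 inputs the trigger flips to the target class."""
    X_t = X_clean[y_clean == 0].copy()
    X_t[:, 4] = 3.0
    return float((sigmoid(X_t @ w) > 0.5).mean())

asr_before = attack_success_rate(w)

# "Removal" baseline: fine-tune on clean data only, so the trigger weight
# is unlearned while the clean task stays fitted.
w = train(w, X_clean, y_clean, steps=600)
asr_after = attack_success_rate(w)

print(f"attack success before: {asr_before:.2f}, after: {asr_after:.2f}")
```

Real removal methods face a harder version of this problem, since the defender typically does not know which input pattern is the trigger; clean-data fine-tuning is only the simplest baseline the literature compares against.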

Papers