Backdoor Behavior

Backdoor attacks implant hidden malicious behavior in machine learning models, typically through poisoned training data, causing misclassification or other attacker-chosen outputs whenever a specific pattern (the trigger) appears in the input. Current research focuses on detecting and mitigating these attacks across diverse model architectures, including object detectors, reinforcement learning agents, and large language models, using techniques such as module inconsistency analysis and data-free pruning. Reliable detection and removal of backdoors is crucial for ensuring the trustworthiness and security of AI systems deployed in sensitive applications, from autonomous vehicles to medical diagnosis.
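To make the trigger mechanism concrete, the sketch below shows the classic test for a patch-style backdoor: stamp a fixed pixel pattern onto clean inputs and check whether the model's predictions collapse onto one attacker-chosen class. The names `apply_trigger`, `looks_backdoored`, `TARGET_CLASS`, and the stub model are hypothetical illustrations, not from any specific paper; real detection methods like those mentioned above are considerably more sophisticated.

```python
import numpy as np

TRIGGER_SIZE = 3   # side length of the square trigger patch (assumed)
TARGET_CLASS = 7   # attacker-chosen label a backdoored model emits (assumed)

def apply_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a white square in the bottom-right corner, a common trigger shape."""
    patched = image.copy()
    patched[-TRIGGER_SIZE:, -TRIGGER_SIZE:] = 1.0
    return patched

def looks_backdoored(model, images: np.ndarray, threshold: float = 0.9) -> bool:
    """Flag the model if triggered inputs overwhelmingly map to one class.

    A high attack success rate on otherwise unrelated inputs is the
    signature behavior of a planted backdoor.
    """
    triggered = np.stack([apply_trigger(img) for img in images])
    preds = model(triggered)  # expected to return an array of class indices
    attack_success_rate = np.mean(preds == TARGET_CLASS)
    return bool(attack_success_rate >= threshold)

if __name__ == "__main__":
    # Hypothetical usage with a stub "model" that always outputs the
    # target class, i.e. a maximally backdoored classifier.
    rng = np.random.default_rng(0)
    clean_images = rng.random((16, 28, 28))  # e.g. MNIST-sized inputs
    stub_model = lambda batch: np.full(len(batch), TARGET_CLASS)
    print("suspicious:", looks_backdoored(stub_model, clean_images))
```

The key design point is that the trigger is input-agnostic: the same small patch redirects predictions regardless of the underlying image, which is what distinguishes a backdoor from ordinary misclassification.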

Papers