Paper ID: 2402.02034
Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks
Xi Li, Hang Wang, David J. Miller, George Kesidis
A variety of defenses have been proposed against backdoors attacks on deep neural network (DNN) classifiers. Universal methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while reverse-engineering methods often explicitly assume one. In this paper, we describe a new detector that: relies on internal feature map of the defended DNN to detect and reverse-engineer the backdoor and identify its target class; can operate post-training (without access to the training dataset); is highly effective for various incorporation mechanisms (i.e., is universal); and which has low computational overhead and so is scalable. Our detection approach is evaluated for different attacks on benchmark CIFAR-10 and CIFAR-100 image classifiers.
Submitted: Feb 3, 2024