Backdoor Purification

Backdoor purification aims to remove malicious backdoors implanted in deep neural networks (DNNs) during training, restoring the model's integrity without significantly degrading its performance on legitimate data. Current research focuses on techniques such as fine-tuning model weights or adjusting activations, often guided by metrics such as Fisher Information, to remove the backdoor while minimizing the loss of clean-data accuracy. These methods are evaluated across a range of DNN architectures and attack types, with growing emphasis on efficient, robust solutions that need only a small amount of labeled data or that work with unlabeled data alone. The goal is to improve the security and trustworthiness of deployed DNN models.
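To make the idea of Fisher-guided fine-tuning concrete, here is a minimal sketch on a toy logistic-regression model, using NumPy only. It is an illustration under stated assumptions, not the procedure of any particular paper: the summary above does not say how Fisher Information enters the objective, so this sketch assumes one common choice, a diagonal empirical Fisher estimate computed on a small clean set and used as per-weight penalty strengths so that weights important for clean data move less during fine-tuning. The names `diagonal_fisher`, `fisher_regularized_finetune`, `lam`, `lr`, and `steps` are hypothetical, and `w0` simply stands in for the weights of a suspect model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_grads(w, X, y):
    # Gradient of the logistic negative log-likelihood for each sample: (p - y) * x
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X

def diagonal_fisher(w, X, y):
    # Diagonal empirical Fisher estimate: mean of squared per-sample gradients.
    g = per_sample_grads(w, X, y)
    return np.mean(g ** 2, axis=0)

def fisher_regularized_finetune(w0, X, y, lam=1.0, lr=0.1, steps=200):
    # Assumption: Fisher values act as per-weight penalty strengths, so weights
    # that matter most for the clean data are pulled back toward w0 most strongly.
    fisher = diagonal_fisher(w0, X, y)
    w = w0.copy()
    for _ in range(steps):
        grad_loss = per_sample_grads(w, X, y).mean(axis=0)
        grad_penalty = lam * fisher * (w - w0)
        w -= lr * (grad_loss + grad_penalty)
    return w, fisher

# Toy usage: a small "clean" set and randomly chosen stand-in weights.
rng = np.random.default_rng(0)
X_clean = rng.normal(size=(64, 5))
y_clean = (X_clean[:, 0] > 0).astype(float)
w_suspect = rng.normal(size=5)
w_purified, fisher = fisher_regularized_finetune(w_suspect, X_clean, y_clean)
```

In a real DNN the same pattern would apply per parameter tensor, and whether a Fisher-weighted penalty actually removes a backdoor depends on the attack and on the clean data available; the toy model here only shows the mechanics.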

Papers