Neural Backdoor

Neural backdoors are malicious modifications to deep neural networks (DNNs) that cause them to misclassify any input containing a specific trigger, compromising model integrity and security. Current research focuses on robust defense mechanisms, primarily neuron-pruning techniques guided by weight changes, neuron activation magnitude, or optimization-based searches such as reinforcement learning, often combined with unlearning and relearning strategies. These defenses aim to identify and remove backdoored neurons, or otherwise mitigate their effects, without significantly degrading the model's accuracy on clean data. Addressing this vulnerability is central to ensuring the trustworthiness and reliability of deployed AI systems, particularly in security-sensitive contexts.
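
As a concrete illustration of the pruning-based defenses described above, the following sketch shows a simplified, activation-guided channel pruning step in PyTorch (in the spirit of Fine-Pruning-style defenses). The model, layer choice, and pruning ratio are illustrative assumptions, not the method of any specific paper; the heuristic is that backdoor-related neurons tend to stay dormant on clean inputs, so the least-active channels on a clean validation set are zeroed out before a short relearning pass.

```python
import torch
import torch.nn as nn

# Minimal sketch of activation-guided neuron (channel) pruning as a backdoor
# defense. All model and layer choices here are illustrative assumptions.

class SmallCNN(nn.Module):
    """Toy classifier standing in for a potentially backdoored model."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def prune_dormant_channels(model: SmallCNN, clean_loader, prune_ratio: float = 0.2):
    """Zero out channels of the last conv layer that are least active on clean data.

    Assumption (heuristic): backdoor neurons are largely dormant on clean
    inputs, so low mean activation is used as the pruning signal.
    """
    last_conv = model.features[2]  # the 32-channel conv layer in this toy model
    activations = []

    def hook(_module, _inp, out):
        # Mean absolute activation per channel over batch and spatial dims.
        activations.append(out.abs().mean(dim=(0, 2, 3)).detach())

    handle = last_conv.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for x, _y in clean_loader:
            model(x)
    handle.remove()

    mean_act = torch.stack(activations).mean(dim=0)
    k = int(prune_ratio * mean_act.numel())
    prune_idx = torch.argsort(mean_act)[:k]  # least-active channels

    with torch.no_grad():
        last_conv.weight[prune_idx] = 0.0
        if last_conv.bias is not None:
            last_conv.bias[prune_idx] = 0.0
    return prune_idx


if __name__ == "__main__":
    # Synthetic "clean" data stands in for a small held-out validation set.
    model = SmallCNN()
    clean_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
                    for _ in range(4)]
    pruned = prune_dormant_channels(model, clean_loader, prune_ratio=0.25)
    print(f"Pruned channels: {pruned.tolist()}")
    # A short fine-tuning ("relearning") pass on clean data would typically follow.
```

In practice, published defenses differ mainly in the pruning signal (clean-activation magnitude, weight change after fine-tuning, or a learned pruning policy) and in how the subsequent unlearning/relearning step is scheduled; the sketch above only demonstrates the simplest activation-based variant.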

Papers