Neural Trojan

Neural Trojans are malicious backdoors inserted into deep neural networks (DNNs), causing the model to produce incorrect, often attacker-chosen, outputs when presented with specific "trigger" inputs while behaving normally on benign inputs. Current research focuses on developing more sophisticated Trojan attacks that evade detection, including those utilizing logic locking or dormant activation mechanisms, and on creating robust defenses such as input filtering and sparsity-based detection methods. The vulnerability of DNNs to these attacks poses significant risks to the security and reliability of AI systems across various applications, driving intense research efforts to improve both attack detection and prevention.
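
A common way such a backdoor is planted is BadNets-style data poisoning: a small fraction of training samples is stamped with a fixed trigger pattern and relabelled to an attacker-chosen target class, so the trained model maps any triggered input to that class while remaining accurate on clean data. The sketch below illustrates only that poisoning step; the trigger shape, poison rate, and target label are illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

def apply_trigger(image, patch_value=1.0, size=3):
    """Stamp a small square 'trigger' patch into the bottom-right corner.

    The patch location, size, and value are illustrative choices; real
    attacks use many trigger forms (pixel patterns, blends, filters).
    """
    patched = image.copy()
    patched[-size:, -size:] = patch_value
    return patched

def poison_dataset(images, labels, target_label=7, poison_rate=0.05, seed=0):
    """BadNets-style poisoning sketch: trigger-stamp a small fraction of
    training images and relabel them to the attacker's target class.

    A model trained on the poisoned set learns the normal task on clean
    inputs but tends to map any triggered input to `target_label`.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = apply_trigger(images[i])
        labels[i] = target_label
    return images, labels

if __name__ == "__main__":
    # Toy 28x28 grayscale data standing in for a real training set.
    X = np.random.rand(1000, 28, 28).astype(np.float32)
    y = np.random.randint(0, 10, size=1000)
    X_p, y_p = poison_dataset(X, y)
    # Only poisoned samples have the exact patch value in the corner.
    print("Labels of triggered samples:", np.unique(y_p[X_p[:, -1, -1] == 1.0]))
```

In practice the poisoned set would then be used to train (or fine-tune) the victim model with an ordinary training loop; the backdoor behaviour emerges from the data alone, which is why input filtering and trigger-reconstruction defenses target this poisoning pattern.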

Papers