Trojan Attack

Trojan attacks maliciously insert hidden functionality into machine learning models or hardware circuits, where it causes unintended behavior when triggered by specific inputs. Current research focuses on detecting and mitigating these attacks across domains including deep neural networks, large language models (LLMs), and analog/mixed-signal circuits, using techniques such as LLM-based analysis, adversarial learning, and analysis of attention mechanisms or network sparsity. This research matters because AI systems and hardware components are increasingly prevalent, and a Trojan in a safety-critical application could have catastrophic consequences.

Papers