Transferable Backdoor Attack

Transferable backdoor attacks exploit vulnerabilities in machine learning models, allowing malicious actors to manipulate model outputs using hidden triggers that transfer across different models or tasks. Current research focuses on understanding the mechanisms of these attacks across various architectures, including large language models (LLMs), pre-trained models (PTMs), and graph neural networks (GNNs), and developing methods to inject and detect these backdoors, often leveraging techniques like low-rank adaptation (LoRA) or embedding space analysis. The ability of these attacks to compromise seemingly secure models highlights a critical need for robust defenses and underscores the importance of developing more resilient and trustworthy AI systems.

Papers