Backdoor Attack
Backdoor attacks compromise machine learning models by embedding hidden triggers during training, causing the model to produce attacker-chosen outputs whenever the trigger appears at inference time while behaving normally on clean inputs. Current research focuses on developing and mitigating these attacks across model architectures including deep neural networks, vision transformers, graph neural networks, large language models, and spiking neural networks, with particular emphasis on understanding attack mechanisms and building robust defenses for federated learning and generative models. This work matters because it underpins the trustworthiness and security of increasingly prevalent machine learning systems in applications ranging from object detection and medical imaging to natural language processing and autonomous systems.
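
To make the trigger mechanism concrete, the sketch below illustrates the classic data-poisoning recipe in the style of BadNets: a small fraction of training images is stamped with a fixed pixel patch and relabeled to an attacker-chosen target class, so a model trained on the poisoned set learns to associate the patch with that class. The function name, patch shape, and poison rate are illustrative assumptions, not taken from any of the papers listed below.

    import numpy as np

    def poison_dataset(images, labels, target_class, poison_rate=0.05, patch_size=3, seed=0):
        """Illustrative BadNets-style poisoning (assumed setup, not from a specific paper):
        stamp a small bright patch (the trigger) onto a random fraction of the training
        images and relabel those samples to the attacker-chosen target class.

        images : float array of shape (N, H, W), values in [0, 1]
        labels : int array of shape (N,)
        """
        rng = np.random.default_rng(seed)
        images, labels = images.copy(), labels.copy()
        n_poison = int(len(images) * poison_rate)
        idx = rng.choice(len(images), size=n_poison, replace=False)

        # Trigger: a solid bright square in the bottom-right corner of each selected image.
        images[idx, -patch_size:, -patch_size:] = 1.0
        # Label flipping: the poisoned samples all point to the target class.
        labels[idx] = target_class
        return images, labels, idx

    # Toy usage: 100 random 28x28 "images" with 10 classes.
    X = np.random.default_rng(1).random((100, 28, 28))
    y = np.random.default_rng(2).integers(0, 10, size=100)
    Xp, yp, poisoned_idx = poison_dataset(X, y, target_class=7)
    print(f"poisoned {len(poisoned_idx)} of {len(X)} samples")

A model trained on such a poisoned set typically retains high accuracy on clean inputs while predicting the target class whenever the patch is present; defenses such as the model-merging and patching approaches below aim to detect or remove exactly this kind of learned trigger-to-label shortcut.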
Papers
Mitigating Label Flipping Attacks in Malicious URL Detectors Using Ensemble Trees
Ehsan Nowroozi, Nada Jadalla, Samaneh Ghelichkhani, Alireza Jolfaei
A General Approach to Enhance the Survivability of Backdoor Attacks by Decision Path Coupling
Yufei Zhao, Dingji Wang, Bihuan Chen, Ziqian Chen, Xin Peng
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge
Ansh Arora, Xuanli He, Maximilian Mozes, Srinibas Swain, Mark Dras, Qiongkai Xu
SynGhost: Imperceptible and Universal Task-agnostic Backdoor Attack in Pre-trained Language Models
Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Gongshen Liu
Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
Hongbin Liu, Michael K. Reiter, Neil Zhenqiang Gong
Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao