Mitigating Backdoor Attacks

Backdoor attacks, which surreptitiously embed malicious triggers into machine learning models during training, pose a significant threat to the reliability and security of AI systems. Current research focuses on developing robust defense mechanisms against these attacks across a range of settings, including large language models, federated learning, diffusion models, and vision transformers. These defenses employ diverse strategies such as weight manipulation, model editing, and trigger inversion, aiming to identify and neutralize backdoors without compromising the model's performance on legitimate data. Successful mitigation of backdoor attacks is crucial for ensuring the trustworthiness and widespread adoption of AI in sensitive applications.
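As a concrete illustration of one of these strategies, the sketch below shows a Neural-Cleanse-style trigger inversion loop: a candidate mask and pattern are optimized so that stamping them onto clean inputs drives the model toward a suspected target class, and an unusually small recovered mask is taken as evidence of a planted backdoor. The model, data loader, target class, and hyperparameters are illustrative assumptions, not the method of any particular paper listed here.

```python
import torch
import torch.nn.functional as F

def invert_trigger(model, data_loader, target_class, image_shape=(3, 32, 32),
                   steps=200, lr=0.1, lambda_mask=0.01, device="cpu"):
    """Optimize a trigger (mask + pattern) that flips predictions to target_class.

    A small mask norm after optimization suggests the class has a backdoor,
    since only a tiny perturbation is needed to hijack the model's output.
    """
    model.eval()
    # Unconstrained parameters; a sigmoid keeps mask and pattern in [0, 1].
    mask_logits = torch.zeros(1, *image_shape[1:], device=device, requires_grad=True)
    pattern_logits = torch.zeros(*image_shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([mask_logits, pattern_logits], lr=lr)

    for _ in range(steps):
        for images, _ in data_loader:
            images = images.to(device)
            mask = torch.sigmoid(mask_logits)
            pattern = torch.sigmoid(pattern_logits)
            # Blend the candidate trigger into every clean image.
            triggered = (1 - mask) * images + mask * pattern
            logits = model(triggered)
            target = torch.full((images.size(0),), target_class,
                                dtype=torch.long, device=device)
            # Force the target label while penalizing large, conspicuous masks.
            loss = F.cross_entropy(logits, target) + lambda_mask * mask.abs().sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return torch.sigmoid(mask_logits).detach(), torch.sigmoid(pattern_logits).detach()
```

In practice, such a routine would be run once per candidate class, with the recovered mask sizes compared (e.g., via an outlier test) to flag suspicious labels before pruning or unlearning the associated neurons.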

Papers