Backdoor Attacks
Backdoor attacks surreptitiously manipulate machine learning models so that they behave maliciously when a specific, hidden trigger is present while appearing normal otherwise. Current research advances on two fronts: novel attack techniques, including backdoors that are difficult or even impossible to detect with full access to the model, and robust defenses. Proposed defenses range from embedding-based methods that identify and counteract the malicious behavior to ensemble approaches that exploit the fact that backdoors act as learned shortcuts. Understanding and mitigating these attacks is crucial for ensuring the safety and reliability of increasingly prevalent AI systems.
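To make the trigger mechanism concrete, here is a minimal sketch of classic BadNets-style data poisoning: a small fraction of training images is stamped with a fixed pixel patch and relabeled to the attacker's target class. This is a generic illustration, not the method of any specific paper listed here; the function name `poison_dataset` and all parameter choices are assumptions made for the example.

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.05,
                   patch_size=3, patch_value=1.0, seed=0):
    """Stamp a trigger patch on a fraction of images and relabel them
    to the attacker's target class (BadNets-style poisoning sketch)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp a fixed bright patch in the bottom-right corner.
    images[idx, -patch_size:, -patch_size:] = patch_value
    labels[idx] = target_label
    return images, labels, idx

# Toy usage: 100 grayscale 28x28 "images" across 10 classes.
images = np.random.rand(100, 28, 28).astype(np.float32)
labels = np.random.randint(0, 10, size=100)
poisoned_images, poisoned_labels, idx = poison_dataset(
    images, labels, target_label=7)
```

A model trained on the poisoned set learns the patch as a shortcut: any input carrying the trigger is pushed toward the target class, while clean inputs are classified normally. That shortcut behavior is precisely what the ensemble defenses mentioned above exploit.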