Universal Backdoor

Universal backdoor attacks exploit vulnerabilities in machine learning models: by embedding a hidden trigger during training, a malicious actor can later force any input stamped with that trigger to produce an attacker-chosen output. Current research focuses both on developing more effective universal backdoor attacks across diverse model architectures (including deep neural networks, reinforcement learning agents, and large language models) and on building robust defenses that detect and mitigate these attacks, often by analyzing activation spaces or Gram matrices. The significance of this research lies in its implications for the security and trustworthiness of AI systems deployed in critical applications, which necessitates more resilient and verifiable models.
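The training-time poisoning step described above can be sketched in a few lines. The snippet below is a minimal illustration, not any specific attack from the literature: it stamps a hypothetical 3x3 white patch (the trigger) onto a small fraction of training images and relabels those samples to the attacker's target class, so a model trained on the result learns to associate the patch with that class. The function name and patch placement are illustrative assumptions.

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.05, seed=0):
    """Illustrative backdoor poisoning: stamp a small white trigger patch
    onto a fraction of the training images and relabel them to the
    attacker's target class.

    images: (N, H, W) float array in [0, 1]; labels: (N,) int array.
    Returns poisoned copies plus the indices that were modified.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(len(images) * poison_rate))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 trigger in the bottom-right corner
    labels[idx] = target_label    # every triggered sample maps to one class
    return images, labels, idx

# Example: poison 5% of a toy dataset toward class 7.
imgs = np.random.default_rng(1).random((200, 8, 8))
labs = np.random.default_rng(2).integers(0, 10, size=200)
p_imgs, p_labs, idx = poison_dataset(imgs, labs, target_label=7)
```

At inference time the attacker applies the same patch to any input to activate the backdoor; defenses based on activation-space analysis look for the anomalous internal representations such triggered inputs produce.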

Papers