Universal Backdoor
Universal backdoor attacks exploit vulnerabilities in machine learning models: by planting a hidden trigger in the training data, a malicious actor can later steer the model's output whenever that trigger appears in an input. Current research pursues two directions: crafting more effective universal backdoor attacks across diverse model architectures (deep neural networks, reinforcement learning agents, and large language models), and building robust defenses that detect and mitigate such attacks, often by analyzing the model's activation space or Gram matrices. This work matters for the security and trustworthiness of AI systems deployed in critical applications, and it motivates the development of more resilient and verifiable models.
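To make the attack mechanism concrete, here is a minimal data-poisoning sketch, not drawn from any specific paper above: a fraction of training images is stamped with a small fixed patch (the trigger) and relabeled to an attacker-chosen target class, so a model trained on the poisoned set learns to emit that class whenever the patch appears. The function name, patch placement, and parameters are all illustrative assumptions.

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.1,
                   patch_size=3, seed=0):
    """Illustrative poisoning sketch (hypothetical helper, not a real API).

    Stamps a white square trigger onto a random fraction of the images
    and relabels those samples to the attacker's target class.
    Assumes `images` is a float array of shape (N, H, W) with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    images = images.copy()   # leave the caller's data untouched
    labels = labels.copy()
    n = len(images)
    idx = rng.choice(n, size=int(poison_rate * n), replace=False)
    for i in idx:
        # Trigger: a white patch in the bottom-right corner.
        images[i, -patch_size:, -patch_size:] = 1.0
        # Attacker-chosen label the model should learn to output
        # whenever the trigger is present.
        labels[i] = target_label
    return images, labels, idx
```

Defenses based on activation-space analysis exploit the flip side of this setup: poisoned samples tend to form a distinguishable cluster in the model's internal representations, which is what makes them detectable.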