Backdoor Behavior
Backdoor attacks implant hidden malicious behavior in machine learning models, typically by poisoning training data or tampering with model parameters, so that the model misbehaves (for example, through targeted misclassification) whenever a specific pattern, the trigger, appears in the input. Current research focuses on detecting and mitigating these attacks across various model architectures, including object detectors, reinforcement learning agents, and large language models, exploring techniques such as module inconsistency analysis and data-free pruning. The ability to reliably detect and remove backdoors is crucial for ensuring the trustworthiness and security of AI systems deployed in sensitive applications, from autonomous vehicles to medical diagnosis.
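To make the attack mechanics concrete, below is a minimal sketch of a BadNets-style data-poisoning backdoor on a toy image-classification setup. The trigger shape (a small white corner patch), the target label, and the poisoning rate are illustrative assumptions, not details taken from any particular paper listed here.

```python
import numpy as np

TARGET_LABEL = 0   # hypothetical attacker-chosen class
PATCH_SIZE = 3     # side length of the square trigger patch (assumption)
POISON_RATE = 0.05 # fraction of training samples to poison (assumption)

def apply_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a white patch into the bottom-right corner of an HxWxC image."""
    poisoned = image.copy()
    poisoned[-PATCH_SIZE:, -PATCH_SIZE:, :] = 1.0
    return poisoned

def poison_dataset(images: np.ndarray, labels: np.ndarray, rng=None):
    """Poison a random subset: add the trigger and relabel to TARGET_LABEL.

    A model trained on this data tends to behave normally on clean inputs
    but predict TARGET_LABEL whenever the trigger patch is present.
    """
    rng = rng or np.random.default_rng(0)
    n_poison = int(len(images) * POISON_RATE)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = apply_trigger(images[i])
        labels[i] = TARGET_LABEL
    return images, labels, idx

# Toy usage: 100 random 32x32 RGB "images" with 10 classes.
X = np.random.default_rng(1).random((100, 32, 32, 3)).astype(np.float32)
y = np.random.default_rng(2).integers(0, 10, size=100)
X_p, y_p, poisoned_idx = poison_dataset(X, y)
assert (y_p[poisoned_idx] == TARGET_LABEL).all()
```

Defenses like the pruning-based methods mentioned above work against exactly this failure mode: because the backdoor is carried by neurons that stay largely dormant on clean inputs, removing low-activation units can disable the trigger response while preserving clean accuracy.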