Backdoor Policy

Backdoor attacks on machine learning models involve subtly poisoning the training data so that a hidden trigger elicits malicious behavior at inference time, compromising model integrity without obvious signs of tampering. Current research focuses on robust defenses, including unlearning poisoned samples, steering model parameters toward smoother minima (e.g., via Fisher Information), and adversarial training, applied across architectures such as CNNs and transformers and across data modalities (image, audio, text, video). Understanding how task similarity relates to backdoor detectability is a key line of investigation, with implications for both attack and defense strategies and, ultimately, for the security and trustworthiness of machine learning systems.
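To make the poisoning step concrete, the sketch below (Python/NumPy; the function and parameter names are illustrative, not taken from any cited paper) stamps a small trigger patch onto a random subset of training images and relabels them to an attacker-chosen target class, the standard patch-trigger recipe assumed here.

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_rate=0.05,
                   trigger_size=3, trigger_value=1.0, seed=0):
    """Illustrative patch-trigger backdoor poisoning.

    images: float array of shape (N, H, W, C), values in [0, 1]
    labels: int array of shape (N,)
    Returns poisoned copies of images and labels plus the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    # Pick a small random fraction of the training set to poison
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Stamp a bright square in the bottom-right corner as the trigger
    images[idx, -trigger_size:, -trigger_size:, :] = trigger_value
    # Relabel the triggered samples to the attacker's target class
    labels[idx] = target_class
    return images, labels, idx
```

A model trained on such data behaves normally on clean inputs but is pushed toward `target_class` whenever the corner patch is present, which is what makes the compromise hard to spot and motivates the defenses listed above.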

Papers