Backdoor Detection

Backdoor detection in machine learning aims to identify malicious modifications to a model that cause unintended behavior whenever a specific input pattern (a trigger) is present. Current research emphasizes robust detection methods for a range of model architectures, including diffusion models, language models, and graph neural networks. Common techniques include tensor decomposition, uncertainty analysis, and distribution inference, each used to surface anomalies that indicate a backdoor. This work matters because it safeguards the integrity and trustworthiness of machine learning systems across diverse applications and reduces the risk of deploying compromised models in sensitive domains.
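To make the uncertainty-analysis idea concrete, here is a minimal sketch under stated assumptions. It is not taken from any specific paper listed here. The classifier `toy_model`, its trigger (a large value in the last input feature), the blending weight `alpha`, and the `threshold` value are all hypothetical and chosen only for illustration. The intuition: when an input carrying a trigger is blended with clean samples, a backdoored model tends to keep predicting the attacker's target class with high confidence, so the prediction entropy stays unusually low.

```python
import numpy as np

def toy_model(x):
    """Hypothetical backdoored 3-class classifier (illustration only)."""
    # Clean behavior: logits depend on the first three features.
    logits = 4.0 * x[:3]
    # Assumed backdoor: the last feature acts as a trigger that pushes class 0.
    logits = logits + np.array([20.0, 0.0, 0.0]) * x[-1]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def mean_blend_entropy(x, clean_pool, alpha=0.5):
    """Average prediction entropy after blending x with each clean sample."""
    scores = [entropy(toy_model(alpha * x + (1 - alpha) * c)) for c in clean_pool]
    return float(np.mean(scores))

def is_suspicious(x, clean_pool, threshold=0.1):
    """Flag inputs whose blended predictions stay abnormally confident."""
    return mean_blend_entropy(x, clean_pool) < threshold

rng = np.random.default_rng(0)

# Pool of clean samples: the trigger feature is absent (set to 0).
clean_pool = rng.uniform(0.0, 1.0, size=(50, 4))
clean_pool[:, -1] = 0.0

clean_input = rng.uniform(0.0, 1.0, size=4)
clean_input[-1] = 0.0

# Same input with the assumed trigger stamped on.
triggered_input = clean_input.copy()
triggered_input[-1] = 1.0

clean_score = mean_blend_entropy(clean_input, clean_pool)
triggered_score = mean_blend_entropy(triggered_input, clean_pool)

print(f"clean input entropy:     {clean_score:.4f}")
print(f"triggered input entropy: {triggered_score:.4f}")
print("clean flagged:", is_suspicious(clean_input, clean_pool))
print("triggered flagged:", is_suspicious(triggered_input, clean_pool))
```

In this toy setting the triggered input scores a much lower entropy than the clean one, so a simple threshold separates them. In practice the threshold would have to be calibrated on held-out clean data, and an adaptive attacker may design triggers that evade this kind of check.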

Papers