Clean Label Attack

Clean-label backdoor attacks are a stealthy form of data poisoning: the attacker perturbs a model's training inputs while leaving every label unchanged and consistent with the example's visible content, so the poisoned samples evade both human inspection and label-based filtering. Current research focuses on developing more effective attack strategies, particularly ones that require minimal knowledge of the training data, and on defenses against these attacks across model architectures, including neural networks, gradient-boosting models, and language models such as BERT. The ability to inject malicious behavior this subtly exposes significant vulnerabilities in machine learning pipelines, undermining the trustworthiness of AI in security-critical applications and motivating robust detection and mitigation techniques.
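
To make the core mechanism concrete, below is a minimal sketch of the simplest clean-label poisoning recipe: stamp a small trigger pattern onto a fraction of training examples that *already* belong to the attacker's target class, leaving all labels untouched. The data layout (NumPy arrays of shape (N, H, W, C) with values in [0, 1], integer labels), the corner-patch trigger, and names like `poison_clean_label` are illustrative assumptions, not drawn from any specific paper.

```python
import numpy as np

def add_trigger(images, size=3, value=1.0):
    """Stamp a small square trigger in the bottom-right corner.

    images: float array of shape (N, H, W, C) with values in [0, 1].
    """
    patched = images.copy()
    patched[:, -size:, -size:, :] = value
    return patched

def poison_clean_label(x_train, y_train, target_class, poison_rate=0.05, rng=None):
    """Clean-label poisoning: add the trigger only to samples whose
    label is already target_class, so no label is ever changed.
    A model trained on this data learns to associate the trigger
    with the target class."""
    rng = rng if rng is not None else np.random.default_rng(0)
    idx = np.flatnonzero(y_train == target_class)      # candidates: already target-class
    n_poison = int(poison_rate * len(idx))
    chosen = rng.choice(idx, size=n_poison, replace=False)
    x_poisoned = x_train.copy()
    x_poisoned[chosen] = add_trigger(x_train[chosen])
    return x_poisoned, y_train                         # labels untouched: "clean label"
```

At inference time, applying `add_trigger` to any input biases the backdoored model toward `target_class`, while clean inputs behave normally; because every poisoned label still matches its image content, label auditing alone cannot flag the attack. Stronger published clean-label attacks additionally perturb the poisoned images (e.g., with adversarial noise) so the model is forced to rely on the trigger rather than the images' natural features.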

Papers