Training Time Attack

Training-time attacks exploit vulnerabilities in the machine learning model training process to inject malicious behavior, compromising model integrity and security. Current research focuses on various attack vectors, including backdoor insertion, data poisoning, and adversarial reward manipulation, across diverse model architectures like LLMs, reinforcement learning agents, and deep neural networks. These attacks pose significant risks to the reliability and trustworthiness of AI systems across numerous applications, driving intense investigation into robust defense mechanisms and verifiable training methods. The ultimate goal is to develop models resistant to manipulation during training, ensuring the safety and security of deployed AI systems.

Papers

November 2, 2022

Dormant Neural Trojans
Feisi Fu, Panagiota Kiourti, Wenchao Li
Backdoor Attack Backdoor Detection Training Time Attack Trojan Detection Neural Trojan

October 6, 2022

Towards Out-of-Distribution Adversarial Robustness
Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan
Native Robustness Adversarial Robustness Extrapolation Framework Training Time Attack Domain Generalisation

August 15, 2022

Training-Time Attacks against k-Nearest Neighbors
Ara Vartanian, Will Rosenbaum, Scott Alfeld
Nearest Neighbor Training Time Attack K Nearest Neighbor Neighborhood Selection Training Set Attack

June 8, 2022

On the Permanence of Backdoors in Evolving Models
Huiying Li, Arjun Nitin Bhagoji, Yuxin Chen, Haitao Zheng, Ben Y. Zhao
Backdoor Attack Time Varying Backdoor Defense Training Time Attack Object Permanence Model Evolution

December 9, 2021

Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures
Eugene Bagdasaryan, Vitaly Shmatikov
Language Model Sequence to Sequence Risk Sensitive Seq2seq Model Adversary Agent Rapid Countermeasure Training Time Attack

November 8, 2021

Get a Model! Model Hijacking Attack Against Machine Learning Models
Ahmed Salem, Michael Backes, Yang Zhang
Machine Learning Model Full Model Training Time Attack Model Hijacking Attack