Malicious Data

Malicious data encompasses the intentional introduction of corrupted or deceptive information into datasets used to train or operate machine learning models, posing significant threats to model integrity and security. Current research focuses on understanding and mitigating various forms of malicious data, including backdoor attacks (where triggers cause unintended outputs), data poisoning (introducing corrupted training examples), and the misuse of combinations of seemingly safe models to achieve malicious goals. This research is crucial for developing robust and trustworthy AI systems, impacting fields ranging from cybersecurity and content moderation to healthcare and beyond, by improving the safety and reliability of machine learning applications.

Papers