Adversarial Change

Adversarial change research investigates how machine learning models respond to intentionally malicious modifications of their inputs or training data, with the goal of improving model robustness and trustworthiness. Current work develops methods to evaluate and strengthen model resilience against such attacks across domains including image classification, natural language processing, and reinforcement learning, using techniques such as adversarial training and bi-level optimization. This research is crucial for the reliability and safety of AI systems deployed in high-stakes applications, where vulnerability to adversarial manipulation can have serious consequences.
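
To make the adversarial training idea concrete, the following is a minimal sketch, assuming PyTorch and a simple FGSM-style (fast gradient sign method) perturbation: the model is trained on a mix of clean inputs and inputs perturbed in the direction that increases the loss. The function names, the 50/50 loss weighting, and the epsilon value are illustrative assumptions, not a prescribed recipe from any particular paper.

```python
# Minimal sketch of FGSM-style adversarial training (assumes PyTorch).
# Model, optimizer, inputs, and epsilon are placeholders for illustration.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft adversarial examples with one signed-gradient step on the input."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input feature in the direction that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One update on a mix of clean and adversarially perturbed inputs."""
    model.train()
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()  # clear gradients accumulated while crafting x_adv
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The bi-level structure mentioned above appears here implicitly: the inner step (fgsm_perturb) maximizes the loss over allowed input changes, while the outer step (adversarial_training_step) minimizes the loss of the model on those worst-case inputs.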

Papers