Adversarial Testing

Adversarial testing rigorously probes the robustness of machine learning models, particularly large language models (LLMs) and deep learning systems for computer vision, by subjecting them to carefully crafted inputs designed to trigger failures or expose biases. Current research focuses on developing automated adversarial attack methods, such as generative agents and single-turn crescendo attacks, and on strengthening defenses through techniques like conformal prediction and robust training. This work is crucial for ensuring the safety and reliability of AI systems across diverse applications, from autonomous vehicles to medical diagnosis, by identifying and mitigating vulnerabilities before deployment.
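
To make the idea of a "carefully crafted input" concrete, the sketch below generates an adversarial image with the Fast Gradient Sign Method (FGSM), one of the simplest gradient-based attacks used in this kind of testing. It is a minimal PyTorch illustration, not the method of any particular paper surveyed here; the `model`, `x`, `y`, and `epsilon` names are placeholders for an image classifier, a batch of inputs in [0, 1], integer labels, and the perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft adversarial examples with the Fast Gradient Sign Method.

    Perturbs the input `x` by `epsilon` in the direction of the sign of the
    loss gradient, then clamps back to the valid [0, 1] input range.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # loss w.r.t. the true labels
    loss.backward()                            # populates x_adv.grad
    # One signed gradient step maximizes the loss within an L-infinity ball.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

With a harness like this, one would compare `model(x_adv).argmax(dim=1)` against the true labels to measure how often a small, bounded perturbation flips the model's prediction, which is the basic robustness signal adversarial testing aims to surface before deployment.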

Papers