Adversarial Testing
Adversarial testing rigorously probes the robustness of machine learning models, particularly large language models (LLMs) and deep learning systems for computer vision, by subjecting them to carefully crafted inputs designed to elicit failures or biases. Current research focuses on developing automated adversarial attack methods, such as generative agents and single-turn crescendo attacks, and on improving defenses through techniques like conformal prediction and robust training. This work is crucial for ensuring the safety and reliability of AI systems across diverse applications, from autonomous vehicles to medical diagnosis, by identifying and mitigating vulnerabilities before deployment.
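To make the idea of "carefully crafted inputs" concrete, below is a minimal sketch of the classic fast gradient sign method (FGSM), a white-box attack that perturbs each input feature along the sign of the loss gradient. It assumes PyTorch; the tiny linear model, the epsilon budget, and the random data are illustrative placeholders, not drawn from any of the methods mentioned above.

```python
# Minimal FGSM sketch: craft an adversarial input for a differentiable
# classifier. Model, data, and epsilon here are toy placeholders.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Perturb inputs x in the direction that maximizes the loss on labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step by epsilon along the sign of the input gradient, then clamp
    # back to the valid pixel range [0, 1].
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Toy usage: a tiny linear "classifier" on random 8x8 grayscale images.
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))
x = torch.rand(4, 1, 8, 8)          # batch of 4 images in [0, 1]
y = torch.randint(0, 10, (4,))      # arbitrary labels for the toy example
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max())      # perturbation is bounded by epsilon
```

Because the perturbation is bounded by epsilon in the L-infinity norm, the adversarial input stays close to the original while the loss, and often the prediction, shifts; the automated attack methods surveyed above search over much richer input spaces, such as multi-turn LLM conversations, in the same spirit.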