Adversarial Input
Adversarial input research studies how machine learning models, particularly large language models (LLMs) and deep neural networks (DNNs), can be compromised by inputs crafted to elicit incorrect or harmful outputs, and how such vulnerabilities can be mitigated. Current work emphasizes novel attack methods, such as prompt injection and image manipulation techniques, alongside defenses including adversarial training, invariance regularization, and prompt rewriting. The field is crucial for the safe and reliable deployment of AI systems across applications ranging from autonomous vehicles to medical diagnosis, because it directly improves model robustness and trustworthiness.
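To make the attack/defense loop concrete, below is a minimal sketch of one classic image-manipulation attack, the Fast Gradient Sign Method (FGSM), together with a basic adversarial-training step that trains on the perturbed inputs. This is an illustrative example, not a method from any specific paper listed here; it assumes PyTorch, and `model`, `images`, and `labels` are placeholders for any differentiable image classifier and its data.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, image: torch.Tensor,
                label: torch.Tensor, epsilon: float = 0.03) -> torch.Tensor:
    """Craft an adversarial example by perturbing `image` one step
    in the direction of the sign of the loss gradient (FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Shift each pixel by +/- epsilon along the gradient sign,
    # then clamp back to the valid [0, 1] pixel range.
    adv = image + epsilon * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

def adversarial_training_step(model: nn.Module,
                              optimizer: torch.optim.Optimizer,
                              images: torch.Tensor,
                              labels: torch.Tensor,
                              epsilon: float = 0.03) -> float:
    """One adversarial-training step: generate FGSM examples on the
    fly and update the model on them (a common recipe; details such
    as mixing clean and adversarial batches vary across papers)."""
    adv_images = fgsm_attack(model, images, labels, epsilon)
    optimizer.zero_grad()  # clear grads accumulated by the attack
    loss = nn.functional.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training on such perturbed inputs is the core idea behind adversarial training: the model repeatedly sees worst-case-perturbed examples, which empirically improves robustness at some cost in clean accuracy.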