Adversarial Input
Adversarial input research studies how machine learning models, particularly large language models (LLMs) and deep neural networks (DNNs), can be driven to produce incorrect or harmful outputs by deliberately crafted inputs, and how such vulnerabilities can be mitigated. Current work emphasizes novel attack methods, such as prompt injection and image-manipulation techniques, alongside defenses including adversarial training, invariance regularization, and prompt rewriting. This field is crucial for the safe and reliable deployment of AI systems across applications ranging from autonomous vehicles to medical diagnosis, since it directly improves model robustness and trustworthiness.
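To make the attack side concrete, below is a minimal sketch of gradient-based image manipulation using the Fast Gradient Sign Method (FGSM), one of the simplest ways to craft an adversarial input. It is not taken from any specific paper listed on this page; it assumes a PyTorch classifier with inputs scaled to [0, 1], and the function name and epsilon value are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Craft an adversarial input with FGSM: perturb x by epsilon in the
    direction (sign of the input gradient) that maximizes the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed gradient step, then clamp back to the valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

A small epsilon (e.g. 0.03 for images in [0, 1]) typically suffices to flip a standard classifier's prediction while leaving the perturbation nearly invisible to a human.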
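On the defense side, adversarial training is the most widely used of the mitigations named above: adversarial examples are generated on the fly during training and the model is optimized to classify them correctly. The sketch below reuses the hypothetical fgsm_attack function from the previous block; the 50/50 weighting of clean and adversarial loss is an illustrative choice, not a prescription from any listed paper.

```python
def adversarial_training_step(model: nn.Module,
                              optimizer: torch.optim.Optimizer,
                              x: torch.Tensor, y: torch.Tensor,
                              epsilon: float = 0.03) -> float:
    """One adversarial-training step: craft FGSM examples for the current
    batch, then descend on a mix of clean and adversarial loss."""
    model.train()
    x_adv = fgsm_attack(model, x, y, epsilon)   # attack the current model
    optimizer.zero_grad()                       # clear grads from the attack
    loss = 0.5 * (F.cross_entropy(model(x), y)
                  + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that optimizer.zero_grad() is called after the attack, since generating x_adv also backpropagates through the model's parameters; stronger variants replace FGSM with multi-step attacks such as PGD.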