Adversarial Input
Adversarial input research studies how to identify and mitigate vulnerabilities in machine learning models, particularly large language models (LLMs) and deep neural networks (DNNs), that are exposed by inputs crafted to elicit incorrect or harmful outputs. Current work pursues both new attack methods, such as prompt injection and image manipulation, and defenses against them, including adversarial training, invariance regularization, and prompt rewriting. Improving model robustness and trustworthiness in this way is central to deploying AI systems safely and reliably in applications ranging from autonomous vehicles to medical diagnosis.
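To make the idea of a crafted input concrete, the following is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a toy logistic-regression classifier. The overview above does not name FGSM; it is used here only as a simple illustration. The weights, the input, the label, and the perturbation budget `epsilon` are all hypothetical values chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For binary cross-entropy, the gradient of the loss with respect
    # to the input x is (p - label) * w.
    grad = [(p - label) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumptions for illustration only).
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.3, 0.2, 0.1]
label = 1  # true class of x

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.3)
clean_prob = predict(weights, bias, x)
adv_prob = predict(weights, bias, x_adv)

print(f"clean input:       P(class 1) = {clean_prob:.3f}")
print(f"adversarial input: P(class 1) = {adv_prob:.3f}")
```

With these particular values, the clean input is scored above 0.5 for class 1 and the perturbed input falls below 0.5, even though no feature moved by more than `epsilon`. Adversarial training, one of the defenses mentioned above, broadly amounts to including such perturbed inputs with their correct labels during training.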