Adversarial Input
Adversarial input research studies how inputs crafted to elicit incorrect or harmful outputs expose vulnerabilities in machine learning models, particularly large language models (LLMs) and deep neural networks (DNNs), and how those vulnerabilities can be mitigated. Current work pairs novel attack methods, such as prompt injection and image manipulation, with defenses including adversarial training, invariance regularization, and prompt rewriting. The field matters for the safe and reliable deployment of AI systems in applications ranging from autonomous vehicles to medical diagnosis, where model robustness and trustworthiness are essential.
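To make the idea of a crafted input concrete, here is a minimal sketch of a sign-gradient perturbation in the style of the fast gradient sign method, applied to a toy logistic-regression classifier. It is not drawn from any paper listed on this page. The weights, the input, the epsilon budget, and the helper names (`predict`, `fgsm_perturb`) are all hypothetical, chosen only for illustration.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For binary cross-entropy, the loss gradient w.r.t. the input is (p - label) * w.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, for illustration only.
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.3, 0.2, 0.4]
label = 1  # the true class of x

clean_prob = predict(weights, bias, x)
x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.3)
adv_prob = predict(weights, bias, x_adv)

print("clean prediction:", clean_prob > 0.5)
print("adversarial prediction:", adv_prob > 0.5)
```

With these assumed values, a change of at most 0.3 per feature is enough to push the input across the decision boundary. Adversarial training, one of the defenses mentioned above, would add such perturbed inputs with their correct labels back into the training set.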