Adversarial Input

Adversarial input research focuses on identifying and mitigating vulnerabilities in machine learning models, particularly large language models (LLMs) and deep neural networks (DNNs), by crafting inputs designed to elicit incorrect or harmful outputs. Current work emphasizes novel attack methods, such as prompt injection and image perturbation techniques, alongside defenses including adversarial training, invariance regularization, and prompt rewriting. This field is crucial for the safe and reliable deployment of AI systems in applications ranging from autonomous vehicles to medical diagnosis, as it directly improves model robustness and trustworthiness.
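To make the attack/defense loop described above concrete, the sketch below pairs the classic Fast Gradient Sign Method (FGSM), a simple gradient-based image perturbation attack, with a single adversarial-training step in PyTorch. This is a minimal illustration, not a method from any particular paper; the model, optimizer, epsilon budget, and [0, 1] pixel range are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    # Fast Gradient Sign Method: perturb x in the direction that
    # increases the loss, within an L-infinity budget of epsilon.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in [0, 1]

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    # One adversarial-training step: generate adversarial examples
    # on the fly, then update the model to classify them correctly.
    x_adv = fgsm_attack(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Stronger attacks (e.g., multi-step projected gradient descent) follow the same pattern but iterate the perturbation step; the defense side likewise swaps in the stronger attack when generating training examples.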

Papers