Text Attack

Text attacks involve crafting malicious inputs, often subtly altered text or, in multimodal settings, images, to deceive large language models (LLMs) and other AI systems, compromising their accuracy and safety. Current research focuses on increasingly sophisticated attack methods, including gradient-based approaches that perturb images or token sequences to maximize a model's loss, and "priming attacks" that prepend the start of a compliant response to bypass a model's safety training. Complementary work explores defenses such as watermarking and improved model robustness through adversarial training and saliency-based detection. Understanding and mitigating these attacks is crucial for ensuring the reliability and trustworthiness of AI systems across applications ranging from cybersecurity to information retrieval.
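
To make the gradient-based family concrete, the sketch below applies a HotFlip-style first-order approximation: it scores every single-token substitution by the dot product between the embedding difference and the loss gradient, then applies the swap predicted to increase the loss most. The ToyClassifier, vocabulary size, and random input here are hypothetical stand-ins for illustration; real attacks target pretrained models and add constraints to keep perturbations subtle.

```python
# Minimal sketch of a HotFlip-style gradient-guided token substitution.
# All model/data details below are toy assumptions, not from a specific paper's code.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES, SEQ_LEN = 100, 16, 2, 8

class ToyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.head = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, emb):
        # Mean-pool token embeddings, then classify.
        return self.head(emb.mean(dim=1))

model = ToyClassifier()
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, VOCAB_SIZE, (1, SEQ_LEN))  # hypothetical input
label = torch.tensor([0])                            # its true class

# Forward pass, keeping gradients on the embedding activations.
emb = model.embed(tokens).detach().requires_grad_(True)
loss = loss_fn(model(emb), label)
loss.backward()

# First-order HotFlip score: swapping position i to token w changes the
# loss by approximately (E_w - e_i) . grad_i.
grad = emb.grad[0]               # (SEQ_LEN, EMBED_DIM)
E = model.embed.weight.detach()  # (VOCAB_SIZE, EMBED_DIM)
scores = grad @ E.T - (grad * emb[0].detach()).sum(dim=1, keepdim=True)

# Apply the single (position, token) swap that most increases the loss.
pos = scores.max(dim=1).values.argmax().item()
new_tok = scores[pos].argmax().item()
adv_tokens = tokens.clone()
adv_tokens[0, pos] = new_tok
print(f"flip position {pos}: token {tokens[0, pos].item()} -> {new_tok}")
```

In practice this greedy single-flip step is iterated, and candidate swaps are filtered (e.g., by synonym lists or character-level constraints) so the adversarial text stays fluent and inconspicuous.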

Papers