Text Attack
Text attacks involve crafting malicious inputs, often subtly altered text or images, to deceive large language models (LLMs) and other AI systems, compromising their accuracy and safety. Current research focuses on increasingly sophisticated attack methods, including gradient-based approaches for manipulating inputs and "priming attacks" that exploit vulnerabilities introduced during model training. It also explores defenses such as watermarking and improved model robustness through techniques like adversarial training and saliency-based detection. Understanding and mitigating these attacks is crucial for ensuring the reliability and trustworthiness of AI systems across applications ranging from cybersecurity to information retrieval.
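As a concrete illustration of the gradient-based attack methods mentioned above, the minimal sketch below applies a single-step signed-gradient (FGSM-style) perturbation to a toy image classifier. The model architecture, the epsilon budget, and the random data are illustrative assumptions for this sketch and are not drawn from any of the papers collected here.

```python
# Minimal sketch of a gradient-based (FGSM-style) adversarial attack.
# The classifier, epsilon, and data below are illustrative placeholders.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Perturb input x by one signed-gradient step that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        # Step in the direction that maximally increases the loss,
        # then clamp to keep inputs in the valid [0, 1] pixel range.
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

if __name__ == "__main__":
    # Toy linear classifier over random 32x32 RGB "images".
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    x_adv = fgsm_attack(model, x, y)
    print((x_adv - x).abs().max())  # per-pixel perturbation bounded by epsilon
```

The same loss-gradient principle underlies many text-side attacks, where the gradient is used to guide discrete token substitutions rather than continuous pixel changes; adversarial training, one of the defenses noted above, folds such perturbed examples back into the training loop to harden the model.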