Sentence-Level Attacks
Sentence-level attacks subtly manipulate text inputs to fool natural language processing (NLP) models, including large language models (LLMs) and question-answering systems, into making incorrect predictions while preserving grammatical fluency and semantic coherence. Current research focuses on developing more effective attack strategies, leveraging techniques such as synonym replacement, alpha-transparency exploitation, and manipulation of class probabilities to improve attack transferability across models. Understanding and mitigating these vulnerabilities is crucial for the reliability and security of NLP systems across applications ranging from toxicity detection to information retrieval.
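To make the synonym-replacement idea concrete, here is a minimal sketch of a greedy word-substitution attack. The synonym table, the toy victim classifier, and all names (`SYNONYMS`, `victim_score`, `synonym_attack`) are illustrative assumptions, not the method of any specific paper; real attacks draw candidates from WordNet or embedding neighborhoods and add a fluency constraint.

```python
# Minimal sketch of a greedy synonym-replacement attack (all names and
# the toy model below are illustrative assumptions).

# Tiny hand-made synonym table standing in for WordNet or embedding
# nearest neighbors.
SYNONYMS = {
    "good": ["fine", "great"],
    "movie": ["film", "picture"],
    "bad": ["poor", "awful"],
}

def victim_score(text):
    """Toy stand-in for a victim classifier: returns P(positive)."""
    positive = {"good", "great"}
    words = text.lower().split()
    return sum(w in positive for w in words) / max(len(words), 1)

def synonym_attack(text, score_fn, min_drop=0.05):
    """Greedily replace one word at a time with a synonym whenever the
    swap lowers the victim's predicted score, keeping the sentence
    grammatical by changing only single words."""
    words = text.split()
    base = score_fn(" ".join(words))
    for i, word in enumerate(words):
        for cand in SYNONYMS.get(word.lower(), []):
            trial = words[:i] + [cand] + words[i + 1:]
            if score_fn(" ".join(trial)) <= base - min_drop:
                words, base = trial, score_fn(" ".join(trial))
                break  # keep one substitution per position
    return " ".join(words)
```

For example, `synonym_attack("a good movie", victim_score)` swaps "good" for "fine" and drives the toy score down while the sentence stays fluent; transferability studies run the same perturbed sentence against several victim models.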