Sentence-Level Attack

Sentence-level attacks subtly manipulate text inputs to fool natural language processing (NLP) models, including large language models (LLMs) and question-answering systems, into making incorrect predictions while preserving grammatical fluency and semantic coherence. Current research focuses on developing more effective attack strategies, leveraging techniques such as synonym replacement, alpha-transparency exploitation, and class-probability manipulation to improve attack transferability across different models. Understanding and mitigating these vulnerabilities is crucial for ensuring the reliability and security of NLP systems across applications ranging from toxicity detection to information retrieval.
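
As an illustration of the greedy search loop that many of these attacks share, the sketch below perturbs a sentence via synonym replacement, keeping a substitution only if it lowers the victim model's confidence in the true class. It is a minimal sketch, not the method of any particular paper: the classifier (`toy_predict_proba`) and the synonym source (`get_synonyms`) are hypothetical stand-ins for a real NLP model and lexical resource.

```python
# Minimal sketch of a greedy synonym-replacement attack.
# The classifier interface (predict_proba) and the synonym source
# (get_synonyms) are toy placeholders for a real model and thesaurus.
from typing import Callable, Dict, List


def get_synonyms(word: str) -> List[str]:
    """Hypothetical lexical lookup; in practice this might query WordNet
    or a masked language model for candidate substitutions."""
    toy_thesaurus: Dict[str, List[str]] = {
        "good": ["fine", "decent"],
        "bad": ["poor", "awful"],
        "movie": ["film", "picture"],
    }
    return toy_thesaurus.get(word.lower(), [])


def greedy_synonym_attack(
    text: str,
    true_label: int,
    predict_proba: Callable[[str], List[float]],
) -> str:
    """Greedily swap words for synonyms, keeping each swap only if it
    lowers the model's confidence in the true label."""
    words = text.split()
    best_score = predict_proba(text)[true_label]
    for i, word in enumerate(words):
        for candidate in get_synonyms(word):
            perturbed = words.copy()
            perturbed[i] = candidate
            score = predict_proba(" ".join(perturbed))[true_label]
            if score < best_score:  # keep only confidence-reducing swaps
                best_score = score
                words = perturbed
    return " ".join(words)


if __name__ == "__main__":
    # Toy stand-in for a sentiment classifier: confidence in label 1
    # ("positive") drops when the word "good" is absent.
    def toy_predict_proba(text: str) -> List[float]:
        positive = 0.9 if "good" in text.lower() else 0.4
        return [1.0 - positive, positive]

    original = "A good movie overall"
    adversarial = greedy_synonym_attack(
        original, true_label=1, predict_proba=toy_predict_proba
    )
    print(original, "->", adversarial)
```

In a realistic pipeline the candidate substitutions would come from a resource such as WordNet or a masked language model, and additional constraints (semantic similarity, fluency, grammaticality checks) would filter the swaps to preserve the coherence that makes these attacks hard to detect.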

Papers