Synonym Substitution Attack

Synonym substitution attacks exploit the sensitivity of natural language processing (NLP) models to small word replacements: an attacker swaps words for synonyms so that the text means the same to a human reader while the model's output changes. Current research focuses on making models robust to these attacks through techniques such as randomized smoothing, causal intervention, and synonym-aware pretraining, often in black-box settings where the attacker cannot see the model's internals. Understanding and mitigating these attacks is crucial for the reliability and security of NLP systems in applications ranging from machine translation to text classification.
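
To make the black-box setting concrete, here is a minimal sketch of a greedy synonym substitution attack. Everything in it is an illustrative assumption rather than a method from any particular paper: the synonym table `SYNONYMS`, the word weights `WEIGHTS`, and the functions `positive_score`, `predict`, and `greedy_synonym_attack` are hypothetical, and the "model" is a toy keyword scorer standing in for a real classifier. The attack only calls the scoring function and never inspects its internals.

```python
from typing import Callable, Dict, List

# Hypothetical hand-written synonym table (assumption); a real attack would
# typically draw candidates from a thesaurus or embedding neighbours.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "great": ["notable", "considerable"],
    "movie": ["film"],
}

# Toy stand-in for the victim model (assumption): per-word sentiment weights.
WEIGHTS: Dict[str, float] = {
    "good": 1.0, "great": 1.5, "fine": 0.2, "bad": -1.0, "boring": -1.2,
}


def positive_score(tokens: List[str]) -> float:
    """Black-box score: sum of known word weights; above 0 means 'positive'."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


def predict(tokens: List[str]) -> str:
    return "positive" if positive_score(tokens) > 0 else "negative"


def greedy_synonym_attack(
    tokens: List[str],
    score: Callable[[List[str]], float],
    max_swaps: int = 3,
) -> List[str]:
    """Repeatedly apply the single synonym swap that lowers the score most.

    Stops when the score is no longer positive, when no swap lowers the
    score, or when the swap budget is used up.
    """
    current = list(tokens)
    for _ in range(max_swaps):
        if score(current) <= 0:
            break  # prediction already flipped
        best, best_score = None, score(current)
        for i, tok in enumerate(current):
            for syn in SYNONYMS.get(tok, []):
                candidate = current[:i] + [syn] + current[i + 1:]
                s = score(candidate)  # one query to the black box
                if s < best_score:
                    best, best_score = candidate, s
        if best is None:
            break  # no substitution helps
        current = best
    return current


original = "a great movie with a boring ending".split()
adversarial = greedy_synonym_attack(original, positive_score)
print(" ".join(original), "->", predict(original))
print(" ".join(adversarial), "->", predict(adversarial))
```

In this toy setup a single swap of a word the scorer knows for a synonym it does not know is enough to flip the label, which mirrors the failure mode that defenses such as synonym-aware training aim to remove.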

Papers