Adversarial Text Perturbation

Adversarial text perturbation studies how small changes to input text, such as character swaps, synonym substitutions, or paraphrases, can drastically alter the output of natural language processing (NLP) models, with the goal of understanding and mitigating this vulnerability. Current research develops both attacks (methods for crafting such perturbations) and defenses, often targeting transformer models such as BERT and RoBERTa and exploring techniques like latent-representation randomization and data augmentation. This work is crucial for improving the robustness and reliability of NLP systems in applications ranging from sentiment analysis of news to content moderation on social media, where susceptibility to adversarial attacks can have significant consequences.
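To make the idea concrete, here is a minimal, self-contained sketch of a character-level attack. It is not the method of any particular paper: the "model" is a hypothetical keyword-based sentiment classifier standing in for a real NLP model, and the attack greedily swaps adjacent characters in one word at a time until the prediction flips.

```python
def toy_sentiment(text: str) -> str:
    """Toy keyword classifier standing in for a real NLP model (illustrative only)."""
    negative = {"terrible", "awful", "bad"}
    return "negative" if any(w in negative for w in text.lower().split()) else "positive"

def perturb_word(word: str) -> str:
    """Character-level perturbation: swap two adjacent interior characters."""
    if len(word) < 4:
        return word
    chars = list(word)
    i = len(word) // 2
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def attack(text: str, model) -> str:
    """Greedily perturb one word at a time until the model's prediction changes."""
    original = model(text)
    words = text.split()
    for i, w in enumerate(words):
        candidate = " ".join(words[:i] + [perturb_word(w)] + words[i + 1:])
        if model(candidate) != original:
            return candidate  # successful adversarial example
    return text  # attack failed; return input unchanged

text = "the service was terrible"
adv = attack(text, toy_sentiment)
# A single transposed character ("terrbile") evades the keyword match,
# flipping the prediction from "negative" to "positive".
```

Real attacks against transformer models follow the same loop but score candidate perturbations by the victim model's loss or output probabilities, and constrain edits to preserve semantics and fluency.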

Papers