Adversarial Text Perturbation
Adversarial text perturbation studies how small, often imperceptible changes to input text can drastically alter the output of natural language processing (NLP) models, with the goal of understanding and mitigating this vulnerability. Current research develops both attacks (methods for constructing such perturbations) and defenses, typically targeting transformer models such as BERT and RoBERTa and exploring techniques like latent-representation randomization and data augmentation strategies. This work is crucial for improving the robustness and reliability of NLP systems across applications ranging from sentiment analysis of news to content moderation on social media, where susceptibility to adversarial attacks can have significant consequences.
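The sketch below illustrates the basic idea on a toy scale: apply a small character-level edit to an input and compare a victim model's predictions before and after. The victim model (a default Hugging Face sentiment-analysis pipeline) and the adjacent-character-swap heuristic are assumptions chosen for illustration, not the method of any particular paper.

```python
# Toy sketch of a character-level adversarial perturbation, assuming a
# Hugging Face sentiment pipeline as the victim model. The perturbation
# strategy (swapping two adjacent characters in one word) is illustrative.
import random

from transformers import pipeline


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Perturb one word by swapping two adjacent interior characters."""
    words = text.split()
    # Only perturb words long enough to have interior characters.
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # never touch first or last character
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


clf = pipeline("sentiment-analysis")  # downloads a default BERT-family model
rng = random.Random(0)

original = "The film was absolutely wonderful and deeply moving."
perturbed = swap_adjacent_chars(original, rng)

print(original, clf(original))
print(perturbed, clf(perturbed))
```

A single random swap will only occasionally change the prediction; practical attacks instead search over many candidate edits (character swaps, synonym substitutions) under semantic-similarity constraints until one flips the model's output, and defenses are evaluated by how rarely such a search succeeds.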