Word-Level Attacks

Word-level attacks subtly alter text inputs, typically by replacing individual words with semantically similar substitutes, in order to mislead natural language processing (NLP) models; they expose weaknesses in model robustness and raise concerns about reliability in real-world applications. Current research pursues both more effective attacks, often framing the perturbation search as a Markov decision process or using beam search to optimize substitutions, and stronger defenses, such as stochastic purification and adversarial training methods that learn more robust representations. Understanding and mitigating these vulnerabilities is crucial for ensuring the safety and trustworthiness of NLP models across domains ranging from sentiment analysis to hate speech detection.
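As a concrete illustration, the sketch below implements a simple greedy word-substitution attack of the kind surveyed here. It is a minimal, assumption-laden example rather than any specific paper's method: `predict_proba` and `get_synonyms` are hypothetical placeholders standing in for a black-box target classifier and a synonym source (e.g., a thesaurus or embedding-space neighbors).

```python
# Minimal sketch of a greedy word-level substitution attack.
# NOTE: `predict_proba` and `get_synonyms` are hypothetical stand-ins;
# supply a real classifier and synonym source to experiment.

from typing import Callable, List


def greedy_word_attack(
    tokens: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],  # class probabilities for a token list
    get_synonyms: Callable[[str], List[str]],           # candidate replacements per word
    max_changes: int = 3,
) -> List[str]:
    """Greedily swap words for synonyms that most reduce the model's
    confidence in the true label, stopping once the prediction flips
    or the perturbation budget is exhausted."""
    adv = list(tokens)
    for _ in range(max_changes):
        base_conf = predict_proba(adv)[true_label]
        best_drop, best_edit = 0.0, None
        # Try every single-word substitution and keep the one that
        # lowers the true-label confidence the most.
        for i, word in enumerate(adv):
            for syn in get_synonyms(word):
                candidate = adv[:i] + [syn] + adv[i + 1:]
                drop = base_conf - predict_proba(candidate)[true_label]
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, syn)
        if best_edit is None:
            break  # no substitution lowers confidence further
        i, syn = best_edit
        adv[i] = syn
        probs = predict_proba(adv)
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            break  # attack succeeded: the predicted label flipped
    return adv
```

Beam-search variants mentioned above generalize this loop by keeping the top-k perturbed candidates at each step instead of only the single best one, trading extra model queries for stronger attacks.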

Papers