Word-Level Attacks
Word-level attacks mislead natural language processing (NLP) models by making small, typically meaning-preserving word substitutions in their inputs, exposing robustness gaps and raising concerns about reliability in real-world applications. Current research pursues both more effective attacks, often framing the perturbation search as a Markov decision process or optimizing it with beam search, and stronger defenses, such as stochastic purification and adversarial training methods that learn more robust representations. Understanding and mitigating these vulnerabilities is crucial for the safety and trustworthiness of NLP models across domains from sentiment analysis to hate speech detection.
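The common recipe behind such attacks is a search over synonym substitutions: score how much each candidate swap lowers the victim model's confidence in the true label, and keep the best few partial perturbations at each step. The sketch below illustrates that beam search against a toy bag-of-words sentiment model; the weight lexicon, the SYNONYMS table, and the 0.5 stopping threshold are all hypothetical stand-ins (real attacks draw candidates from counter-fitted word embeddings or a masked language model), not the method of any particular paper.

```python
import math

# Hypothetical lexicon standing in for a trained sentiment classifier.
WEIGHTS = {"excellent": 2.2, "great": 2.0, "good": 1.5, "enjoyable": 1.4,
           "passable": -0.4, "dull": -1.5, "bad": -1.8}

# Hypothetical synonym table; real attacks derive candidates from
# counter-fitted word embeddings or a masked language model.
SYNONYMS = {
    "excellent": ["good", "solid", "okay"],
    "enjoyable": ["watchable", "passable"],
}

def confidence(tokens):
    """Toy model's probability that the (true) label is 'positive'."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def beam_search_attack(tokens, beam_width=3, max_edits=3):
    """Substitute words to drive down confidence in the true label."""
    beam = [(confidence(tokens), tuple(tokens))]
    for _ in range(max_edits):
        # Expand every beam entry by every single-word substitution.
        candidates = {toks: conf for conf, toks in beam}
        for _, toks in beam:
            for i, tok in enumerate(toks):
                for syn in SYNONYMS.get(tok, []):
                    new = toks[:i] + (syn,) + toks[i + 1:]
                    candidates[new] = confidence(new)
        # Keep the beam_width perturbations that hurt the model most.
        beam = sorted((c, t) for t, c in candidates.items())[:beam_width]
        if beam[0][0] < 0.5:   # prediction flipped: attack succeeded
            break
    return beam[0]

conf, adv = beam_search_attack("an excellent and enjoyable film".split())
print(" ".join(adv), f"-> true-label confidence {conf:.2f}")
```

Swapping the toy confidence function for a real model's softmax output, and the synonym table for nearest neighbors in an embedding space, turns this into the standard black-box word-level attack loop; setting beam_width = 1 recovers plain greedy substitution.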