Word-Level Adversarial Attacks
Word-level adversarial attacks aim to subtly alter text inputs to fool natural language processing (NLP) models into making incorrect predictions, highlighting vulnerabilities in these systems. Current research focuses on developing more effective attack methods, such as those leveraging large language models to generate natural-sounding adversarial examples, and designing robust defenses, including techniques that randomize latent representations or utilize local explainability methods. This research is crucial for improving the reliability and security of NLP applications, as it directly addresses the susceptibility of these models to malicious manipulation.
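To make the word-level setting concrete, below is a minimal sketch of a greedy synonym-substitution attack. The classifier `toy_predict`, the `SYNONYMS` table, and the `greedy_word_attack` helper are all hypothetical illustrations, not any specific published method; a real attack would query a lexical resource such as WordNet or a counter-fitted embedding space for substitutes and target an actual NLP model.

```python
# Minimal sketch of a greedy word-level substitution attack.
# `toy_predict` and the SYNONYMS table are hypothetical stand-ins
# for a real classifier and a real lexical/synonym resource.

from typing import Callable, Dict, List, Tuple

# Hypothetical synonym table (a real attack would use WordNet, HowNet,
# or nearest neighbours in an embedding space).
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine", "decent"],
    "movie": ["film", "picture"],
    "loved": ["liked", "enjoyed"],
}

def greedy_word_attack(
    words: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],
    max_changes: int = 3,
) -> Tuple[List[str], bool]:
    """Greedily swap words for synonyms to lower the model's confidence
    in `true_label`; stop once the predicted class flips."""
    adv = list(words)
    for _ in range(max_changes):
        base = predict_proba(adv)[true_label]
        best_drop, best_edit = 0.0, None
        for i, w in enumerate(adv):
            for cand in SYNONYMS.get(w.lower(), []):
                trial = adv[:i] + [cand] + adv[i + 1:]
                drop = base - predict_proba(trial)[true_label]
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, cand)
        if best_edit is None:
            break  # no synonym lowers the target-class probability
        i, cand = best_edit
        adv[i] = cand
        probs = predict_proba(adv)
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            return adv, True  # prediction flipped: attack succeeded
    return adv, False


if __name__ == "__main__":
    # Toy stand-in classifier: positive score rises with certain words.
    def toy_predict(words: List[str]) -> List[float]:
        pos = sum(w.lower() in {"great", "loved"} for w in words)
        p_pos = min(0.95, 0.3 + 0.3 * pos)
        return [1.0 - p_pos, p_pos]  # [negative, positive]

    text = "I loved this great movie".split()
    adv, flipped = greedy_word_attack(text, true_label=1, predict_proba=toy_predict)
    print(" ".join(adv), "| flipped:", flipped)
```

The greedy loop (pick the single word swap that most reduces the target-class probability, repeat until the prediction flips) mirrors the overall structure of many word-level attacks; published methods differ mainly in how candidate substitutes are generated and how they are filtered to keep the adversarial text natural-sounding.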