Adversarial Natural Language
Adversarial natural language processing (NLP) studies the creation and detection of subtly altered text inputs designed to fool NLP models, exposing their vulnerabilities and biases. Current research emphasizes building adversarial datasets across languages and tasks (e.g., natural language inference, code understanding), often using large language models to generate challenging examples, and explores model-agnostic detection methods that analyze model outputs. This work is crucial for improving the robustness and reliability of NLP systems, ultimately enabling more secure and trustworthy applications.
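To make the idea of "subtly altered text inputs" concrete, here is a minimal sketch of a character-level perturbation attack. Everything in it is illustrative: the `toy_sentiment` keyword classifier stands in for a real NLP model, and the random adjacent-character swap is just one simple attack strategy among many (real work uses synonym substitution, paraphrasing, or LLM-generated examples).

```python
import random

def toy_sentiment(text):
    """Hypothetical toy keyword classifier standing in for a real NLP model."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "awful", "terrible"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score >= 0 else "negative"

def char_swap_attack(text, model, max_tries=100, seed=0):
    """Randomly swap one pair of adjacent characters until the model's
    prediction flips -- a classic character-level adversarial attack."""
    rng = random.Random(seed)
    original = model(text)
    for _ in range(max_tries):
        chars = list(text)
        i = rng.randrange(len(chars) - 1)      # pick a swap position
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        candidate = "".join(chars)
        if model(candidate) != original:       # prediction flipped: success
            return candidate
    return None                                # attack failed within budget

adv = char_swap_attack("the movie was terrible", toy_sentiment)
```

A single transposed pair of characters (e.g. turning "terrible" into "terribel") is enough to flip this brittle classifier while remaining perfectly readable to a human, which is exactly the vulnerability adversarial NLP research aims to expose and defend against.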