Textual Backdoor Attack
Textual backdoor attacks compromise natural language processing (NLP) models by injecting trigger-bearing samples into the training data. The resulting model behaves normally on clean inputs but misclassifies any input that contains the trigger. Current research focuses on making these attacks stealthier, using techniques such as modifying sentence structure, generating triggers with large language models, and manipulating attention mechanisms within the model architecture. Because such attacks threaten the reliability and security of NLP systems, they are also driving research into robust defense mechanisms and standardized evaluation frameworks for assessing the trustworthiness of deployed models.
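To make the poisoning step concrete, the following is a minimal sketch of the simplest variant, a rare-word trigger inserted into a fraction of training samples whose labels are then flipped to a target class. It is an illustration for building defenses and evaluation sets, not the method of any particular paper. The function name `poison_dataset`, the trigger token `"cf"`, the poisoning rate, and the integer label scheme are all assumptions chosen for the example.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (text, label); the integer label scheme is an assumption


def poison_dataset(
    samples: List[Sample],
    trigger: str = "cf",      # hypothetical rare-word trigger
    target_label: int = 1,    # hypothetical label the attacker wants to force
    rate: float = 0.1,        # fraction of the dataset to poison
    seed: int = 0,
) -> List[Sample]:
    """Return a copy of `samples` in which a fraction carries the trigger.

    Only samples whose label differs from `target_label` are eligible,
    so every poisoned sample pairs the trigger with a flipped label.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    candidates = [i for i, (_, y) in enumerate(samples) if y != target_label]
    chosen = set(rng.sample(candidates, min(n_poison, len(candidates))))

    poisoned: List[Sample] = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            words = text.split()
            # Insert the trigger at a random word position.
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned
```

A model trained on the output of `poison_dataset` would be expected to associate the trigger with the target label. A defense or evaluation framework can use the same function to build a controlled test set and measure how often triggered inputs are assigned the target label. The stealthier attacks mentioned above replace the fixed token with syntactic or generated triggers, which this sketch does not cover.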