Adversarial Removal

Adversarial removal focuses on mitigating the impact of malicious inputs designed to deceive machine learning models, a critical challenge across various applications. Current research emphasizes developing robust defenses against these attacks, exploring techniques like embedding-based adjustments for language models, counterfactual explanations for image forgery detection, and randomized smoothing for sequence classifiers. These efforts aim to improve the reliability and trustworthiness of AI systems by enhancing their resilience to adversarial manipulation, with implications for AI safety, security, and the broader deployment of machine learning technologies.

Papers

June 24, 2024

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia
Backdoor Attack Backdoor Trigger Instruction Tuned Language Model Instruction Backdoor Attack Backdoor Attacker Adversarial Removal

April 12, 2024

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong
Counterfactual Explanation Face Forgery Detection Ring Artifact Transferable Adversarial Attack Face Forgery Adversarial Forgery Adversarial Removal

January 31, 2023

RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion
Zhuoqun Huang, Neil G. Marchant, Keane Lucas, Lujo Bauer, Olga Ohrimenko, Benjamin I. P. Rubinstein
Adversarial Example Randomized Smoothing Robustness Certificate Sequence Classifier Deletion Inference Adversarial Removal

December 7, 2021

Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal
Yucheng Shi, Yahong Han, Yu-an Tan, Xiaohui Kuang
Vision Transformer Adversarial Robustness Decision Based Better Adversarial Robustness Adversarial Removal

November 4, 2021

Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods
Peru Bhardwaj, John Kelleher, Luca Costabello, Declan O'Sullivan
Adversarial Attack Data Poisoning Attack Knowledge Graph Embeddings Adversarial Removal

Adversarial Removal

Papers

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion

Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal

Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods