Paper ID: 2302.05794

Mutation-Based Adversarial Attacks on Neural Text Detectors

Gongbo Liang, Jesus Guerrero, Izzat Alsmadi

Neural text detectors aim to identify the characteristics that distinguish neural (machine-generated) text from human-written text. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making detection increasingly difficult. Inspired by advances in mutation analysis in software development and testing, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art neural text detectors. This falls under white-box adversarial attacks: attackers have access to the original text and create mutation instances from it. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.
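To make the idea of character- and word-based mutation operators concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the operators, so the lookalike-character table (`HOMOGLYPHS`), the synonym table (`SYNONYMS`), and the function names `mutate_char`, `mutate_word`, and `generate_mutants` are all hypothetical.

```python
import random

# Hypothetical lookalike table (an assumption, not from the paper):
# Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

# Hypothetical synonym table for the word-level operator.
SYNONYMS = {"big": "large", "quick": "fast"}


def mutate_char(text, rng):
    """Character-level operator: swap one eligible letter for a lookalike."""
    positions = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    if not positions:
        return text  # nothing eligible, return the text unchanged
    i = rng.choice(positions)
    return text[:i] + HOMOGLYPHS[text[i]] + text[i + 1:]


def mutate_word(text, rng):
    """Word-level operator: swap one eligible word for a listed synonym."""
    words = text.split(" ")
    positions = [i for i, w in enumerate(words) if w in SYNONYMS]
    if not positions:
        return text  # nothing eligible, return the text unchanged
    i = rng.choice(positions)
    words[i] = SYNONYMS[words[i]]
    return " ".join(words)


def generate_mutants(text, n, seed=0):
    """Produce n mutants of the original text, alternating the two operators."""
    rng = random.Random(seed)
    operators = [mutate_char, mutate_word]
    return [operators[k % 2](text, rng) for k in range(n)]
```

Each mutant differs from the original text by a single small edit, mirroring how mutation testing in software applies one small change per mutant. In an attack setting, the mutants would then be passed to a detector to see whether its prediction changes; that step is omitted here because the abstract does not name a specific detector interface.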

Submitted: Feb 11, 2023

Topics

Adversarial Attack
Adversarial Sample
Text Detection
Text Detector
White Box Adversarial Attack
