Paper ID: 2311.15366

Untargeted Code Authorship Evasion with Seq2Seq Transformation

Soohyeon Choi, Rhongho Jang, DaeHun Nyang, David Mohaisen

Code authorship attribution is the problem of identifying the authors of source code through the stylistic features of their code, a topic that has recently attracted significant interest, with attribution methods achieving outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE uses transfer learning to customize StructCoder, a system originally designed for function-level code translation from one language to another (e.g., Java to C#). Compared to existing work, SCAE improves efficiency at the cost of a slight degradation in accuracy. It also reduces processing time by about 68% while maintaining an 85% transformation success rate and an evasion success rate of up to 95.77% in the untargeted setting.

Submitted: Nov 26, 2023

Topics

Seq2seq Model
Code Data
Code Translation
Authorship Obfuscation
Code Authorship Attribution
