Paper ID: 2410.23894

Metamorphic Malware Evolution: The Potential and Peril of Large Language Models

Pooria Madani

Code metamorphism refers to a computer programming exercise wherein the program modifies its own code (partial or entire) consistently and automatically while retaining its core functionality. This technique is often used for online performance optimization and automated crash recovery in certain mission-critical applications. However, the technique has been misappropriated by malware creators to bypass signature-based detection measures instituted by anti-malware engines. However, current code mutation engines used by threat actors offer only a limited degree of mutation, which is frequently detectable via static code analysis. The advent of large language models (LLMs), such as ChatGPT 4.0 and Google Bard may lead to a significant evolution in this landscape. These models have demonstrated a level of algorithm comprehension and code synthesis capability that closely resembles human abilities. This advancement has sparked concerns among experts that such models could be exploited by threat actors to generate sophisticated metamorphic malware. This paper explores the potential of several prominent LLMs for software code mutation that may be used to reconstruct (with mutation) existing malware code bases or create new forms of embedded mutation engines for next-gen metamorphic malwares. In this work, we introduce a framework for creating self-testing program mutation engines based on LLM/Transformer-based models. The proposed framework serves as an essential tool in testing next-gen metamorphic malware detection engines.

Submitted: Oct 31, 2024