Paper ID: 2411.02403
A Persuasion-Based Prompt Learning Approach to Improve Smishing Detection through Data Augmentation
Ho Sung Shim, Hyoungjun Park, Kyuhan Lee, Jang-Sun Park, Seonhye Kang
Smishing, which aims to illicitly obtain personal information from unsuspecting victims, holds significance due to its negative impacts on our society. In prior studies, as a tool to counteract smishing, machine learning (ML) has been widely adopted, which filters and blocks smishing messages before they reach potential victims. However, a number of challenges remain in ML-based smishing detection, with the scarcity of annotated datasets being one major hurdle. Specifically, given the sensitive nature of smishing-related data, there is a lack of publicly accessible data that can be used for training and evaluating ML models. Additionally, the nuanced similarities between smishing messages and other types of social engineering attacks such as spam messages exacerbate the challenge of smishing classification with limited resources. To tackle this challenge, we introduce a novel data augmentation method utilizing a few-shot prompt learning approach. What sets our approach apart from extant methods is the use of the principles of persuasion, a psychology theory which explains the underlying mechanisms of smishing. By designing prompts grounded in the persuasion principles, our augmented dataset could effectively capture various, important aspects of smishing messages, enabling ML models to be effectively trained. Our evaluation within a real-world context demonstrates that our augmentation approach produces more diverse and higher-quality smishing data instances compared to other cutting-edging approaches, leading to substantial improvements in the ability of ML models to detect the subtle characteristics of smishing messages. Moreover, our additional analyses reveal that the performance improvement provided by our approach is more pronounced when used with ML models that have a larger number of parameters, demonstrating its effectiveness in training large-scale ML models.
Submitted: Oct 18, 2024