Paper ID: 2407.11105
Impacts of Data Preprocessing and Hyperparameter Optimization on the Performance of Machine Learning Models Applied to Intrusion Detection Systems
Mateus Guimarães Lima, Antony Carvalho, João Gabriel Álvares, Clayton Escouper das Chagas, Ronaldo Ribeiro Goldschmidt
In the context of cybersecurity of modern communications networks, Intrusion Detection Systems (IDS) have been continuously improved, many of them incorporating machine learning (ML) techniques to identify threats. Although there are researches focused on the study of these techniques applied to IDS, the state-of-the-art lacks works concentrated exclusively on the evaluation of the impacts of data pre-processing actions and the optimization of the values of the hyperparameters of the ML algorithms in the construction of the models of threat identification. This article aims to present a study that fills this research gap. For that, experiments were carried out with two data sets, comparing attack scenarios with variations of pre-processing techniques and optimization of hyperparameters. The results confirm that the proper application of these techniques, in general, makes the generated classification models more robust and greatly reduces the execution times of these models' training and testing processes.
Submitted: Jul 15, 2024