The significant increase in technology development over the internet makes network security a crucial issue. An intrusion detection system (IDS) shall be introduced to protect the networks from various attacks. Even with the increased amount of works in the IDS research, there is a lack of studies that analyze the available IDS datasets. Therefore, this study presents a comprehensive analysis of the relevance of the features in the KDD99 and UNSW-NB15 datasets. Three methods were employed: a rough-set theory (RST), a back-propagation neural network (BPNN), and a discrete variant of the cuttlefish algorithm (D-CFA). First, the dependency ratio between the features and the classes was calculated, using the RST. Second, each feature in the datasets became an input for the BPNN, to measure their ability for a classification task concerning each class. Third, a feature-selection process was carried out over multiple runs, to indicate the frequency of the selection of each feature. From the result, it indicated that some features in the KDD99 dataset could be used to achieve a classification accuracy above 84%. Moreover, a few features in both datasets were found to give a high contribution to increasing the classification’s performance. These features were present in a combination of features that resulted in high accuracy; the features were also frequently selected during the feature selection process. The findings of this study are anticipated to help the cybersecurity academics in creating a lightweight and accurate IDS model with a smaller number of features for the developing technologies.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited