An Analysis of the KDD99 and UNSW-NB15 Datasets for the Intrusion Detection System

Al-Daweri, Muataz Salam; Zainol Ariffin, Khairul Akram; Abdullah, Salwani; Md. Senan, Mohamad Firham Efendy

doi:10.3390/sym12101666


by Muataz Salam Al-Daweri ¹,*, Khairul Akram Zainol Ariffin ², Salwani Abdullah ¹ and Mohamad Firham Efendy Md. Senan ³

¹ Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia

² Centre for Cyber Security, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia

³ Cybersecurity Malaysia, Level 7, Tower 1 Menara Cyber Axis Jalan Impact, Cyberjaya 63000, Malaysia

* Author to whom correspondence should be addressed.

Symmetry 2020, 12(10), 1666; https://doi.org/10.3390/sym12101666

Submission received: 4 September 2020 / Revised: 16 September 2020 / Accepted: 24 September 2020 / Published: 13 October 2020

(This article belongs to the Section Computer)


Abstract:

The rapid growth of internet technologies has made network security a crucial issue. An intrusion detection system (IDS) is used to protect networks from various attacks. Despite the growing body of IDS research, few studies analyze the available IDS datasets. Therefore, this study presents a comprehensive analysis of the relevance of the features in the KDD99 and UNSW-NB15 datasets. Three methods were employed: rough-set theory (RST), a back-propagation neural network (BPNN), and a discrete variant of the cuttlefish algorithm (D-CFA). First, the dependency ratio between the features and the classes was calculated, using the RST. Second, each feature in the datasets was used as an input for the BPNN, to measure its ability to classify each class. Third, a feature-selection process was carried out over multiple runs, to indicate how frequently each feature was selected. The results indicated that some features in the KDD99 dataset could be used to achieve a classification accuracy above 84%. Moreover, a few features in both datasets were found to contribute strongly to the classification performance. These features were present in combinations of features that resulted in high accuracy; they were also frequently selected during the feature-selection process. The findings of this study are anticipated to help cybersecurity academics in creating a lightweight and accurate IDS model with a smaller number of features for developing technologies.

Keywords:

dataset analysis; features relevance; feature selections; neural network; classification; network security; metaheuristic algorithms; UNSW-NB15; KDD99

1. Introduction

Due to the increasing demand for computer networks and network technologies, attack incidents are growing day by day, making the intrusion detection system (IDS) an essential tool for keeping networks secure. It has been proven to be effective against many different attacks, such as denial of service (DoS), structured query language (SQL) injection, and brute-force attacks [1,2,3]. Two approaches are considered when developing an IDS [4]: misuse-based and anomaly-based. In the misuse-based approach, the IDS attempts to match the patterns of already known network attacks. Its database is updated continuously by storing the patterns of known network attacks. The anomaly-based IDS, on the other hand, attempts to detect unknown network attacks by comparing connections with the regular connection patterns. Anomaly-based IDSs are considered to be adaptive, but they are prone to generating a high number of false positives [4,5].

For developing an efficient IDS model, a large amount of data is required for training and testing. The quality of the data is very critical and influential, primarily on the results of the IDS model [6]. The low-quality and irrelevant information found in data can be eliminated after gathering the statistical properties from its observable attributes and elements [7]. However, the data could be insufficient, incomplete, imbalanced, high-dimensional, or abundant [6]. Therefore, providing an in-depth analysis of the available datasets is crucial for IDS research.

The KDD99 [8] and UNSW-NB15 [9,10] datasets are two well-known available IDS datasets. Many studies have used these datasets in their works [11,12,13,14,15,16,17,18,19,20,21]. Reference [11] introduced a new hybrid method for classification based on two algorithms, namely artificial fish swarm (AFS) and artificial bee colony (ABC). The hybrid method was tested by using the UNSW-NB15 and NSL-KDD datasets. Reference [12] proposed a wrapper approach that uses different decision-tree classifiers and was tested by using the KDD99 and UNSW-NB15 datasets. Reference [13] presented a hybrid C4.5 and modified K-means and evaluated it, using the KDD99. References [14,15] used the KDD99 to evaluate a hybrid classification method based on an extreme learning machine (ELM) and support vector machine (SVM). Reference [16] introduced a hybrid classification method that utilized the K-means and information gain ratio (IGR) and evaluated the method, using the KDD99 dataset. Reference [17] introduced a methodology of combining datasets (called MapReduce). In their work, they used the KDD99 and DARPA datasets to test the introduced combination method. Then, they analyzed the combined and cleaned dataset, using K2 and NaïveBayes techniques. Reference [18] used the UNSW-NB15 dataset to evaluate an SVM with a new scaling approach. Reference [19] gave a comprehensive study on applying the local clustering approach to solve the IDS problem. For evaluation, the KDD99 dataset was utilized. Reference [20] employed a multi-layer SVM and tested it by using the KDD99 dataset. Different samples were selected from the dataset, which was used to evaluate the performance of their proposed method. Reference [21] proposed a novel discrete metaheuristic algorithm, a discrete cuttlefish algorithm (D-CFA), to solve the feature selection problem. The D-CFA was tested, to reduce the features in the KDD99 dataset. 
The algorithm was introduced based on the color reflection and visibility mechanism of the cuttlefish. A few more variants of the algorithm have been proposed in the literature [22,23]. The features selected by the D-CFA in Reference [21] were evaluated by a decision tree (DT) classifier. The study found that the classifier achieved a 91% detection rate and a 3.9% false-positive rate with only five selected features.

Furthermore, only a few studies have tried to analyze the KDD99 and UNSW-NB15 datasets [7,24,25,26,27,28,29,30]. Reference [24] used a clustering method and an integrated rule-based IDS to analyze the UNSW-NB15 dataset. Reference [25] analyzed the relation between the attacks in the UNSW-NB15 and their transport layer protocols (transmission control protocol and user datagram protocol). Reference [26] gave a case study on the KDD99 dataset. The study stated a lack of works in the IDS research that analyzes the currently available datasets. In Reference [27], the characteristics of the features in the KDD99 and UNSW-NB15 datasets were investigated for effectiveness measurement. An association rule mining algorithm and a few other existing classifiers were used for their experiments. The study claimed that UNSW-NB15 offers more efficient features than the KDD99 in detection accuracy and the number of false alarms. Reference [28] analyzed the KDD99 and proposed a new dataset, called NSL-KDD, an improved version of the KDD99. Reference [7] also gave an analysis of the KDD99. Besides, they analyzed other variants, namely the NSL-KDD and GureKDDcup datasets. The analysis in Reference [7] was aimed to improve the datasets by reducing the dimensions, completing missing values, and removing any redundant instances. The study found that KDD99 contains a high number of redundant instances. Reference [29] used a rough-set theory (RST) to measure the relationship between the features and each class in the KDD99. In the study, a few features were classified as not relevant for any of the dataset’s classes. Reference [30] gave an analysis of the feature relevance of the KDD99, using an information gain. The study concluded that a few features in the dataset do not contribute to the attack detection. It also concluded that the testing set of the dataset offers different characteristics than its training set.

Recently, Reference [31] surveyed the available datasets in the IDS research and gave a comprehensive overview of the properties of each dataset. The first property discussed in the study was general information, such as the year and type of classes. The second property was the nature of the data, covering the format and any metadata present in the dataset. The third property was the size and duration of the captured packets. The fourth property was the recording environment, which indicates the type of traffic and the network services used for the dataset generation. The last property was the evaluation support provided for researchers, for example, the class balance and the predefined data split. Reference [31] recommended that researchers produce datasets that are focused on specific attack types rather than trying to cover all possible attacks. If a dataset satisfies a specific application, then it is considered sufficient. In Reference [31], a comprehensive dataset was described as one that has correctly labeled classes, is available for everyone, includes real-world rather than synthetic network traffic, contains all kinds of attacks, and is always kept up to date. It should also contain packet header information and the data payload, captured over a long period. Based on the number of attacks provided in the available datasets, the UNSW-NB15 was one of their general recommendations for IDS testing.

Reference [32] reviewed a few of the IDS datasets, namely the full, corrected, and ten percent variants of the KDD99, NSL-KDD, UNSW-NB15, the Center for Applied Internet Data Analysis dataset (CAIDA), the Australian Defence Force Academy Linux dataset (ADFA-LD), and the University of New Mexico dataset (UNM). The study in Reference [32] gave general information for each of the datasets, with more emphasis on UNSW-NB15. For comparison, the k-nearest neighbors (k-NN) classifier was implemented to report the accuracy, precision, and recall across all the reviewed datasets. The results showed that the classifier performed better when using the NSL-KDD. They claimed that the superior results from using the NSL-KDD were achieved because the dataset contains fewer redundant records, which are distributed fairly. Reference [33] analyzed the KDD99, NSL-KDD, and UNSW-NB15 datasets, using a deep neural network (DNN) on the internet of things (IoT). Using evaluation metrics similar to those in Reference [32], plus the F1 measure, the results showed that the DNN was able to achieve an accuracy above 90% for all datasets. Further, the DNN had the best performance on UNSW-NB15. Reference [34] evaluated the features in the NSL-KDD and UNSW-NB15, using four filter-based feature-selection measures, namely correlation measure (CFS), consistency measure (CBF), information gain (IG), and distance measure (ReliefF). The features selected by those four methods were then evaluated by using four classifiers to indicate the training and testing performance, namely k-NN, random forests (RF), support vector machine (SVM), and deep belief network (DBN). The study reported the selected features for each feature-selection method, in addition to the classification results, which were aimed at helping cybersecurity researchers design effective IDSs. Reference [35] analyzed the UNSW-NB15 dataset by finding the relevance of the features, using a neural network. 
The authors categorized the features into five groups, based on their type, such as flow-based, content-based, time-based, essential, and additional features. From these groups, 31 possible combinations of features were evaluated and discussed. The highest accuracy (93%) in Reference [35] was obtained by using 39 features from the categorized groups. Moreover, in the study, there was a combination of 23 features that were selected by using a meta estimator called SelectFromModel that selects features based on their scores. The 23 selected features resulted in higher accuracy (97%) than those 39 features mentioned above.

Reference [36] compared the features in the UNSW-NB15 dataset with a few feature vectors that were previously proposed in the literature. They were evaluated by using a supervised machine learning to indicate the computational times and classification performance. The results of the study suggested that the current vectors can be improved by reducing their size and adapting them to deal with encrypted traffic. Reference [37] proposed a feature-selection method based on the genetic algorithm (GA), grey wolf optimizer (GWO), particle swarm optimization (PSO), and firefly optimization (FFA). The UNSW-NB15 dataset was employed for the tests of the study. The selected features from using the proposed method were evaluated by using SVM and J48 classifiers. The study reported the classification performance of a few combinations of features from the UNSW-NB15 dataset. In Reference [38], a hierarchical IDS that uses machine-learning and knowledge-based approaches was introduced and tested, using the KDD99 dataset. Reference [39] proposed an ensemble model based on the J48, RF, and Reptree and evaluated it by using the KDD99 and NSL-KDD datasets. A correlation-based approach was implemented, to reduce the features from the datasets. Reference [40] examined the reliability of a few machine learning models, such as the RF and gradient-boosting machines in real-world IoT settings. In order to do the examination, data-poisoning attacks were simulated by using a stochastic function to modify the training data of the datasets. The UNSW-NB15 and ToN_IoT datasets were employed for the experiments of the study.

It is important to note that the KDD99 and UNSW-NB15 datasets do not contain attacks related to cloud computing, such as SQL injection. Reference [41] proposed a countermeasure to detect these attacks, specifically in the cloud environment. The method in Reference [41] can be applied to the cloud environment without the need for an application's source code.

In this study, the features in the KDD99 and UNSW-NB15 datasets were analyzed by using a rough-set theory (RST), a back-propagation neural network (BPNN), and a discrete variant of the cuttlefish algorithm (D-CFA). The analysis provides an in-depth examination of the relevance of each feature to the malicious-attack classes. It also studies the symmetry of the distribution of records among the classes. The results of the analysis suggest a few features and combinations that can be used for creating an accurate IDS model. This study also describes the datasets mentioned above and gives their properties. Although other works have tried to analyze the two datasets, it is important to study the most common datasets in this domain continuously, not only to confirm their relevance but also to expand the findings on these datasets. The main contributions of this paper are as follows:

Give a detailed description of the KDD99 and UNSW-NB15 datasets.
Point out the similarities between the two datasets.
Indicate if the KDD99 is still relevant for the IDS domain.
List the relevant features for increasing the classification performance.
Provide the statistics and properties of each feature concerning the classes.
Indicate the effect of the features in both datasets on the behavior of the neural networks.

This paper includes five sections. The description and properties of the KDD99 and UNSW-NB15 datasets are provided in Section 2. Section 3 explains the methodology and experimental setup. The results and discussions are given in Section 4. Conclusion and future work are provided in Section 5.

2. Datasets’ Description and Properties

The KDD99 is widely used among researchers in IDS research. A survey by Reference [42] found that 142 studies used the KDD99 dataset from 2010 until 2015. The dataset is available with 41 features (excluding the labels) and five classes, namely Normal, DoS, Probe, remote-to-local (R2L), and user-to-root (U2R). The KDD99 (ten percent variant) contains 494,021 and 311,029 records in the training and testing sets, respectively. The classes in the training and testing sets of the KDD99 are imbalanced, as shown in Figure 1. The DoS class has the highest number of records, while the Normal class comes in second. Moreover, the testing set contains a higher number of records that are classified as R2L. This distribution of records was found to contain a large number of duplicated records. The number of records of each class, with the corresponding number of duplications, is provided in Table 1.

A graphical representation of the number of record duplications for each class is given in Figure 2. The highest numbers of duplications in the training set belong to the DoS and Probe classes, whereas the highest numbers in the testing set belong to DoS and R2L. The Probe class in the testing set also contains a fair number of duplications. It is important to note that the U2R class contains no duplications in the training set. Overall, the full training and testing sets of the KDD99 dataset contain 348,437 (70.53%) and 233,813 (75.17%) duplicated records, respectively; that is, the share of duplications in the testing set is about five percentage points higher.
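The duplication percentages quoted above follow directly from the stated record counts. The sketch below is only illustrative: the helper name `duplicate_stats` and the toy rows are hypothetical, and it assumes that a "duplicate" means any record beyond the first copy of each distinct row, which the text does not state explicitly.

```python
def duplicate_stats(records):
    """Return (duplicate_count, duplicate_fraction) for a list of rows.

    Assumption: a "duplicate" is any row beyond the first copy of each
    distinct row, i.e. duplicates = total rows - distinct rows.
    """
    rows = [tuple(r) for r in records]
    if not rows:
        return 0, 0.0
    duplicates = len(rows) - len(set(rows))
    return duplicates, duplicates / len(rows)


# Toy rows (hypothetical values): 4 rows, 2 distinct, so 2 duplicates.
toy = [("tcp", "http", 0), ("tcp", "http", 0), ("udp", "dns", 1), ("tcp", "http", 0)]
dup_count, dup_frac = duplicate_stats(toy)

# Percentages quoted in the text, recomputed from the stated counts.
train_pct = 100 * 348_437 / 494_021
test_pct = 100 * 233_813 / 311_029
```

Rounded to two decimals, `train_pct` and `test_pct` reproduce the 70.53% and 75.17% figures, a gap of roughly 4.6 percentage points.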

The available UNSW-NB15 dataset contains 42 features (excluding the labels) and ten classes, namely Normal, Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Its training set includes 175,341 records, while the testing set has 82,332 records. The classes in the training and testing sets of the UNSW-NB15 are also imbalanced, as illustrated in Figure 3. The Normal class contains the highest number of records in both sets, followed by Generic and Exploits. The Fuzzers class includes a fair number of records as well, but the rest of the classes have few records compared to the mentioned classes. It was also found that the training set of the UNSW-NB15 contains a high number of duplicated records, whereas the testing set does not contain any. Based on the details given in Table 2, 42.24% of the records in the full training set are duplicated. Figure 4 illustrates the duplications for each class in the training set. These duplications are found mainly in the Generic, DoS, and Exploits classes. The Reconnaissance class also contains a fair number of duplications.

The class distribution difference between the two datasets is shown in Figure 5. The KDD99 has a higher amount of records that represent a malicious attack class. Both training and testing sets of the KDD99 have an almost identical percentage of attack and normal records. As for the UNSW-NB15, the records distributions between the attack and normal classes are more balanced than those in the KDD99. Moreover, the percentage of the attacks and normal classes across both sets are slightly different.

The names of the features in each dataset are given in Table 3. The features in the KDD99 dataset are categorized into four groups, which are given in Table 4. The first group (basic) contains nine features that include necessary information, such as the protocol, service, and duration. The second group (content) comprises thirteen features containing information about the content, such as the login activities. The third group (time) provides nine time-based features, such as the number of connections to the same host within a two-second period. The fourth group (host) contains ten host-based features, which provide information about the connection to the host, such as the rate of connections that have the same destination port number trying to be accessed by different hosts.

As for the features in the UNSW-NB15 dataset, they are categorized into five groups and provided in Table 5. The first group (flow) includes the protocol feature, which identifies the protocol used between the hosts, such as TCP or UDP. The second group (basic) represents the necessary connection information, such as the duration and number of packets between the hosts. Fourteen features are categorized in this group. The third group (content) provides content information from the TCP, such as the window advertisement values and base sequence numbers. It also provides some information about the HTTP connections, such as the data size transferred using the HTTP service. Eight features are present in this group. The fourth group (time) includes eight time-based features, such as the jitter and arrival time of the packets. The fifth group (additional) includes eleven additional features, such as whether a login was successfully made. Moreover, the fifth group includes a few features that count the number of rows that use a specific service within a flow of 100 records in sequential order. It is important to note that a few features described in Reference [9], namely srcip, sport, dstip, dsport, stime, and ltime, were not present in the actual dataset; therefore, they were not included in this study. Moreover, f_2-9 was present in the dataset but was not described or categorized in Reference [9]; therefore, it was categorized in the basic group.

A few features were found to be in common between the two datasets. KDD99's features f_1-1, f_1-2, f_1-3, f_1-5, and f_1-6 are in common with UNSW-NB15's features f_2-1, f_2-2, f_2-3, f_2-7, and f_2-8. f_1-1 and f_2-1 describe the connection duration; f_1-2 and f_2-2 give the protocol type, such as transmission control protocol (TCP) or user datagram protocol (UDP); f_1-3 and f_2-3 state the service used at the destination, such as file transfer protocol (FTP) or domain name system (DNS); and f_1-5, f_2-7, f_1-6, and f_2-8 give the number of data bytes transmitted between the source and destination. Some other features across the two datasets share similar characteristics. As described in Table 6, both datasets contain features that use connection flags. Connection flags provide additional information, such as synchronization (SYN) and acknowledgment (ACK). There were ten features in the KDD99 that use flags, whereas, in the UNSW-NB15, there were only four. In Table 6, it can also be seen that the number of features that involve a connection count is higher in the KDD99 than in the UNSW-NB15. Further, the UNSW-NB15 was found to contain more features that are time-based and size-based.

3. Methodology

The dataset analysis was done by using three methods, namely RST, back-propagation neural network (BPNN), and D-CFA. First, using the RST, the dependency between the features and each of the attack classes was calculated. Second, using the BPNN, the classification accuracy ($ACC$) of each feature in detecting a malicious-attack class was computed. Lastly, the D-CFA was used for feature selection, to select the most relevant features over multiple iterations and runs and so indicate the most frequently selected features. The BPNN was used to evaluate the selected features, as in a wrapper feature-selection approach. To calculate the $ACC$ from the BPNN, the records in the datasets were first transformed and normalized. Figure 6 illustrates the main steps that were taken to analyze the KDD99 and UNSW-NB15 datasets.

The three used methods for the analysis and their evaluation measurements are explained in detail in the following subsections.

3.1. Rough-Set Theory (RST)

The RST was used to find the dependency between the features and the classes. For this analysis, each feature was used to calculate its dependency on each of the malicious-attack classes. Based on References [29,43], the dependency ratio (called $depRatio$) was calculated, using Equation (1).

$$depRatio(X) = \frac{|lower(X)|}{|U|} \tag{1}$$

where $U$ denotes all the records, $X$ signifies the set of records of the target decision, used to distinguish the two classes (normal and attack), and the vertical bars denote cardinality. The $depRatio$ is a value between 0 and 1. If $depRatio = 1$, then $X$ is a crisp set and the two classes can be classified correctly; if $depRatio < 1$, then $X$ is a rough set. The ratio is defined based on the lower approximation (called $lower$), which is calculated by using Equation (2).

$$lower(X) = \{ r \mid [r]_{f} \subseteq X \} \tag{2}$$

where $lower(X)$ is the set of records that only belong to the target decision ($X$), which can be used to classify the decision without any uncertainty. It is the union of all the records, for both classes, whose equivalence class $[r]_{f}$ under the selected feature ($f$) is entirely contained in a single class. Once the $depRatio(X)$ of each feature is calculated, an average of these values (called $ADR$) can be computed, using Equation (3).

$$ADR = \frac{\sum_{i=1}^{n} depRatio_{i}(X)}{n} \tag{3}$$

where $n$ is the number of features in the dataset and $depRatio_{i}(X)$ is the dependency ratio of the $i$-th feature. The $ADR$ can be used to indicate the dependency of all the features on a specific attack class, where a higher $ADR$ value designates a higher dependency.
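The dependency ratio of a single feature can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names and toy data are hypothetical, and it assumes that the ratio counts every record whose feature value occurs in only one class (the lower approximations of both classes combined), which is one reading of the description above.

```python
from collections import defaultdict


def lower_approximation(feature_values, labels, target):
    """Indices of records whose equivalence class [r]_f lies entirely in X.

    Records are grouped by their value of one feature f; a group counts
    only if every record in it carries the target label (Equation (2)).
    """
    groups = defaultdict(list)
    for idx, value in enumerate(feature_values):
        groups[value].append(idx)
    lower = []
    for members in groups.values():
        if all(labels[i] == target for i in members):
            lower.extend(members)
    return sorted(lower)


def dep_ratio(feature_values, labels):
    """|lower| / |U|, with the lower approximations of all classes combined.

    Assumption: the ratio counts records that the feature assigns to either
    class without ambiguity, so a value of 1 means a crisp split.
    """
    certain = sum(
        len(lower_approximation(feature_values, labels, target))
        for target in set(labels)
    )
    return certain / len(labels)


def average_dep_ratio(feature_columns, labels):
    """ADR: mean depRatio over the n feature columns (Equation (3))."""
    ratios = [dep_ratio(col, labels) for col in feature_columns]
    return sum(ratios) / len(ratios)


labels = ["attack", "attack", "normal", "attack", "normal", "normal"]
f_a = ["a", "a", "b", "b", "c", "c"]   # groups a and c are pure, b is mixed
f_b = ["x", "y", "x", "y", "x", "y"]   # both groups are mixed

ratio_a = dep_ratio(f_a, labels)       # 4 of 6 records are unambiguous
ratio_b = dep_ratio(f_b, labels)       # no record is unambiguous
adr = average_dep_ratio([f_a, f_b], labels)
```

In this toy case the first feature separates four of the six records cleanly, the second separates none, and the ADR is the mean of the two ratios.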

3.2. Back-Propagation Neural Network (BPNN)

The used BPNN was based on its implementation provided in Reference [44]. The back-propagation is the training algorithm for adjusting the weights and biases of the neural network [44].

Formally, as illustrated in Figure 7, every input, $inp_{i}$, has a weight, $w_{i}$, that corresponds to the strength of the connection. The weighted sum of the inputs plus the bias, $b$, is passed to the activation function, $\sigma$, to generate the output, $y$ [45]. This process can be expressed by using Equation (4).

$$y = \sigma\left(\sum_{i=1}^{n} inp_{i} w_{i} + b\right) \tag{4}$$

In order to keep the structure of the neural network simple, only one hidden layer was used. As for the nodes in the hidden layer, different numbers of nodes were set and evaluated, with a maximum number equal to $n$. The logistic sigmoid function was used as the activation function at the hidden and output layers, using Equation (5).

$$\sigma(v_{i}) = \frac{1}{1 + e^{-v_{i}}} \tag{5}$$

where $e$ is the base of the natural logarithm, and $v_{i}$ represents the input value of the function.
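To make Equations (4) and (5) concrete, the following is a minimal sketch of a single neuron with a logistic sigmoid activation. The function names and the example numbers are hypothetical and are not taken from the implementation in Reference [44].

```python
import math


def sigmoid(v):
    """Logistic sigmoid of Equation (5)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(inputs, weights, bias):
    """Output y of Equation (4): sigmoid of the weighted input sum plus bias."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


# Hypothetical inputs and weights whose weighted sum is zero.
y = neuron_output([0.5, 1.0], [0.4, -0.2], 0.0)
```

Because the weighted sum here is zero, the neuron sits at the midpoint of the sigmoid; large positive sums push the output toward 1 and large negative sums toward 0.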

Mean square error (MSE) was used to calculate the error loss during the training of the neural network, using Equation (6).

$$MSE = \frac{1}{R_{n}} \sum_{i=1}^{R_{n}} \left(out_{i} - desired_{i}\right)^{2} \tag{6}$$

where $R_{n}$ is the number of records in the training set, $out_{i}$ is the output of the network for record $i$, and $desired_{i}$ is the expected output value.
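As a small worked illustration of the loss, the sketch below assumes the conventional mean-square-error form, in which the differences between outputs and desired values are squared before averaging. The function name and the sample values are hypothetical.

```python
def mean_squared_error(outputs, desired):
    """Average of squared differences between outputs and desired values."""
    if len(outputs) != len(desired):
        raise ValueError("outputs and desired must have the same length")
    return sum((o - d) ** 2 for o, d in zip(outputs, desired)) / len(outputs)


# Three hypothetical network outputs against their target values.
mse = mean_squared_error([0.9, 0.2, 0.4], [1.0, 0.0, 0.0])
```

The squared errors are 0.01, 0.04, and 0.16, so the mean is 0.07.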

The final weights and biases are obtained by minimizing the error loss function. The training procedure ends after the maximum number of epochs is reached. The same parameters as in Reference [44] were used to train the BPNN; they are given in Table 7.

Furthermore, for data preprocessing (transformation and normalization), the non-numeric values were transformed and then normalized with a min–max function, using Equation (7).

$$normalized = \frac{input - minimum}{maximum - minimum} \tag{7}$$

where $normalized$ denotes the normalized value, and $minimum$ and $maximum$ refer to the smallest and largest values of that input.
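The preprocessing step can be sketched as follows. The text does not say how non-numeric values were transformed, so the integer encoding in order of first appearance used here is purely an assumption, as are the function names and the handling of constant columns.

```python
def encode_categories(column):
    """Map each distinct non-numeric value to an integer code.

    Assumption: codes are assigned in order of first appearance; the
    source does not specify the transformation that was used.
    """
    codes = {}
    for value in column:
        codes.setdefault(value, len(codes))
    return [codes[value] for value in column]


def min_max_normalize(column):
    """Scale values to [0, 1] with the min-max function of Equation (7)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


protocols = ["tcp", "udp", "tcp", "icmp"]
encoded = encode_categories(protocols)
scaled = min_max_normalize(encoded)
```

With this toy column, the codes 0, 1, 0, 2 are scaled to 0.0, 0.5, 0.0, 1.0.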

In order to report the $ACC$, every single feature in the datasets was used as an input to train a BPNN model. The $ACC$ and average $ACC$ ($AACC$) that result from the training can be calculated by using Equations (8) and (9), respectively.

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$AACC = \frac{1}{n} \sum_{i=1}^{n} ACC_{i} \tag{9}$$

where $TP$ and $TN$ signify that the classification was correct; $FP$ and $FN$ indicate that the output was classified incorrectly; and $TP$, $TN$, $FP$, and $FN$ are calculated based on all the outputs $y$ of each BPNN model.
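The accuracy computation can be illustrated with a short sketch. The function names, the label strings, and the sample predictions are hypothetical, and the average is taken here as the arithmetic mean of the individual accuracies, which is an assumption about how the AACC is formed.

```python
def confusion_counts(predicted, actual, positive="attack"):
    """Count TP, TN, FP, FN for a binary attack-versus-normal decision."""
    tp = tn = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1
        elif p != positive and a != positive:
            tn += 1
        elif p == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC of Equation (8)."""
    return (tp + tn) / (tp + tn + fp + fn)


def average_accuracy(accuracies):
    """AACC, taken as the arithmetic mean of the ACC values (assumption)."""
    return sum(accuracies) / len(accuracies)


predicted = ["attack", "attack", "normal", "normal", "attack"]
actual = ["attack", "normal", "normal", "attack", "attack"]
counts = confusion_counts(predicted, actual)
acc = accuracy(*counts)
aacc = average_accuracy([acc, 0.8])
```

Here two attacks and one normal record are classified correctly out of five, giving an accuracy of 0.6; averaged with a second, hypothetical accuracy of 0.8, the result is 0.7.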

3.3. Discrete Cuttlefish Algorithm (D-CFA)

The standard cuttlefish algorithm (CFA) [46] and its discrete variant (D-CFA) [21] have four search strategies: two exploration (global) strategies and two exploitation (local) strategies, which are based on the skin-color changing of the cuttlefish. In the D-CFA, a new solution, $Sol_{new}$, is generated based on $Reflection$ and $Visuality$, using Equation (10).

$$Sol_{new} = Reflection \cup Visuality \tag{10}$$

where ∪ is the union of the produced discrete data (features). Algorithm 1 gives the pseudo-code of the D-CFA for solving the feature-selection problem. The BPNN was used to evaluate the picked features, and the classification accuracy was used as a fitness function during the search process. The flowchart of the process of the D-CFA is given in Figure 8.

Each solution, $Sol_{i}$, in the population (called $Dpop$) includes two subsets: $picked\_ftr$ and $unpicked\_ftr$. The final selected features are assigned to $picked\_ftr$, whereas the final unselected features are assigned to $unpicked\_ftr$. No feature appears in both subsets, that is, $picked\_ftr \cap unpicked\_ftr = \emptyset$. To illustrate, if there is a total of 20 features in the dataset and $picked\_ftr$ contains 5 of them, then $unpicked\_ftr$ contains the remaining 15 features.

3.3.1. Initialization Phase (Lines 1–4 of Algorithm 1)

During the initialization phase, the solutions in $Dpop$ are initialized with a random number of features. The best solution, $Sol_{best}$, is kept in order to be used in one of the search strategies. The maximum number of iterations (called $MaxIter$) is also set during this phase.

3.3.2. Improvement Phase (Lines 5–29 of Algorithm 1)

The improvement phase of the algorithm uses four search strategies, which are explained in the following subsections:

Global search 1 (lines 8–12 of Algorithm 1)

The first global search of the algorithm finds a new solution, $Sol_{new}$, using Equation (10), where the required $Reflection$ and $Visuality$ are calculated by using Equations (11) and (12), respectively.

$$Reflection = subset\_random[R_{\circ}] \subset Sol_{i}.picked\_ftr \tag{11}$$

$$Visuality = subset\_random[V_{\circ}] \subset Sol_{i}.unpicked\_ftr \tag{12}$$

where $Reflection$ and $Visuality$ are random subsets of features whose sizes are equal to the values of $R_{\circ}$ and $V_{\circ}$, which specify the number of features to be picked from $Sol_{i}$'s $picked\_ftr$ and $unpicked\_ftr$, respectively. Equations (13) and (14) are used to compute the values of $R_{\circ}$ and $V_{\circ}$, respectively.

$$R_{\circ} = random(zero, picked\_ftr.size) \tag{13}$$

$$V_{\circ} = picked\_ftr.size - R_{\circ} \tag{14}$$

where $random(zero, picked\_ftr.size)$ is a number that is randomly generated between zero and the number of features in the $picked\_ftr$ subset. The union of the subsets generated by $Reflection$ and $Visuality$ is used to create the $picked\_ftr$ subset of the new solution, $Sol_{new}$. All the remaining features are placed in the $unpicked\_ftr$ subset of $Sol_{new}$.
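The first global search can be sketched as below. This is a hedged illustration rather than the implementation from Reference [21]: the function name is hypothetical, the random bounds in Equation (13) are assumed to be inclusive, and the number of features drawn from the unpicked subset is capped at its size, a guard that the text does not mention.

```python
import random


def global_search_1(picked, unpicked, rng):
    """Build a new solution from random parts of both subsets (Eqs. (10)-(14))."""
    r = rng.randint(0, len(picked))              # Eq. (13), bounds assumed inclusive
    v = min(len(picked) - r, len(unpicked))      # Eq. (14), capped (assumption)
    reflection = rng.sample(picked, r)           # Eq. (11): r features kept from picked
    visuality = rng.sample(unpicked, v)          # Eq. (12): v features taken from unpicked
    new_picked = sorted(reflection + visuality)  # Eq. (10): union of the two parts
    chosen = set(new_picked)
    # Every feature that was not chosen goes to the unpicked subset.
    new_unpicked = [f for f in picked + unpicked if f not in chosen]
    return new_picked, new_unpicked


rng = random.Random(0)                           # seeded only for repeatability
features = list(range(1, 21))                    # 20 hypothetical feature indices
picked, unpicked = features[:5], features[5:]
new_picked, new_unpicked = global_search_1(picked, unpicked, rng)
```

Because the two counts always sum to the size of the picked subset, the new solution keeps the same number of picked features (5 in this example) while mixing retained and newly introduced ones.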

Local search 1 (lines 13–17 of Algorithm 1)

The first local search in the algorithm finds a new solution, $Sol_{new}$, using Equation (10), based on $Sol_{best}$. The $picked\_ftr$ and $unpicked\_ftr$ subsets of the new solution are computed by using Equations (15) and (16), respectively.

$$Reflection = Sol_{best}.picked\_ftr - Sol_{best}.picked\_ftr[R_{°}] \tag{15}$$

$$Visuality = Sol_{best}.unpicked\_ftr[V_{°}] \tag{16}$$

where $R_{°}$ is computed by using Equation (17) and specifies the index of the feature to be replaced in $picked\_ftr$. $V_{°}$ is computed by using Equation (18) and specifies the replacement feature from the $unpicked\_ftr$ subset of $Sol_{best}$.

$$R_{°} = random(zero, Sol_{best}.picked\_ftr.size) \tag{17}$$

$$V_{°} = random(zero, Sol_{best}.unpicked\_ftr.size) \tag{18}$$

where $Sol_{best}.unpicked\_ftr.size$ is equal to the number of features in the $unpicked\_ftr$ subset of $Sol_{best}$.
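A minimal Python sketch of this swap is given below. It assumes that $R_{°}$ and $V_{°}$ are used as zero-based indices, so the random draws stop one short of the subset sizes, and that the removed feature is returned to the unpicked subset. The function name `local_search_1` is hypothetical.

```python
import random


def local_search_1(best_picked, best_unpicked, rng=random):
    """Swap one picked feature of Sol_best for one unpicked feature."""
    picked = list(best_picked)
    unpicked = list(best_unpicked)
    if not picked or not unpicked:
        # Nothing to swap; return a copy of the best solution unchanged.
        return set(picked), set(unpicked)

    r = rng.randrange(len(picked))     # Eq. (17): index into picked_ftr
    v = rng.randrange(len(unpicked))   # Eq. (18): index into unpicked_ftr
    removed, added = picked[r], unpicked[v]

    reflection = set(picked) - {removed}   # Eq. (15): picked_ftr without one feature
    visuality = {added}                    # Eq. (16): a single unpicked feature

    new_picked = reflection | visuality    # union, as in the first global search
    new_unpicked = (set(unpicked) - {added}) | {removed}
    return new_picked, new_unpicked
```

The candidate differs from $Sol_{best}$ in exactly one picked feature, which is what makes this a local rather than a global move.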

Local search 2 (lines 18–22 of Algorithm 1)

The second local search calculates an average based on $Sol_{best}$, to generate an average solution (called $Sol_{Avg}$) that likewise has two subsets ($picked\_ftr$ and $unpicked\_ftr$). Then a new solution, $Sol_{new}$, is computed based on the subsets of $Sol_{Avg}$, using Equations (19)–(21). $Sol_{Avg}$ always contains one feature fewer than the $picked\_ftr$ subset of $Sol_{best}$. For each generation, one feature from the $picked\_ftr$ subset is removed and moved to the $unpicked\_ftr$ subset, to create $Sol_{new}$ and update $Sol_{Avg}$.

$$Sol_{new} = Reflection - Visuality \tag{19}$$

$$Reflection = Sol_{Avg}.picked\_ftr \tag{20}$$

$$Visuality = Sol_{Avg}.picked\_ftr[i] \tag{21}$$

where $i$ refers to the index of the feature for removal: $i = \{1, 2, 3, \ldots, Sol_{Avg}.picked\_ftr.size\}$.
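Equations (19)–(21) can be sketched in Python as follows. The sketch enumerates every candidate obtained by dropping one feature from a fixed $Sol_{Avg}$; whether $Sol_{Avg}$ is updated between candidates is left out here, which is an assumption of this illustration. The generator name `local_search_2` is hypothetical.

```python
def local_search_2(avg_picked, avg_unpicked):
    """Yield candidates that each drop one feature from Sol_Avg."""
    for feature in sorted(avg_picked):          # i runs over the picked features
        reflection = set(avg_picked)            # Eq. (20): all picked features
        visuality = {feature}                   # Eq. (21): the i-th picked feature
        new_picked = reflection - visuality     # Eq. (19): set difference
        # The removed feature is moved to the unpicked subset.
        new_unpicked = set(avg_unpicked) | visuality
        yield new_picked, new_unpicked
```

For a $Sol_{Avg}$ with three picked features, the generator produces three candidates of two features each.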

Global search 2 (lines 23–27 of Algorithm 1)

In the second global search, a new solution, $Sol_{new}$, is generated with random subsets of features, in a process similar to the population initialization.

Algorithm 1 D-CFA
1:	Initialization Phase:
2:	Initialize the solutions in $Dpop$ at random subsets of features
3:	Evaluate each $Sol_{i}$ in the $Dpop$ using a BPNN and store the best solution in $Sol_{best}$
4:	Set the value of $MaxIter$ parameter
5:	Improvement Phase:
6:	While (Iterations < $MaxIter$) Do
7:	For each $Sol_{i}$ in the $Dpop$
8:	Global search 1
9:	Update $picked\_ftr$ and $unpicked\_ftr$ subsets for $Sol_{new}$ using Equations (10)–(14)
10:	Evaluate the $Sol_{new}$ using BPNN
11:	If $f(Sol_{new}) > f(Sol_{best})$ then $Sol_{best} = Sol_{new}$
12:	If $f(Sol_{new}) > f(Sol_{i})$ then $Sol_{i} = Sol_{new}$
13:	Local search 1
14:	Update $picked\_ftr$ and $unpicked\_ftr$ subsets for $Sol_{new}$ using Equations (10), (15)–(18)
15:	Evaluate the $Sol_{new}$ using BPNN
16:	If $f(Sol_{new}) > f(Sol_{best})$ then $Sol_{best} = Sol_{new}$
17:	If $f(Sol_{new}) > f(Sol_{i})$ then $Sol_{i} = Sol_{new}$
18:	Local search 2
19:	$Sol_{Avg} = Sol_{best}$
20:	Update $picked\_ftr$ and $unpicked\_ftr$ subsets for $Sol_{new}$ using Equations (19)–(21)
21:	Evaluate the $Sol_{new}$ using BPNN
22:	If $f(Sol_{new}) > f(Sol_{best})$ then $Sol_{best} = Sol_{new}$
23:	Global search 2
24:	Generate random $picked\_ftr$ and $unpicked\_ftr$ subsets for the $Sol_{new}$
25:	Evaluate the $Sol_{new}$ using BPNN
26:	If $f(Sol_{new}) > f(Sol_{best})$ then $Sol_{best} = Sol_{new}$
27:	If $f(Sol_{new}) > f(Sol_{i})$ then $Sol_{i} = Sol_{new}$
28:	End for
29:	End while
30:	Return $Sol_{best}$
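The control flow of Algorithm 1 can be sketched in Python as a skeleton. To stay short, the sketch uses only the second global search (random subsets) as the proposal step and takes an arbitrary `fitness` callable in place of the BPNN evaluation; both simplifications, the non-empty subset constraint, and the names `random_solution` and `d_cfa_skeleton` are assumptions of this illustration rather than the authors' implementation.

```python
import random


def random_solution(all_features, rng):
    """Global search 2: a random subset of features (kept non-empty here)."""
    k = rng.randint(1, len(all_features))
    return set(rng.sample(sorted(all_features), k))


def d_cfa_skeleton(all_features, fitness, pop_size=10, max_iter=10, seed=0):
    """Outer loop of Algorithm 1 with a stand-in fitness function."""
    rng = random.Random(seed)
    # Initialization phase (lines 1-4): random population and best solution.
    dpop = [random_solution(all_features, rng) for _ in range(pop_size)]
    best = max(dpop, key=fitness)

    # Improvement phase (lines 5-29).
    for _ in range(max_iter):
        for i in range(len(dpop)):
            new = random_solution(all_features, rng)   # lines 23-24
            score = fitness(new)                        # line 25
            if score > fitness(best):                   # line 26
                best = new
            if score > fitness(dpop[i]):                # line 27
                dpop[i] = new
    return best                                         # line 30
```

In a full implementation each search strategy would supply its own candidate before the same two acceptance checks, and the fitness values would normally be cached, since each evaluation involves training a network.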

4. Results and Discussions

In this section, three experiments were carried out, to analyze the training sets of the KDD99 and UNSW-NB15 datasets. First, the $lower$ and $depRatio$ between each feature and attack class were calculated. Second, the $ACC$ of the features for detecting malicious attack classes in the datasets was computed, using the BPNN. Lastly, the D-CFA was used for feature selection, to find the most frequently selected features. This section also discusses and compares all the obtained results from the experiments.

The experiments were implemented in the C# (C-Sharp) programming language and executed on a desktop computer with a 2.8 GHz CPU (i5-8400) and 8 GB of RAM.

4.1. Calculating the Lower Approximations and Dependencies of the Features

The $ADR$ of the features in the KDD99 and UNSW-NB15 can be seen in Figure 9 and Figure 10, respectively. Figure 9 shows that the features in the KDD99 had their highest $ADR$ values for the U2R and R2L attacks, and their lowest values for the DoS and all attacks combined. Specifically, feature f_1-5 showed the highest $ADR$ across all attacks, and f_1-6 the second highest. In the results for the UNSW-NB15, shown in Figure 10, the highest $ADR$ values were for the Shellcode and Worms attacks, and the lowest were for Generic and all attacks combined. Specifically, the highest $ADR$ across all attacks was achieved by using f_2-1, and f_2-13 achieved the second highest. It should be noted that f_2-1 and f_2-13 are continuous values, and discretizing them might influence the reported results.

The $lower$ and $depRatio$ of the features for each attack in the KDD99 and UNSW-NB15 are given in Appendix A Table A1 and Table A2, respectively. As shown in Appendix A Table A1, f_1-5 had the highest $lower$ and $depRatio$ values for the Probe and R2L. As for the DoS and all the attacks combined, f_1-24 had the highest values. The U2R showed the highest values when using f_1-33. It should be noted that f_1-12, f_1-20, and f_1-21 resulted in $lower$ and $depRatio$ values of zero. f_1-12 is a binary value that indicates whether a login was made; f_1-21 is related to the user’s logins and indicates whether a login was associated with a “hot” list, as referred to in Reference [8]; and f_1-20 indicates the commands of the outbound FTP connections. Furthermore, f_1-20 and f_1-21 were found to have zero values in all records, and removing them is suggested for any classification task.

Based on the results given in Appendix A Table A2, f_2-1 showed the highest $lower$ and $depRatio$ values for Fuzzers, DoS, Exploits, Reconnaissance, Shellcode, and all attacks combined. As for the Backdoors and Worms attacks, f_2-7 showed the highest values. Unlike f_1-20 and f_1-21 in the KDD99, none of the features in the UNSW-NB15 resulted in $lower$ and $depRatio$ values of zero across all attack classes. The lowest $depRatio$ was obtained by f_2-23, which gives the value of the TCP window advertisement from the destination connection. Most of the values of f_2-23 in the dataset were found to be equal to 255 or zero.
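To make the two measures concrete, the following Python sketch computes them for a single, already discretized feature. It assumes the standard rough-set definitions: records are grouped by feature value, a group counts toward $lower$ only if all of its records carry the same class label, and $depRatio$ is that count divided by the total number of records. The function name `lower_and_dep_ratio` is hypothetical, and the study's own definitions take precedence where they differ.

```python
from collections import defaultdict


def lower_and_dep_ratio(feature_values, labels):
    """Return (lower, depRatio) for one discretized feature."""
    labels_per_value = defaultdict(set)   # class labels seen for each feature value
    count_per_value = defaultdict(int)    # number of records for each feature value
    for value, label in zip(feature_values, labels):
        labels_per_value[value].add(label)
        count_per_value[value] += 1

    # A feature value is consistent if it maps to exactly one class label.
    lower = sum(count_per_value[v]
                for v, seen in labels_per_value.items() if len(seen) == 1)
    dep_ratio = lower / len(labels) if labels else 0.0
    return lower, dep_ratio
```

For example, with feature values `[0, 0, 1, 1, 2]` and labels `["normal", "attack", "attack", "attack", "normal"]`, the value 0 is ambiguous while 1 and 2 are consistent, so $lower$ is 3 and $depRatio$ is 0.6. A feature that is constant across all records, such as f_1-20, yields zero whenever more than one class is present.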

4.2. Classification Accuracy Analysis: Examining the Features for the Detection of Each Attack

Neural networks behave differently based on the number of inputs and hidden nodes in the structure. As described in Reference [47], the number of hidden layer nodes can be set to a value that ranges between the number of inputs and outputs. Therefore, 41 and 42 simulations to train the BPNN were carried out for the KDD99 and UNSW-NB15 datasets, respectively. For example, to report the results of this experiment for analyzing the features’ ability to classify the attack class in the KDD99, the total number of simulations is equal to (number of features × two) = (41 × 2) = 82. Figure 11 and Figure 12 illustrate the $AACC$ of all the features in the KDD99 and UNSW-NB15, respectively.

It can be seen in Figure 11 that the $AACC$ for the KDD99 features was higher when the number of hidden nodes ranged between 4 and 25; beyond that range, it almost plateaued. In contrast, the $AACC$ for the UNSW-NB15’s features, as shown in Figure 12, improved when the number of nodes exceeded 7. The best $ACC$ for each feature in the KDD99 and UNSW-NB15 is shown in Figure 13 and Figure 14, respectively. Figure 13 shows that f_1-23 had the highest accuracy, while f_1-2, f_1-4, f_1-24, f_1-25, f_1-26, f_1-29, f_1-38, and f_1-39 showed a noticeable difference when compared to other features. In isolation, f_1-23 resulted in a best $ACC$ of 98.32%, followed by f_1-2 and f_1-24, with a best $ACC$ of 84.92% and 85.33%, respectively. As for the features in UNSW-NB15, as shown in Figure 14, the best $ACC$ was reported using f_2-16 and f_2-42, with an $ACC$ of 52.32% and 52.47%, respectively. The $AACC$ of the features in UNSW-NB15 was reported at 50.11%. These results indicate that the BPNN was able to train with a higher accuracy using the features in KDD99 than those in the UNSW-NB15.

4.3. The Most Frequently Selected Features Using the D-CFA

In this work, the D-CFA was used for feature selection over multiple runs, to pick different subsets of features. Those features are picked based on the highest classification accuracy achieved from BPNN training. The parameters that were involved in the training of the BPNN are provided in Table 7. To find the most relevant features in the KDD99 and UNSW-NB15 datasets, two measurement approaches were considered. First, the D-CFA was applied to find the most relevant features for each attack in both datasets. Second, the D-CFA was simulated twenty times for each dataset, to find the most frequently picked features over those runs. Since f_1-20 and f_1-21 contain a value of zero in all the records, they were not used in either measurement approach. As for the D-CFA’s parameters, $MaxIter$ and $Dpop$ were both set to a value of 10.

Since the classes are not balanced in both datasets (see Figure 1 and Figure 3) and the first measurement approach examines the relevancy of features to each attack, the sets of records were modified. The modification was done manually, by splitting the datasets into multiple subsets. Each of these subsets includes one attack and an equal number of records from the normal class. This was done due to the lack of records in the training set for specific classes, such as the R2L and U2R in KDD99. For example, the subset that was used to select features for the Probe attack in the KDD99 contains 8214 records, of which 4107 records belong to the attack class, and the rest are for the normal class. The results of the experiment for the first measurement approach are given in Table 8. The number of nodes in the hidden layer was considered, and multiple runs were carried out, to find a proper number of nodes to achieve the highest $ACC$ possible.
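The construction of one such subset can be sketched in Python as follows. The text states that the split was done manually, so the random sampling of normal records, the cap applied when one class has fewer records than the other, the record layout (dictionaries with a `"label"` key), and the function name `balanced_subset` are all assumptions of this illustration.

```python
import random


def balanced_subset(records, attack, normal_label="normal", seed=0):
    """Pair the records of one attack class with an equal number of normal records."""
    rng = random.Random(seed)
    attacks = [r for r in records if r["label"] == attack]
    normals = [r for r in records if r["label"] == normal_label]
    # Use as many records per class as the smaller of the two allows
    # (an assumption for attacks that outnumber the normal class).
    k = min(len(attacks), len(normals))
    return rng.sample(attacks, k) + rng.sample(normals, k)
```

Applied to the Probe class of the KDD99, which has 4107 attack records, such a function would return 4107 + 4107 = 8214 records, matching the subset size quoted above.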

Table 8 reports the selected features, the number of hidden layer nodes (labeled no. of nodes), and the $ACC$ for each attack in both datasets. Even though the KDD99 has fewer features, its first attack class (DoS) had the highest number of selected features. The $ACC$ of detecting that attack was also the highest (99.40%). In contrast, only twelve features were selected for the DoS attack class in the UNSW-NB15, for which the $ACC$ was reported at 86.57%. It can be observed from Table 8 that the number of selected features for the KDD99 attack classes is greater than that in the UNSW-NB15: an average of 25.2 features were selected for the attacks in the KDD99, whereas an average of 19.1 features were selected for the attacks in the UNSW-NB15. It should be noted that the lowest number of selected features was for the Generic attack in the UNSW-NB15, which was ten features. The second-lowest number of selected features was for the Fuzzers, Reconnaissance, and Shellcode, which had 18 features to obtain an $ACC$ of 90.40%, 89.85%, and 90.75%, respectively. Furthermore, the results in Table 8 also indicate that the KDD99 offers 2.97% higher $AACC$ than the UNSW-NB15.

The experiment for the second measurement approach was conducted by using the full training sets of the KDD99 and UNSW-NB15. The selected features from this experiment were evaluated by using the BPNN with the parameters given in Table 7. As for the structure of the neural network, only one hidden node was used to keep its implementation simple. The fitness of the updated solutions from the D-CFA is based on the $ACC$ after each evaluation. The D-CFA aims to increase the $ACC$ regardless of the number of selected features. After twenty simulations for each dataset, the results were compiled and are given in Table 9 and Table 10. Based on the output of these runs, the frequency of a feature being selected was measured. Table 9 gives the selection frequency of each feature in the KDD99, as well as its ranking when compared to the others. The ranks were calculated based on the number of times a feature was selected. Furthermore, the resulting $ACC$ from training the BPNN is also provided in Table 9. It can be observed that f_1-23 had the best rank, having been selected nineteen times; f_1-29 was selected sixteen times and had the second rank. These two features belong to the time group (see Table 4). As for the third rank, f_1-1 had fourteen selections during the twenty runs; f_1-1 belongs to the basic group (see Table 4). In terms of $ACC$, run numbers nine, nineteen, and twenty resulted in the highest $ACC$.
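The frequency-based ranking can be sketched in Python as follows. The sketch counts how many runs selected each feature and assigns dense ranks, so that features with the same count share a rank; the tie-handling rule and the function name `selection_ranks` are assumptions, since the text only states that ranks follow the number of selections.

```python
from collections import Counter


def selection_ranks(runs):
    """Map each feature to (times selected, rank) over a list of runs."""
    # Count each feature at most once per run.
    counts = Counter(f for run in runs for f in set(run))
    # Most frequently selected first; feature name breaks ties for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    ranks = {}
    rank, previous = 0, None
    for feature, count in ordered:
        if count != previous:        # dense ranking: equal counts share a rank
            rank += 1
            previous = count
        ranks[feature] = (count, rank)
    return ranks
```

Given the twenty selected subsets of one dataset as `runs`, the feature with rank 1 is the one selected in the most runs.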

The frequency of a feature being selected in the UNSW-NB15 dataset is given in Table 10. It can be observed from Table 10 that f_2-10 had the best rank and was selected in every run. In the second rank, f_2-29 was selected thirteen times out of all runs. As for the third rank, f_2-11 was selected twelve times. f_2-20, which represents the window advertisement value for the TCP connection of the source, was not selected in any of the runs. In addition, the base sequence number for the TCP connection of the source (f_2-21) was selected only three times. These two features had the lowest ranks when compared to the other features. The highest $ACC$ was obtained by run numbers nine, twelve, and seventeen. The features commonly selected across these three runs are f_2-10, f_2-11, and f_2-29, which are the top three ranked features from all the runs. These features belong to the basic and content groups (see Table 5).

The following can be concluded from the analysis done in this study:

The KDD99 dataset contains more duplicated records than the UNSW-NB15 dataset.
The UNSW-NB15’s testing set does not contain any duplication, whereas the training set does.
Both datasets have imbalanced classes, and their normal-to-attack class ratio is not balanced.
In terms of the normal-to-attack class ratio, the UNSW-NB15 dataset is slightly more balanced.
There are five standard features between the datasets (see Table 6).
There is a feature in the UNSW-NB15 dataset (f_2-9) that is not described by the original creators of the UNSW-NB15 [9,27].
The KDD99 dataset has 22 features that share similar characteristics to those in the UNSW-NB15 dataset.
f_1-20 and f_1-21 in the KDD99’s training set have a value of zero in all the records, and removing them before training a model is suggested.
f_1-23 in the KDD99 can be used to train a model with an $ACC$ of 98.32%.
The features in the KDD99 dataset are able to train a classifier with a higher $ACC$ than those in the UNSW-NB15 dataset.
The features in the UNSW-NB15 dataset show a higher $depRatio$ and $ADR$ than the KDD99 dataset.
For training a neural network when using any of the analyzed datasets, it is suggested to use a minimum of three nodes in the structure of the hidden layer, to increase the performance of the training.
On average, more features were selected from the KDD99 than the UNSW-NB15 during a feature selection process for the classification task.
It is suggested to employ f_1-1, f_1-23, and f_1-29 in the KDD99 and f_2-10, f_2-11, and f_2-29 in the UNSW-NB15 for any classification task, as they were involved in achieving a high $ACC$.
f_2-20 in the UNSW-NB15 was not selected during the feature selection process, indicating the irrelevance of the feature for the classification task.
The most selected features from the KDD99 belong to the basic and time groups (see Table 4), whereas the most selected features from the UNSW-NB15 belong to the basic and content groups (see Table 5). The basic group in both datasets contains four common features out of nine in the KDD99 and fourteen in the UNSW-NB15.
There are many similarities between the features in the KDD99 and UNSW-NB15. The similarities indicate that the KDD99 is still relevant for the IDS domain, even though the dataset is over twenty years old.
Many of the features in both datasets are extracted from the header of the packets. This extraction can be a simple task, given the available tools, such as TShark [48]. TShark can be used to select specific fields from the header of the packets. Then, the required features can be extracted. This process can be utilized in the development of emerging technologies, such as the IoT and real-time systems.
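As an illustration of the last point, header fields can be pulled from a capture file by calling TShark from Python. The sketch below is a minimal example under stated assumptions: TShark is installed and on the path, the capture file name and the chosen field list are placeholders, and the helper names `build_tshark_command` and `extract_header_fields` are hypothetical. The options used (`-r`, `-T fields`, `-e`, `-E separator=,`) are standard TShark options for field output.

```python
import subprocess


def build_tshark_command(pcap_path, fields):
    """Build a TShark command that prints the given fields, one packet per line."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields", "-E", "separator=,"]
    for field in fields:
        cmd += ["-e", field]   # one -e option per field to print
    return cmd


def extract_header_fields(pcap_path, fields):
    """Run TShark and return one list of field values per packet."""
    result = subprocess.run(build_tshark_command(pcap_path, fields),
                            capture_output=True, text=True, check=True)
    return [line.split(",") for line in result.stdout.splitlines()]
```

A call such as `extract_header_fields("capture.pcap", ["ip.src", "ip.dst", "ip.ttl", "tcp.window_size_value"])` would return raw per-packet values; turning them into connection-level features like those in the two datasets would still require aggregation.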

5. Conclusions and Future Work

An analysis of the KDD99 and UNSW-NB15 datasets was performed by using a rough-set theory (RST), a back-propagation neural network (BPNN), and a discrete variant of the cuttlefish algorithm (D-CFA). It was conducted to measure the relevance of the features in both datasets. The properties of each dataset were also investigated. The analysis suggested a few combinations of relevant features to detect each of the malicious attacks in both datasets. The conclusions from this study’s analysis are expected to aid cybersecurity academics in developing an IDS model that is accurate and lightweight. For future work, we plan to create a new dataset and an adaptive IDS method for real-world network traffic data.

Author Contributions

Conceptualization, M.S.A.-D., K.A.Z.A., and S.A.; methodology, M.S.A.-D.; software, M.S.A.-D.; validation, M.S.A.-D., K.A.Z.A., and S.A.; formal analysis, M.S.A.-D.; investigation, M.S.A.-D., K.A.Z.A., and S.A.; resources, M.S.A.-D., K.A.Z.A., S.A., and M.F.E.M.S.; data curation, M.S.A.-D., K.A.Z.A., and S.A.; writing—original draft preparation, M.S.A.-D.; writing—review and editing, M.S.A.-D., K.A.Z.A., S.A., and M.F.E.M.S.; visualization, M.S.A.-D.; supervision, K.A.Z.A. and S.A.; project administration, K.A.Z.A. and S.A.; funding acquisition, K.A.Z.A. and S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Universiti Kebangsaan Malaysia, grant numbers GUP-2020-062 and DIP-2016-024.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Lower approximation ($Lower$) and dependency ratio ($depRatio$) of each feature in the KDD99 dataset.

Feature	All Attacks		DoS		Probe		R2L		U2R
Feature	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$
f_1-1	5.3 × 10⁺³	1.0 × 10⁻²	6.4 × 10⁺³	1.3 × 10⁻²	5.7 × 10⁺³	5.6 × 10⁻²	6.2 × 10⁺³	6.3 × 10⁻²	1.1 × 10⁺⁴	1.1 × 10⁻¹
f_1-2	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	2.0 × 10⁺⁴	2.0 × 10⁻¹	1.2 × 10⁺³	1.3 × 10⁻²
f_1-3	4.6 × 10⁺³	9.4 × 10⁻³	1.1 × 10⁺⁴	2.3 × 10⁻²	6.7 × 10⁺²	6.7 × 10⁻³	2.5 × 10⁺⁴	2.5 × 10⁻¹	8.7 × 10⁺⁴	8.9 × 10⁻¹
f_1-4	1.1 × 10⁺²	2.3 × 10⁻⁴	8.0 × 10⁺⁰	1.6 × 10⁻⁵	1.7 × 10⁺²	1.7 × 10⁻³	5.3 × 10⁺³	5.4 × 10⁻²	5.5 × 10⁺³	5.6 × 10⁻²
f_1-5	8.4 × 10⁺⁴	1.7 × 10⁻¹	9.3 × 10⁺⁴	1.9 × 10⁻¹	9.0 × 10⁺⁴	8.8 × 10⁻¹	8.6 × 10⁺⁴	8.8 × 10⁻¹	8.8 × 10⁺⁴	9.1 × 10⁻¹
f_1-6	8.4 × 10⁺⁴	1.7 × 10⁻¹	8.5 × 10⁺⁴	1.7 × 10⁻¹	8.2 × 10⁺⁴	8.1 × 10⁻¹	8.2 × 10⁺⁴	8.4 × 10⁻¹	8.2 × 10⁺⁴	8.5 × 10⁻¹
f_1-7	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	1.0 × 10⁺⁰	9.8 × 10⁻⁶	1.0 × 10⁺⁰	1.0 × 10⁻⁵	1.0 × 10⁺⁰	1.0 × 10⁻⁵
f_1-8	1.2 × 10⁺³	2.5 × 10⁻³	1.2 × 10⁺³	2.5 × 10⁻³	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰
f_1-9	4.0 × 10⁺⁰	8.1 × 10⁻⁶	1.0 × 10⁺⁰	2.0 × 10⁻⁶	1.0 × 10⁺⁰	9.8 × 10⁻⁶	3.0 × 10⁺⁰	3.0 × 10⁻⁵	2.0 × 10⁺⁰	2.0 × 10⁻⁵
f_1-10	3.8 × 10⁺²	7.8 × 10⁻⁴	3.7 × 10⁺²	7.6 × 10⁻⁴	4.2 × 10⁺²	4.1 × 10⁻³	3.8 × 10⁺²	3.9 × 10⁻³	2.4 × 10⁺²	2.5 × 10⁻³
f_1-11	6.0 × 10⁺⁰	1.2 × 10⁻⁵	1.0 × 10⁺¹	2.0 × 10⁻⁵	1.0 × 10⁺¹	9.8 × 10⁻⁵	6.0 × 10⁺⁰	6.1 × 10⁻⁵	5.0 × 10⁺⁰	5.1 × 10⁻⁵
f_1-12	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰
f_1-13	1.7 × 10⁺¹	3.4 × 10⁻⁵	5.2 × 10⁺¹	1.0 × 10⁻⁴	6.8 × 10⁺¹	6.7 × 10⁻⁴	5.5 × 10⁺¹	5.5 × 10⁻⁴	1.4 × 10⁺¹	1.4 × 10⁻⁴
f_1-14	0.0 × 10⁺⁰	0.0 × 10⁺⁰	2.3 × 10⁺¹	4.7 × 10⁻⁵	2.3 × 10⁺¹	2.2 × 10⁻⁴	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰
f_1-15	6.0 × 10⁺⁰	1.2 × 10⁻⁵	1.1 × 10⁺¹	2.2 × 10⁻⁵	1.1 × 10⁺¹	1.0 × 10⁻⁴	6.0 × 10⁺⁰	6.1 × 10⁻⁵	1.1 × 10⁺¹	1.1 × 10⁻⁴
f_1-16	3.1 × 10⁺²	6.4 × 10⁻⁴	5.7 × 10⁺²	1.1 × 10⁻³	5.7 × 10⁺²	5.6 × 10⁻³	3.4 × 10⁺²	3.4 × 10⁻³	3.1 × 10⁺²	3.2 × 10⁻³
f_1-17	2.2 × 10⁺¹	4.4 × 10⁻⁵	2.3 × 10⁺²	4.7 × 10⁻⁴	2.3 × 10⁺²	2.3 × 10⁻³	5.0 × 10⁺¹	5.0 × 10⁻⁴	1.9 × 10⁺¹	1.9 × 10⁻⁴
f_1-18	3.0 × 10⁺⁰	6.0 × 10⁻⁶	4.3 × 10⁺¹	8.8 × 10⁻⁵	4.3 × 10⁺¹	4.2 × 10⁻⁴	1.0 × 10⁺⁰	1.0 × 10⁻⁵	2.0 × 10⁺⁰	2.0 × 10⁻⁵
f_1-19	5.0 × 10⁺⁰	1.0 × 10⁻⁵	4.4 × 10⁺²	9.0 × 10⁻⁴	4.4 × 10⁺²	4.3 × 10⁻³	5.0 × 10⁺⁰	5.0 × 10⁻⁵	2.9 × 10⁺¹	2.9 × 10⁻⁴
f_1-20	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰
f_1-21	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰
f_1-22	0.0 × 10⁺⁰	0.0 × 10⁺⁰	3.7 × 10⁺²	7.5 × 10⁻⁴	3.7 × 10⁺²	3.6 × 10⁻³	0.0 × 10⁺⁰	0.0 × 10⁺⁰	3.7 × 10⁺²	3.8 × 10⁻³
f_1-23	5.9 × 10⁺⁴	1.2 × 10⁻¹	5.8 × 10⁺⁴	1.2 × 10⁻¹	1.1 × 10⁺³	1.0 × 10⁻²	4.1 × 10⁺⁴	4.2 × 10⁻¹	4.1 × 10⁺⁴	4.2 × 10⁻¹
f_1-24	2.5 × 10⁺⁵	5.0 × 10⁻¹	2.5 × 10⁺⁵	5.1 × 10⁻¹	1.9 × 10⁺³	1.9 × 10⁻²	4.3 × 10⁺⁴	4.3 × 10⁻¹	5.1 × 10⁺⁴	5.3 × 10⁻¹
f_1-25	5.4 × 10⁺²	1.1 × 10⁻³	5.9 × 10⁺²	1.2 × 10⁻³	3.8 × 10⁺²	3.7 × 10⁻³	6.4 × 10⁺²	6.5 × 10⁻³	7.4 × 10⁺²	7.6 × 10⁻³
f_1-26	9.5 × 10⁺²	1.9 × 10⁻³	9.4 × 10⁺²	1.9 × 10⁻³	1.1 × 10⁺³	1.1 × 10⁻²	1.1 × 10⁺³	1.1 × 10⁻²	1.2 × 10⁺³	1.2 × 10⁻²
f_1-27	9.0 × 10⁺²	1.8 × 10⁻³	1.2 × 10⁺²	2.6 × 10⁻⁴	9.5 × 10⁺²	9.4 × 10⁻³	2.2 × 10⁺²	2.3 × 10⁻³	5.6 × 10⁺³	5.7 × 10⁻²
f_1-28	1.2 × 10⁺²	2.4 × 10⁻⁴	2.3 × 10⁺²	4.8 × 10⁻⁴	7.2 × 10⁺²	7.1 × 10⁻³	6.9 × 10⁺²	7.0 × 10⁻³	9.0 × 10⁺²	9.3 × 10⁻³
f_1-29	2.6 × 10⁺³	5.3 × 10⁻³	2.6 × 10⁺³	5.3 × 10⁻³	3.5 × 10⁺²	3.4 × 10⁻³	1.4 × 10⁺³	1.4 × 10⁻²	1.3 × 10⁺³	1.4 × 10⁻²
f_1-30	3.5 × 10⁺³	7.1 × 10⁻³	3.4 × 10⁺³	7.1 × 10⁻³	2.2 × 10⁺²	2.2 × 10⁻³	1.4 × 10⁺³	1.4 × 10⁻²	9.2 × 10⁺²	9.4 × 10⁻³
f_1-31	6.4 × 10⁺³	1.3 × 10⁻²	6.5 × 10⁺³	1.3 × 10⁻²	2.3 × 10⁺⁴	2.3 × 10⁻¹	2.2 × 10⁺⁴	2.2 × 10⁻¹	3.3 × 10⁺⁴	3.3 × 10⁻¹
f_1-32	3.0 × 10⁺⁰	6.0 × 10⁻⁶	3.0 × 10⁺⁰	6.1 × 10⁻⁶	1.5 × 10⁺⁴	1.5 × 10⁻¹	1.7 × 10⁺⁴	1.7 × 10⁻¹	4.6 × 10⁺⁴	4.7 × 10⁻¹
f_1-33	3.0 × 10⁺⁰	6.0 × 10⁻⁶	3.0 × 10⁺⁰	6.1 × 10⁻⁶	1.1 × 10⁺⁴	1.1 × 10⁻¹	1.9 × 10⁺⁴	1.9 × 10⁻¹	8.9 × 10⁺⁴	9.1 × 10⁻¹
f_1-34	0.0 × 10⁺⁰	0.0 × 10⁺⁰	1.7 × 10⁺³	3.5 × 10⁻³	1.8 × 10⁺⁴	1.7 × 10⁻¹	6.7 × 10⁺³	6.8 × 10⁻²	2.7 × 10⁺⁴	2.8 × 10⁻¹
f_1-35	2.6 × 10⁺¹	5.2 × 10⁻⁵	6.3 × 10⁺¹	1.2 × 10⁻⁴	2.0 × 10⁺¹	1.9 × 10⁻⁴	9.4 × 10⁺³	9.5 × 10⁻²	1.7 × 10⁺⁴	1.7 × 10⁻¹
f_1-36	0.0 × 10⁺⁰	0.0 × 10⁺⁰	3.1 × 10⁺²	6.5 × 10⁻⁴	7.2 × 10⁺²	7.1 × 10⁻³	5.5 × 10⁺³	5.6 × 10⁻²	2.8 × 10⁺⁴	2.9 × 10⁻¹
f_1-37	2.5 × 10⁺²	5.1 × 10⁻⁴	2.4 × 10⁺³	5.0 × 10⁻³	2.4 × 10⁺⁴	2.4 × 10⁻¹	1.0 × 10⁺⁴	1.0 × 10⁻¹	3.7 × 10⁺⁴	3.8 × 10⁻¹
f_1-38	1.3 × 10⁺²	2.6 × 10⁻⁴	2.9 × 10⁺²	6.1 × 10⁻⁴	1.3 × 10⁺²	1.2 × 10⁻³	1.2 × 10⁺²	1.2 × 10⁻³	4.8 × 10⁺³	5.0 × 10⁻²
f_1-39	6.4 × 10⁺¹	1.3 × 10⁻⁴	1.4 × 10⁺²	2.9 × 10⁻⁴	5.3 × 10⁺³	5.2 × 10⁻²	3.9 × 10⁺¹	3.9 × 10⁻⁴	5.4 × 10⁺³	5.5 × 10⁻²
f_1-40	0.0 × 10⁺⁰	0.0 × 10⁺⁰	1.3 × 10⁺²	2.6 × 10⁻⁴	3.8 × 10⁺¹	3.7 × 10⁻⁴	6.6 × 10⁺³	6.7 × 10⁻²	9.0 × 10⁺³	9.2 × 10⁻²
f_1-41	1.8 × 10⁺³	3.6 × 10⁻³	2.6 × 10⁺³	5.4 × 10⁻³	3.8 × 10⁺³	3.7 × 10⁻²	6.0 × 10⁺³	6.1 × 10⁻²	9.1 × 10⁺³	9.4 × 10⁻²

Table A2. $Lower$ and $depRatio$ of each feature in the UNSW-NB15 dataset.

Feature	All Attacks		Fuzzers		Analysis		Backdoors		DoS
Feature	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$
f_2-1	9.4 × 10⁺⁴	5.3 × 10⁻¹	6.2 × 10⁺⁴	8.4 × 10⁻¹	5.1 × 10⁺⁴	8.8 × 10⁻¹	5.0 × 10⁺⁴	8.7 × 10⁻¹	5.3 × 10⁺⁴	7.8 × 10⁻¹
f_2-2	2.9 × 10⁺⁴	1.6 × 10⁻¹	4.2 × 10⁺³	5.7 × 10⁻²	1.8 × 10⁺⁴	3.1 × 10⁻¹	4.2 × 10⁺³	7.2 × 10⁻²	1.1 × 10⁺⁴	1.7 × 10⁻¹
f_2-3	1.7 × 10⁺²	9.9 × 10⁻⁴	5.4 × 10⁺³	7.3 × 10⁻²	1.2 × 10⁺⁴	2.1 × 10⁻¹	1.2 × 10⁺⁴	2.2 × 10⁻¹	1.3 × 10⁺³	1.9 × 10⁻²
f_2-4	1.5 × 10⁺¹	8.5 × 10⁻⁵	8.6 × 10⁺¹	1.1 × 10⁻³	8.6 × 10⁺¹	1.4 × 10⁻³	8.6 × 10⁺¹	1.4 × 10⁻³	1.5 × 10⁺¹	2.2 × 10⁻⁴
f_2-5	1.1 × 10⁺³	6.6 × 10⁻³	1.4 × 10⁺³	1.9 × 10⁻²	1.0 × 10⁺⁴	1.7 × 10⁻¹	4.5 × 10⁺³	7.9 × 10⁻²	1.7 × 10⁺³	2.5 × 10⁻²
f_2-6	1.5 × 10⁺³	9.0 × 10⁻³	9.8 × 10⁺³	1.3 × 10⁻¹	2.2 × 10⁺⁴	3.9 × 10⁻¹	1.5 × 10⁺⁴	2.7 × 10⁻¹	3.3 × 10⁺³	4.9 × 10⁻²
f_2-7	8.9 × 10⁺⁴	5.1 × 10⁻¹	2.6 × 10⁺⁴	3.5 × 10⁻¹	5.3 × 10⁺⁴	9.1 × 10⁻¹	5.1 × 10⁺⁴	8.9 × 10⁻¹	4.7 × 10⁺⁴	6.9 × 10⁻¹
f_2-8	3.6 × 10⁺⁴	2.0 × 10⁻¹	3.9 × 10⁺⁴	5.2 × 10⁻¹	4.5 × 10⁺⁴	7.7 × 10⁻¹	4.4 × 10⁺⁴	7.6 × 10⁻¹	3.2 × 10⁺⁴	4.7 × 10⁻¹
f_2-9	3.8 × 10⁺⁴	2.1 × 10⁻¹	3.5 × 10⁺⁴	4.7 × 10⁻¹	4.2 × 10⁺⁴	7.2 × 10⁻¹	4.3 × 10⁺⁴	7.5 × 10⁻¹	3.6 × 10⁺⁴	5.3 × 10⁻¹
f_2-10	3.9 × 10⁺⁴	2.2 × 10⁻¹	3.9 × 10⁺⁴	5.3 × 10⁻¹	3.9 × 10⁺⁴	6.8 × 10⁻¹	3.9 × 10⁺⁴	6.8 × 10⁻¹	3.9 × 10⁺⁴	5.8 × 10⁻¹
f_2-11	3.9 × 10⁺⁴	2.2 × 10⁻¹	3.9 × 10⁺⁴	5.3 × 10⁻¹	3.9 × 10⁺⁴	6.8 × 10⁻¹	3.9 × 10⁺⁴	6.8 × 10⁻¹	3.9 × 10⁺⁴	5.7 × 10⁻¹
f_2-12	7.3 × 10⁺⁴	4.2 × 10⁻¹	2.7 × 10⁺⁴	3.6 × 10⁻¹	4.6 × 10⁺⁴	8.0 × 10⁻¹	4.7 × 10⁺⁴	8.1 × 10⁻¹	4.2 × 10⁺⁴	6.2 × 10⁻¹
f_2-13	8.2 × 10⁺⁴	4.7 × 10⁻¹	5.7 × 10⁺⁴	7.7 × 10⁻¹	4.9 × 10⁺⁴	8.5 × 10⁻¹	4.9 × 10⁺⁴	8.5 × 10⁻¹	5.1 × 10⁺⁴	7.4 × 10⁻¹
f_2-14	1.2 × 10⁺³	6.8 × 10⁻³	1.2 × 10⁺³	1.6 × 10⁻²	2.2 × 10⁺⁴	3.9 × 10⁻¹	1.9 × 10⁺⁴	3.4 × 10⁻¹	7.0 × 10⁺²	1.0 × 10⁻²
f_2-15	2.1 × 10⁺³	1.2 × 10⁻²	1.3 × 10⁺⁴	1.7 × 10⁻¹	1.8 × 10⁺⁴	3.2 × 10⁻¹	1.7 × 10⁺⁴	2.9 × 10⁻¹	5.6 × 10⁺³	8.3 × 10⁻²
f_2-16	8.8 × 10⁺⁴	5.0 × 10⁻¹	5.7 × 10⁺⁴	7.7 × 10⁻¹	4.6 × 10⁺⁴	7.9 × 10⁻¹	4.5 × 10⁺⁴	7.8 × 10⁻¹	4.8 × 10⁺⁴	7.0 × 10⁻¹
f_2-17	8.1 × 10⁺⁴	4.6 × 10⁻¹	5.3 × 10⁺⁴	7.1 × 10⁻¹	4.8 × 10⁺⁴	8.2 × 10⁻¹	4.1 × 10⁺⁴	7.2 × 10⁻¹	4.3 × 10⁺⁴	6.3 × 10⁻¹
f_2-18	8.3 × 10⁺⁴	4.7 × 10⁻¹	5.3 × 10⁺⁴	7.2 × 10⁻¹	4.2 × 10⁺⁴	7.3 × 10⁻¹	4.1 × 10⁺⁴	7.2 × 10⁻¹	4.4 × 10⁺⁴	6.4 × 10⁻¹
f_2-19	7.7 × 10⁺⁴	4.4 × 10⁻¹	5.1 × 10⁺⁴	6.9 × 10⁻¹	4.1 × 10⁺⁴	7.1 × 10⁻¹	4.0 × 10⁺⁴	7.0 × 10⁻¹	4.2 × 10⁺⁴	6.1 × 10⁻¹
f_2-20	1.1 × 10⁺¹	6.2 × 10⁻⁵	1.1 × 10⁺¹	1.4 × 10⁻⁴	1.1 × 10⁺¹	1.9 × 10⁻⁴	1.1 × 10⁺¹	1.9 × 10⁻⁴	1.1 × 10⁺¹	1.6 × 10⁻⁴
f_2-21	7.9 × 10⁺⁴	4.5 × 10⁻¹	5.0 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	4.0 × 10⁺⁴	5.9 × 10⁻¹
f_2-22	7.9 × 10⁺⁴	4.5 × 10⁻¹	5.0 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	4.0 × 10⁺⁴	5.9 × 10⁻¹
f_2-23	5.0 × 10⁺⁰	2.8 × 10⁻⁵	5.0 × 10⁺⁰	6.7 × 10⁻⁵	5.0 × 10⁺⁰	8.6 × 10⁻⁵	5.0 × 10⁺⁰	8.6 × 10⁻⁵	5.0 × 10⁺⁰	7.3 × 10⁻⁵
f_2-24	7.5 × 10⁺⁴	4.3 × 10⁻¹	4.8 × 10⁺⁴	6.5 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	4.0 × 10⁺⁴	5.9 × 10⁻¹
f_2-25	7.3 × 10⁺⁴	4.2 × 10⁻¹	4.8 × 10⁺⁴	6.5 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	4.0 × 10⁺⁴	5.8 × 10⁻¹
f_2-26	7.2 × 10⁺⁴	4.1 × 10⁻¹	4.8 × 10⁺⁴	6.4 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	3.8 × 10⁺⁴	6.6 × 10⁻¹	4.0 × 10⁺⁴	5.8 × 10⁻¹
f_2-27	4.0 × 10⁺³	2.2 × 10⁻²	2.1 × 10⁺³	2.9 × 10⁻²	3.6 × 10⁺⁴	6.3 × 10⁻¹	2.8 × 10⁺⁴	4.9 × 10⁻¹	5.4 × 10⁺³	7.9 × 10⁻²
f_2-28	9.3 × 10⁺³	5.3 × 10⁻²	2.2 × 10⁺⁴	2.9 × 10⁻¹	3.7 × 10⁺⁴	6.4 × 10⁻¹	3.6 × 10⁺⁴	6.3 × 10⁻¹	1.2 × 10⁺⁴	1.8 × 10⁻¹
f_2-29	1.3 × 10⁺¹	7.4 × 10⁻⁵	6.9 × 10⁺¹	9.3 × 10⁻⁴	6.9 × 10⁺¹	1.1 × 10⁻³	6.9 × 10⁺¹	1.1 × 10⁻³	7.2 × 10⁺¹	1.0 × 10⁻³
f_2-30	7.5 × 10⁺³	4.3 × 10⁻²	4.5 × 10⁺³	6.1 × 10⁻²	4.0 × 10⁺³	6.9 × 10⁻²	4.7 × 10⁺³	8.2 × 10⁻²	4.9 × 10⁺³	7.2 × 10⁻²
f_2-31	4.6 × 10⁺³	2.6 × 10⁻²	2.5 × 10⁺²	3.4 × 10⁻³	2.7 × 10⁺³	4.7 × 10⁻²	2.7 × 10⁺³	4.7 × 10⁻²	2.2 × 10⁺³	3.3 × 10⁻²
f_2-32	0.0 × 10⁺⁰	0.0 × 10⁺⁰	1.0 × 10⁺³	1.3 × 10⁻²	1.0 × 10⁺³	1.7 × 10⁻²	1.0 × 10⁺³	1.7 × 10⁻²	0.0 × 10⁺⁰	0.0 × 10⁺⁰
f_2-33	6.3 × 10⁺³	3.6 × 10⁻²	1.4 × 10⁺²	1.9 × 10⁻³	1.0 × 10⁺³	1.7 × 10⁻²	1.0 × 10⁺³	1.7 × 10⁻²	1.5 × 10⁺²	2.2 × 10⁻³
f_2-34	8.8 × 10⁺³	5.0 × 10⁻²	1.8 × 10⁺²	2.5 × 10⁻³	9.6 × 10⁺²	1.6 × 10⁻²	1.1 × 10⁺³	1.9 × 10⁻²	9.3 × 10⁺²	1.3 × 10⁻²
f_2-35	3.6 × 10⁺⁴	2.0 × 10⁻¹	4.2 × 10⁺²	5.7 × 10⁻³	2.6 × 10⁺²	4.5 × 10⁻³	2.2 × 10⁺²	3.9 × 10⁻³	5.0 × 10⁺²	7.4 × 10⁻³
f_2-36	4.1 × 10⁺³	2.3 × 10⁻²	1.5 × 10⁺²	2.1 × 10⁻³	1.3 × 10⁺³	2.3 × 10⁻²	1.0 × 10⁺³	1.8 × 10⁻²	8.0 × 10⁺²	1.1 × 10⁻²
f_2-37	1.6 × 10⁺¹	9.1 × 10⁻⁵	0.0 × 10⁺⁰	0.0 × 10⁺⁰	9.4 × 10⁺²	1.6 × 10⁻²	9.4 × 10⁺²	1.6 × 10⁻²	2.0 × 10⁺⁰	2.9 × 10⁻⁵
f_2-38	1.6 × 10⁺¹	9.1 × 10⁻⁵	0.0 × 10⁺⁰	0.0 × 10⁺⁰	9.4 × 10⁺²	1.6 × 10⁻²	9.4 × 10⁺²	1.6 × 10⁻²	2.0 × 10⁺⁰	2.9 × 10⁻⁵
f_2-39	7.4 × 10⁺¹	4.2 × 10⁻⁴	4.8 × 10⁺¹	6.4 × 10⁻⁴	7.1 × 10⁺¹	1.2 × 10⁻³	7.0 × 10⁺¹	1.2 × 10⁻³	4.8 × 10⁺¹	7.0 × 10⁻⁴
f_2-40	3.3 × 10⁺³	1.9 × 10⁻²	7.7 × 10⁺¹	1.0 × 10⁻³	1.3 × 10⁺²	2.3 × 10⁻³	1.2 × 10⁺²	2.2 × 10⁻³	7.2 × 10⁺¹	1.0 × 10⁻³
f_2-41	3.0 × 10⁺³	1.7 × 10⁻²	3.3 × 10⁺²	4.5 × 10⁻³	2.3 × 10⁺³	3.9 × 10⁻²	2.3 × 10⁺³	3.9 × 10⁻²	2.0 × 10⁺³	3.0 × 10⁻²
f_2-42	2.7 × 10⁺³	1.5 × 10⁻²	2.7 × 10⁺³	3.7 × 10⁻²	2.7 × 10⁺³	4.7 × 10⁻²	2.7 × 10⁺³	4.7 × 10⁻²	2.7 × 10⁺³	4.0 × 10⁻²
Feature	Exploits		Generic		Reconnaissance		Shellcode		Worms
Feature	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$	$Lower$	$DepRatio$
f_2-1	7.1 × 10⁺⁴	7.9 × 10⁻¹	5.3 × 10⁺⁴	5.5 × 10⁻¹	5.5 × 10⁺⁴	8.3 × 10⁻¹	5.3 × 10⁺⁴	9.3 × 10⁻¹	5.4 × 10⁺⁴	9.6 × 10⁻¹
f_2-2	1.4 × 10⁺⁴	1.6 × 10⁻¹	3.1 × 10⁺³	3.2 × 10⁻²	4.5 × 10⁺³	6.7 × 10⁻²	2.9 × 10⁺³	5.1 × 10⁻²	2.9 × 10⁺³	5.2 × 10⁻²
f_2-3	1.0 × 10⁺²	1.1 × 10⁻³	2.5 × 10⁺³	2.6 × 10⁻²	5.0 × 10⁺³	7.6 × 10⁻²	1.9 × 10⁺⁴	3.4 × 10⁻¹	1.4 × 10⁺⁴	2.5 × 10⁻¹
f_2-4	1.5 × 10⁺¹	1.6 × 10⁻⁴	1.5 × 10⁺¹	1.5 × 10⁻⁴	1.5 × 10⁺¹	2.2 × 10⁻⁴	1.3 × 10⁺⁴	2.2 × 10⁻¹	1.0 × 10⁺³	1.8 × 10⁻²
f_2-5	9.4 × 10⁺²	1.0 × 10⁻²	4.0 × 10⁺³	4.1 × 10⁻²	4.8 × 10⁺³	7.2 × 10⁻²	3.8 × 10⁺⁴	6.7 × 10⁻¹	2.7 × 10⁺⁴	4.8 × 10⁻¹
f_2-6	1.7 × 10⁺³	2.0 × 10⁻²	9.5 × 10⁺³	9.9 × 10⁻²	2.2 × 10⁺⁴	3.3 × 10⁻¹	4.1 × 10⁺⁴	7.3 × 10⁻¹	2.8 × 10⁺⁴	5.0 × 10⁻¹
f_2-7	3.7 × 10⁺⁴	4.2 × 10⁻¹	8.4 × 10⁺⁴	8.8 × 10⁻¹	4.8 × 10⁺⁴	7.2 × 10⁻¹	4.7 × 10⁺⁴	8.3 × 10⁻¹	5.5 × 10⁺⁴	9.9 × 10⁻¹
f_2-8	4.0 × 10⁺⁴	4.5 × 10⁻¹	4.2 × 10⁺⁴	4.4 × 10⁻¹	4.2 × 10⁺⁴	6.3 × 10⁻¹	4.4 × 10⁺⁴	7.7 × 10⁻¹	4.4 × 10⁺⁴	7.8 × 10⁻¹
f_2-9	3.6 × 10⁺⁴	4.0 × 10⁻¹	4.2 × 10⁺⁴	4.4 × 10⁻¹	3.6 × 10⁺⁴	5.5 × 10⁻¹	4.3 × 10⁺⁴	7.6 × 10⁻¹	5.1 × 10⁺⁴	9.1 × 10⁻¹
f_2-10	3.9 × 10⁺⁴	4.4 × 10⁻¹	3.9 × 10⁺⁴	4.1 × 10⁻¹	3.9 × 10⁺⁴	5.9 × 10⁻¹	4.4 × 10⁺⁴	7.8 × 10⁻¹	4.2 × 10⁺⁴	7.5 × 10⁻¹
f_2-11	3.9 × 10⁺⁴	4.4 × 10⁻¹	3.9 × 10⁺⁴	4.1 × 10⁻¹	3.9 × 10⁺⁴	5.9 × 10⁻¹	3.9 × 10⁺⁴	6.9 × 10⁻¹	3.9 × 10⁺⁴	7.0 × 10⁻¹
f_2-12	3.3 × 10⁺⁴	3.7 × 10⁻¹	7.2 × 10⁺⁴	7.5 × 10⁻¹	3.9 × 10⁺⁴	5.9 × 10⁻¹	4.7 × 10⁺⁴	8.3 × 10⁻¹	5.3 × 10⁺⁴	9.6 × 10⁻¹
f_2-13	6.6 × 10⁺⁴	7.4 × 10⁻¹	4.9 × 10⁺⁴	5.1 × 10⁻¹	5.1 × 10⁺⁴	7.6 × 10⁻¹	4.9 × 10⁺⁴	8.6 × 10⁻¹	4.9 × 10⁺⁴	8.8 × 10⁻¹
f_2-14	9.2 × 10⁺²	1.0 × 10⁻²	3.8 × 10⁺³	3.9 × 10⁻²	2.5 × 10⁺⁴	3.7 × 10⁻¹	3.1 × 10⁺⁴	5.4 × 10⁻¹	2.2 × 10⁺⁴	3.9 × 10⁻¹
f_2-15	2.0 × 10⁺³	2.2 × 10⁻²	5.7 × 10⁺³	5.9 × 10⁻²	2.3 × 10⁺⁴	3.5 × 10⁻¹	3.0 × 10⁺⁴	5.3 × 10⁻¹	2.7 × 10⁺⁴	4.8 × 10⁻¹
f_2-16	6.5 × 10⁺⁴	7.3 × 10⁻¹	4.5 × 10⁺⁴	4.7 × 10⁻¹	5.0 × 10⁺⁴	7.5 × 10⁻¹	4.6 × 10⁺⁴	8.0 × 10⁻¹	4.8 × 10⁺⁴	8.6 × 10⁻¹
f_2-17	6.0 × 10⁺⁴	6.7 × 10⁻¹	4.3 × 10⁺⁴	4.5 × 10⁻¹	4.6 × 10⁺⁴	7.0 × 10⁻¹	4.9 × 10⁺⁴	8.7 × 10⁻¹	4.9 × 10⁺⁴	8.7 × 10⁻¹
f_2-18	6.1 × 10⁺⁴	6.9 × 10⁻¹	4.2 × 10⁺⁴	4.4 × 10⁻¹	4.6 × 10⁺⁴	7.0 × 10⁻¹	4.2 × 10⁺⁴	7.4 × 10⁻¹	4.2 × 10⁺⁴	7.5 × 10⁻¹
f_2-19	5.7 × 10⁺⁴	6.4 × 10⁻¹	4.1 × 10⁺⁴	4.2 × 10⁻¹	4.5 × 10⁺⁴	6.8 × 10⁻¹	4.1 × 10⁺⁴	7.2 × 10⁻¹	4.0 × 10⁺⁴	7.2 × 10⁻¹
f_2-20	1.1 × 10⁺¹	1.2 × 10⁻⁴	1.1 × 10⁺¹	1.1 × 10⁻⁴	1.1 × 10⁺¹	1.6 × 10⁻⁴	1.1 × 10⁺¹	1.9 × 10⁻⁴	1.1 × 10⁺¹	1.9 × 10⁻⁴
f_2-21	5.8 × 10⁺⁴	6.4 × 10⁻¹	3.8 × 10⁺⁴	4.0 × 10⁻¹	4.3 × 10⁺⁴	6.5 × 10⁻¹	3.8 × 10⁺⁴	6.8 × 10⁻¹	3.8 × 10⁺⁴	6.8 × 10⁻¹
f_2-22	5.8 × 10⁺⁴	6.4 × 10⁻¹	3.8 × 10⁺⁴	4.0 × 10⁻¹	4.3 × 10⁺⁴	6.5 × 10⁻¹	3.8 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.8 × 10⁻¹
f_2-23	5.0 × 10⁺⁰	5.5 × 10⁻⁵	5.0 × 10⁺⁰	5.2 × 10⁻⁵	5.0 × 10⁺⁰	7.5 × 10⁻⁵	5.0 × 10⁺⁰	8.7 × 10⁻⁵	5.0 × 10⁺⁰	8.9 × 10⁻⁵
f_2-24	5.6 × 10⁺⁴	6.3 × 10⁻¹	3.8 × 10⁺⁴	4.0 × 10⁻¹	4.3 × 10⁺⁴	6.4 × 10⁻¹	3.8 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.8 × 10⁻¹
f_2-25	5.5 × 10⁺⁴	6.2 × 10⁻¹	3.8 × 10⁺⁴	4.0 × 10⁻¹	4.2 × 10⁺⁴	6.4 × 10⁻¹	3.8 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.8 × 10⁻¹
f_2-26	5.4 × 10⁺⁴	6.1 × 10⁻¹	3.8 × 10⁺⁴	4.0 × 10⁻¹	4.2 × 10⁺⁴	6.3 × 10⁻¹	3.8 × 10⁺⁴	6.7 × 10⁻¹	3.8 × 10⁺⁴	6.8 × 10⁻¹
f_2-27	3.1 × 10⁺³	3.5 × 10⁻²	6.6 × 10⁺³	6.9 × 10⁻²	1.7 × 10⁺⁴	2.6 × 10⁻¹	1.9 × 10⁺⁴	3.4 × 10⁻¹	4.5 × 10⁺⁴	8.0 × 10⁻¹
f_2-28	8.0 × 10⁺³	9.0 × 10⁻²	1.7 × 10⁺⁴	1.8 × 10⁻¹	3.7 × 10⁺⁴	5.6 × 10⁻¹	4.3 × 10⁺⁴	7.6 × 10⁻¹	4.3 × 10⁺⁴	7.7 × 10⁻¹
f_2-29	1.0 × 10⁺¹	1.1 × 10⁻⁴	6.9 × 10⁺¹	7.1 × 10⁻⁴	6.9 × 10⁺¹	1.0 × 10⁻³	5.1 × 10⁺³	9.0 × 10⁻²	6.9 × 10⁺¹	1.2 × 10⁻³
f_2-30	6.7 × 10⁺³	7.5 × 10⁻²	4.8 × 10⁺³	5.0 × 10⁻²	4.6 × 10⁺³	7.0 × 10⁻²	4.7 × 10⁺³	8.2 × 10⁻²	4.7 × 10⁺³	8.4 × 10⁻²
f_2-31	1.1 × 10⁺³	1.2 × 10⁻²	4.4 × 10⁺³	4.6 × 10⁻²	8.3 × 10⁺²	1.2 × 10⁻²	2.1 × 10⁺³	3.8 × 10⁻²	1.7 × 10⁺⁴	3.0 × 10⁻¹
f_2-32	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	0.0 × 10⁺⁰	4.1 × 10⁺⁴	7.3 × 10⁻¹	1.8 × 10⁺³	3.3 × 10⁻²
f_2-33	8.1 × 10⁺¹	9.0 × 10⁻⁴	6.3 × 10⁺³	6.5 × 10⁻²	1.5 × 10⁺²	2.3 × 10⁻³	7.1 × 10⁺³	1.2 × 10⁻¹	1.2 × 10⁺⁴	2.3 × 10⁻¹
f_2-34	8.7 × 10⁺²	9.8 × 10⁻³	8.8 × 10⁺³	9.2 × 10⁻²	9.6 × 10⁺²	1.4 × 10⁻²	1.5 × 10⁺⁴	2.6 × 10⁻¹	4.1 × 10⁺³	7.3 × 10⁻²
f_2-35	6.0 × 10⁺²	6.8 × 10⁻³	3.5 × 10⁺⁴	3.6 × 10⁻¹	2.3 × 10⁺²	3.5 × 10⁻³	4.4 × 10⁺³	7.8 × 10⁻²	4.3 × 10⁺²	7.7 × 10⁻³
f_2-36	1.3 × 10⁺²	1.5 × 10⁻³	4.0 × 10⁺³	4.2 × 10⁻²	3.4 × 10⁺²	5.1 × 10⁻³	2.6 × 10⁺³	4.6 × 10⁻²	8.0 × 10⁺³	1.4 × 10⁻¹
f_2-37	1.8 × 10⁺¹	2.0 × 10⁻⁴	9.4 × 10⁺²	9.8 × 10⁻³	9.4 × 10⁺²	1.4 × 10⁻²	9.4 × 10⁺²	1.6 × 10⁻²	9.4 × 10⁺²	1.6 × 10⁻²
f_2-38	1.8 × 10⁺¹	2.0 × 10⁻⁴	9.4 × 10⁺²	9.8 × 10⁻³	9.4 × 10⁺²	1.4 × 10⁻²	9.4 × 10⁺²	1.6 × 10⁻²	9.4 × 10⁺²	1.6 × 10⁻²
f_2-39	5.7 × 10⁺¹	6.3 × 10⁻⁴	5.4 × 10⁺¹	5.6 × 10⁻⁴	4.8 × 10⁺¹	7.2 × 10⁻⁴	5.1 × 10⁺³	9.0 × 10⁻²	5.4 × 10⁺¹	9.6 × 10⁻⁴
f_2-40	2.0 × 10⁺¹	2.2 × 10⁻⁴	3.3 × 10⁺³	3.4 × 10⁻²	7.0 × 10⁺¹	1.0 × 10⁻³	2.3 × 10⁺³	4.1 × 10⁻²	9.4 × 10⁺³	1.6 × 10⁻¹
f_2-41	1.5 × 10⁺³	1.7 × 10⁻²	3.0 × 10⁺³	3.1 × 10⁻²	1.5 × 10⁺³	2.3 × 10⁻²	4.7 × 10⁺³	8.3 × 10⁻²	1.6 × 10⁺⁴	2.9 × 10⁻¹
f_2-42	2.7 × 10⁺³	3.0 × 10⁻²	2.7 × 10⁺³	2.8 × 10⁻²	2.7 × 10⁺³	4.1 × 10⁻²	2.7 × 10⁺³	4.8 × 10⁻²	2.7 × 10⁺³	4.9 × 10⁻²
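As a rough illustration of how a count-and-ratio pair such as those in the rows above can be obtained, the sketch below computes a rough-set positive-region size and the corresponding dependency ratio for a single feature. It assumes the standard RST definition (records whose feature value determines the class, divided by the total number of records). The function name `dependency_ratio` and the toy records are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def dependency_ratio(records, feature, label="class"):
    """Return (positive-region size, dependency ratio) of `label` on `feature`.

    Assumed standard rough-set definition: records that share a feature value
    form an equivalence class, and that class counts toward the positive
    region only if all of its records carry the same label.
    """
    labels_by_value = defaultdict(set)
    size_by_value = defaultdict(int)
    for rec in records:
        labels_by_value[rec[feature]].add(rec[label])
        size_by_value[rec[feature]] += 1

    # Sum the sizes of the equivalence classes that map to a single label.
    positive = sum(
        size
        for value, size in size_by_value.items()
        if len(labels_by_value[value]) == 1
    )
    ratio = positive / len(records) if records else 0.0
    return positive, ratio


# Hypothetical toy data: "http" is ambiguous, "ftp" and "dns" are not.
toy = [
    {"service": "http", "class": "normal"},
    {"service": "http", "class": "attack"},
    {"service": "ftp", "class": "attack"},
    {"service": "ftp", "class": "attack"},
    {"service": "dns", "class": "normal"},
]
positive, ratio = dependency_ratio(toy, "service")
```

In this toy case, three of the five records fall in unambiguous equivalence classes, so the ratio is 3/5.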

References

Kabir, E.; Hu, J.; Wang, H.; Zhuo, G. A Novel Statistical Technique for Intrusion Detection Systems. Future Gener. Comput. Syst. 2018, 79, 303–318.
Heenan, R.; Moradpoor, N. A Survey of Intrusion Detection System Technologies. In Proceedings of the 1st Post Graduate Cyber Security (PGCS) Symposium, Edinburgh, UK, 10 May 2016.
Van der Toorn, O.; Hofstede, R.; Jonker, M.; Sperotto, A. A First Look at HTTP(S) Intrusion Detection Using NetFlow/IPFIX. In Proceedings of the 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), Ottawa, ON, Canada, 11–15 May 2015; pp. 862–865.
Almansor, M.; Gan, K.B. Intrusion Detection Systems: Principles and Perspectives. J. Multidiscip. Eng. Sci. Stud. 2018, 4, 2458–2925.
Othman, Z.A.; Adabashi, A.M.; Zainudin, S.; Alhashmi, S.M. Improvement Anomaly Intrusion Detection Using Fuzzy-ART Based on K-Means Based on SNC Labeling. Asia-Pac. J. Inf. Technol. Multimed. (APJITM) 2011, 10, 1–11.
Ojha, V.K.; Abraham, A.; Snášel, V. Metaheuristic Design of Feedforward Neural Networks: A Review of Two Decades of Research. Eng. Appl. Artif. Intell. 2017, 60, 97–116.
Sahu, S.K.; Sarangi, S.; Jena, S.K. A Detail Analysis on Intrusion Detection Datasets. In Proceedings of the 2014 IEEE International Advance Computing Conference (IACC), Bangkok, Thailand, 21–22 February 2014; pp. 1348–1353.
KDD99 Dataset. UCI KDD Archive. 1999. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 10 January 2020).
Moustafa, N.; Slay, J. UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015; pp. 1–6.
UNSW-NB15 Dataset. UNSW Canberra Cyber. 2015. Available online: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets (accessed on 10 January 2020).
Hajisalem, V.; Babaie, S. A Hybrid Intrusion Detection System Based on ABC-AFS Algorithm for Misuse and Anomaly Detection. Comput. Netw. 2018, 136, 37–50.
Khammassi, C.; Krichen, S. A GA-LR Wrapper Approach for Feature Selection in Network Intrusion Detection. Comput. Secur. 2017, 70, 255–277.
Al-Yaseen, W.; Othman, Z.A.; Nazri, M.Z. Hybrid Modified K-Means with C4.5 for Intrusion Detection Systems in Multiagent Systems. Sci. World J. 2015, 2015, 294761.
Al-Yaseen, W.; Othman, Z.A.; Nazri, M.Z. Multi-Level Hybrid Support Vector Machine and Extreme Learning Machine Based on Modified K-Means for Intrusion Detection System. Expert Syst. Appl. 2017, 67, 296–303.
Al-Yaseen, W.; Othman, Z.A.; Nazri, M.Z. Real-Time Multi-Agent System for an Adaptive Intrusion Detection System. Pattern Recognit. Lett. 2017, 85, 56–64.
Araújo, N.; Gonçalves de Oliveira, R.; Ferreira, E.W.; Shinoda, A.; Bhargava, B. Identifying Important Characteristics in the KDD99 Intrusion Detection Dataset by Feature Selection Using a Hybrid Approach. In Proceedings of the 2010 17th International Conference on Telecommunications, Doha, Qatar, 4–7 April 2010; pp. 552–558.
Essid, M.; Jemili, F. Combining Intrusion Detection Datasets Using MapReduce. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 4724–4728.
Jing, D.; Chen, H. SVM Based Network Intrusion Detection for the UNSW-NB15 Dataset. In Proceedings of the 2019 IEEE 13th International Conference on ASIC (ASICON), Chongqing, China, 29 October–1 November 2019; pp. 1–4.
Kadis, M.R.; Abdullah, A. Global and Local Clustering Soft Assignment for Intrusion Detection System: A Comparative Study. Asia-Pac. J. Inf. Technol. Multimed. (APJITM) 2017, 6, 57–69.
Kuang, F.; Zhang, S. A Novel Network Intrusion Detection Based on Support Vector Machine and Tent Chaos Artificial Bee Colony Algorithm. J. Netw. Intell. 2017, 2, 195–204.
Eesa, A.S.; Orman, Z.; Brifcani, A.M.A. A Novel Feature-Selection Approach Based on the Cuttlefish Optimization Algorithm for Intrusion Detection Systems. Expert Syst. Appl. 2015, 42, 2670–2679.
Balasaraswathi, R.; Sugumaran, M.; Hamid, Y. Chaotic Cuttle Fish Algorithm for Feature Selection of Intrusion Detection System. Int. J. Pure Appl. Math 2018, 119, 921–935.
Al-Daweri, M.; Abdullah, S.; Ariffin, K. A Migration-Based Cuttlefish Algorithm with Short-Term Memory for Optimization Problems. IEEE Access 2020, 8, 70270–70292.
Kumar, V.; Sinha, D.; Das, A.; Pandey, D.S.; Goswami, R. An Integrated Rule Based Intrusion Detection System: Analysis on UNSW-NB15 Data Set and the Real Time Online Dataset. Clust. Comput. 2020, 23.
Shah, A.A.; Khan, Y.D.; Ashraf, M.A. Attacks Analysis of TCP and UDP of UNSW-NB15 Dataset. Vawkum Trans. Comput. Sci. 2018, 15, 143–149.
Ruan, Z.; Miao, Y.; Pan, L.; Patterson, N.; Zhang, J. Visualization of Big Data Security: A Case Study on the KDD99 Cup Data Set. Digit. Commun. Netw. 2017, 3, 250–259.
Moustafa, N.; Slay, J. The Significant Features of the UNSW-NB15 and the KDD99 Data Sets for Network Intrusion Detection Systems. In Proceedings of the 2015 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), Kyoto, Japan, 5 November 2015; pp. 25–31.
Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A Detailed Analysis of the KDD CUP 99 Data Set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6.
Adetunmbi, A.; Oladele, A.S.; Abosede, D.O. Analysis of KDD 99 Intrusion Detection Dataset for Selection of Relevance Features. Proc. World Congr. Eng. Comput. Sci. 2010, 1, 20–22.
Kayacik, H.G.; Zincir-Heywood, A.N.; Heywood, M.I. Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99. In Proceedings of the Third Annual Conference on Privacy, Security and Trust, St. Andrews, NB, Canada, 12–14 October 2005.
Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A Survey of Network-Based Intrusion Detection Data Sets. Comput. Secur. 2019, 86, 147–167.
Hamid, Y.; Ranganathan, B.; Journaux, L.; Sugumaran, M. Benchmark Datasets for Network Intrusion Detection: A Review. Int. J. Netw. Secur. 2018, 20, 645–654.
Choudhary, S.; Kesswani, N. Analysis of KDD-Cup’99, NSL-KDD and UNSW-NB15 Datasets Using Deep Learning in IoT. Procedia. Comput. Sci. 2020, 167, 1561–1573.
Binbusayyis, A.; Vaiyapuri, T. Comprehensive Analysis and Recommendation of Feature Evaluation Measures for Intrusion Detection. Heliyon 2020, 6, e04262.
Rajagopal, S.; Hareesha, K.S.; Kundapur, P.P. Feature Relevance Analysis and Feature Reduction of UNSW NB-15 Using Neural Networks on MAMLS. In Advanced Computing and Intelligent Engineering-Proceedings of ICACIE 2018; Pati, B., Panigrahi, C.R., Buyya, R., Li, K.-C., Eds.; Advances in Intelligent Systems and Computing; Springer: Paris, France, 2020; pp. 321–332.
Almomani, O. A Feature Selection Model for Network Intrusion Detection System Based on PSO, GWO, FFA and GA Algorithms. Symmetry 2020, 12, 1046.
Sarnovsky, M.; Paralic, J. Hierarchical Intrusion Detection Using Machine Learning and Knowledge Model. Symmetry 2020, 12, 203.
Iwendi, C.; Khan, S.; Anajemba, J.H.; Mittal, M.; Alenezi, M.; Alazab, M. The Use of Ensemble Models for Multiple Class and Binary Class Classification for Improving Intrusion Detection Systems. Sensors 2020, 20, 2559.
Dunn, C.; Moustafa, N.; Turnbull, B. Robustness Evaluations of Sustainable Machine Learning Models against Data Poisoning Attacks in the Internet of Things. Sustainability 2020, 12, 6434.
Meghdouri, F.; Zseby, T.; Iglesias, F. Analysis of Lightweight Feature Vectors for Attack Detection in Network Traffic. Appl. Sci. 2018, 8, 2196.
Wu, T.; Chen, C.; Sun, X.; Liu, S.; Lin, J. A Countermeasure to SQL Injection Attack for Cloud Environment. Wirel. Pers. Commun. 2017, 96, 5279–5293.
Özgür, A.; Erdem, H. A Review of KDD99 Dataset Usage in Intrusion Detection and Machine Learning between 2010 and 2015. Peer J. Prepr. 2016.
Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data; Kluwer Academic Publishers: Boston, MA, USA, 1992.
McCaffrey, J. Neural Networks Using C# Succinctly; CreateSpace Independent Publishing Platform: Scotts Valley, CA, USA, 2017.
Fausett, L.V. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications; Prentice-Hall Inc.: Upper Saddle River, NJ, USA, 1994.
Eesa, A.; Mohsin Abdulazeez, A.; Orman, Z. A Novel Bio-Inspired Optimization Algorithm. Int. J. Sci. Eng. Res. 2013, 4, 1978–1986.
Jaddi, N.S.; Abdullah, S.; Hamdan, A.R. A Solution Representation of Genetic Algorithm for Neural Network Weights and Structure. Inf. Process. Lett. 2016, 116, 22–25.
Wireshark. 2006. Available online: https://www.wireshark.org/docs/ (accessed on 19 June 2020).

Figure 1. The percentage of class distribution in the KDD99’s training and testing sets.

Figure 2. The percentage of duplicated records for each class in the KDD99’s training and testing sets.

Figure 3. The percentage of class distribution in the UNSW-NB15’s training and testing sets.

Figure 4. The percentage of duplicated records for each class in the UNSW-NB15’s training and testing sets.

Figure 5. Percentage comparison of the normal and attack class records in the training and testing sets of the KDD99 and UNSW-NB15 datasets.

Figure 6. The methodology of using the rough-set theory (RST), back-propagation neural network (BPNN), and discrete cuttlefish algorithm (D-CFA) to analyze the datasets.

Figure 7. An example of a neuron with inputs ($inp_{1}$–$inp_{n}$), weights ($w_{1}$–$w_{n}$), bias ($b$), activation function ($\sigma$), and output ($y$).
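The neuron in Figure 7 can be sketched in a few lines: the output is the activation function applied to the weighted sum of the inputs plus the bias. The caption does not say which activation function is used, so the logistic sigmoid below is an assumption, and the function name `neuron_output` is hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Compute y = sigma(sum_i w_i * inp_i + b) for a single neuron.

    The logistic sigmoid is assumed for sigma; the source caption does not
    name the activation function.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs plus the bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Logistic sigmoid squashes z into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example: the weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)
```

With a weighted sum of zero, the sigmoid returns 0.5, the midpoint of its range.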

Figure 8. Flowchart of the D-CFA.

Figure 9. Average dependency ratio ($ADR$) of the features based on each attack in the KDD99 dataset.

Figure 10. $ADR$ of the features based on each attack in the UNSW-NB15 dataset.

Figure 11. Average classification accuracy ($AACC$) of the features based on the different number of nodes in the hidden layer, using the KDD99 dataset.

Figure 12. $AACC$ of the features based on the different number of nodes in the hidden layer, using the UNSW-NB15 dataset.

Figure 13. Classification accuracy ($ACC$) of each feature in the KDD99, using the best value from all the hidden layer nodes simulations.

Figure 14. $ACC$ of each feature in the UNSW-NB15, using the best value from all the hidden layer nodes simulations.

Table 1. The number of duplicated records in the training and testing sets of the KDD99 dataset.

Class	Training Set			Testing Set
Class	No. of Duplicates	No. of Records	Duplicates Percentage	No. of Duplicates	No. of Records	Duplicates Percentage
All	348,437	494,021	70.53	233,813	311,029	75.17
Normal	9446	97,278	9.71	12,680	60,593	20.92
DoS	336,886	391,459	86.05	206,285	229,853	89.74
Probe	1977	4107	48.13	1488	4166	35.71
R2L	127	1126	11.27	13,276	16,189	82.00
U2R	0	52	0.00	13	228	5.70
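The percentage columns in Table 1 can be recomputed from the two count columns. The sketch below assumes the percentage is the number of duplicates divided by the number of records, truncated (not rounded) to two decimals. The truncation is an inference from the printed values (for example, 336,886 / 391,459 is about 86.059%, which the table prints as 86.05), not something the source states. The function name `duplicate_percentage` is hypothetical.

```python
def duplicate_percentage(duplicates, records):
    """Return duplicates/records as a percentage truncated to two decimals.

    Integer arithmetic is used so that the truncation is exact. Truncation
    (rather than rounding) is an assumption inferred from the table values.
    """
    if records <= 0:
        raise ValueError("records must be positive")
    return (duplicates * 10_000 // records) / 100


# (duplicates, records) pairs copied from the training-set columns of Table 1.
training = {
    "All": (348_437, 494_021),
    "Normal": (9_446, 97_278),
    "DoS": (336_886, 391_459),
    "Probe": (1_977, 4_107),
    "R2L": (127, 1_126),
    "U2R": (0, 52),
}
percentages = {name: duplicate_percentage(d, n) for name, (d, n) in training.items()}
```

Under this assumption the recomputed training-set values match the printed column.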

Table 2. The number of duplicated records in the training and testing sets of the UNSW-NB15 dataset.

Class	Training Set			Testing Set
Class	No. of Duplicates	No. of Records	Duplicates Percentage	No. of Duplicates	No. of Records	Duplicates Percentage
All	74,072	175,341	42.24	0	82,332	0.00
Normal	4110	56,000	7.33	0	37,000	0.00
Fuzzers	2034	18,184	11.18	0	6062	0.00
Analysis	405	2000	20.25	0	677	0.00
Backdoors	211	1746	12.08	0	583	0.00
DoS	8457	12,264	68.95	0	4089	0.00
Exploits	13,548	33,393	40.57	0	11,132	0.00
Generic	35,819	40,000	89.54	0	18,871	0.00
Reconnaissance	2969	10,491	28.30	0	3496	0.00
Shellcode	42	1133	3.70	0	378	0.00
Worms	3	130	2.30	0	44	0.00

Table 3. KDD99 and UNSW-NB15 list of features.

KDD99		UNSW-NB15
Feature	Name	Feature	Name
f_1-1	duration	f_2-1	Dur
f_1-2	protocol_type	f_2-2	Proto
f_1-3	service	f_2-3	Service
f_1-4	flag	f_2-4	State
f_1-5	src_bytes	f_2-5	Spkts
f_1-6	dst_bytes	f_2-6	Dpkts
f_1-7	land	f_2-7	Sbytes
f_1-8	wrong_fragment	f_2-8	Dbytes
f_1-9	urgent	f_2-9	Rate
f_1-10	hot	f_2-10	Sttl
f_1-11	num_failed_logins	f_2-11	Dttl
f_1-12	logged_in	f_2-12	Sload
f_1-13	lnum_compromised	f_2-13	Dload
f_1-14	lroot_shell	f_2-14	Sloss
f_1-15	lsu_attempted	f_2-15	Dloss
f_1-16	lnum_root	f_2-16	Sinpkt
f_1-17	lnum_file_creations	f_2-17	Dinpkt
f_1-18	lnum_shells	f_2-18	Sjit
f_1-19	lnum_access_files	f_2-19	Djit
f_1-20	lnum_outbound_cmds	f_2-20	Swin
f_1-21	is_host_login	f_2-21	Stcpb
f_1-22	is_guest_login	f_2-22	Dtcpb
f_1-23	count	f_2-23	Dwin
f_1-24	srv_count	f_2-24	Tcprtt
f_1-25	serror_rate	f_2-25	Synack
f_1-26	srv_serror_rate	f_2-26	Ackdat
f_1-27	rerror_rate	f_2-27	Smean
f_1-28	srv_rerror_rate	f_2-28	Dmean
f_1-29	same_srv_rate	f_2-29	trans_depth
f_1-30	diff_srv_rate	f_2-30	response_body_len
f_1-31	srv_diff_host_rate	f_2-31	ct_srv_src
f_1-32	dst_host_count	f_2-32	ct_state_ttl
f_1-33	dst_host_srv_count	f_2-33	ct_dst_ltm
f_1-34	dst_host_same_srv_rate	f_2-34	ct_src_dport_ltm
f_1-35	dst_host_diff_srv_rate	f_2-35	ct_dst_sport_ltm
f_1-36	dst_host_same_src_port_rate	f_2-36	ct_dst_src_ltm
f_1-37	dst_host_srv_diff_host_rate	f_2-37	is_ftp_login
f_1-38	dst_host_serror_rate	f_2-38	ct_ftp_cmd
f_1-39	dst_host_srv_serror_rate	f_2-39	ct_flw_http_mthd
f_1-40	dst_host_rerror_rate	f_2-40	ct_src_ltm
f_1-41	dst_host_srv_rerror_rate	f_2-41	ct_srv_dst
		f_2-42	is_sm_ips_ports

Table 4. The four groups of features in the KDD99 dataset.

Group	Features	Count
Basic	f_1-1, f_1-2, f_1-3, f_1-4, f_1-5, f_1-6, f_1-7, f_1-8, f_1-9	9
Content	f_1-10, f_1-11, f_1-12, f_1-13, f_1-14, f_1-15, f_1-16, f_1-17, f_1-18, f_1-19, f_1-20, f_1-21, f_1-22	13
Time	f_1-23, f_1-24, f_1-25, f_1-26, f_1-27, f_1-28, f_1-29, f_1-30, f_1-31	9
Host	f_1-32, f_1-33, f_1-34, f_1-35, f_1-36, f_1-37, f_1-38, f_1-39, f_1-40, f_1-41	10

Table 5. The five groups of features in the UNSW-NB15 dataset.

Group	Features	Count
Flow	f_2-2	1
Basic	f_2-1, f_2-3, f_2-4, f_2-5, f_2-6, f_2-7, f_2-8, f_2-9, f_2-10, f_2-11, f_2-12, f_2-13, f_2-14, f_2-15	14
Content	f_2-20, f_2-21, f_2-22, f_2-23, f_2-27, f_2-28, f_2-29, f_2-30	8
Time	f_2-16, f_2-17, f_2-18, f_2-19, f_2-24, f_2-25, f_2-26, f_2-42	8
Additional	f_2-31, f_2-32, f_2-33, f_2-34, f_2-35, f_2-36, f_2-37, f_2-38, f_2-39, f_2-40, f_2-41	11
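A grouping such as the one in Table 5 is easy to sanity-check in code: every feature index should appear in exactly one group, and each group size should match the Count column. The sketch below encodes the five groups as lists of feature indices. The dictionary `GROUPS` and the helper `is_partition` are hypothetical names introduced for illustration.

```python
# Feature indices (the k in f_2-k) copied from Table 5.
GROUPS = {
    "Flow": [2],
    "Basic": [1] + list(range(3, 16)),            # f_2-1, f_2-3 .. f_2-15
    "Content": [20, 21, 22, 23, 27, 28, 29, 30],
    "Time": [16, 17, 18, 19, 24, 25, 26, 42],
    "Additional": list(range(31, 42)),            # f_2-31 .. f_2-41
}


def is_partition(groups, n_features):
    """Return True if the groups cover 1..n_features with no overlaps or gaps."""
    members = [index for group in groups.values() for index in group]
    return sorted(members) == list(range(1, n_features + 1))


group_sizes = {name: len(indices) for name, indices in GROUPS.items()}
covers_all_42 = is_partition(GROUPS, 42)
```

The same check can be pointed at the KDD99 groups in Table 4 with `n_features=41`.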

Table 6. Similarities of the features in KDD99 and UNSW-NB15.

Category	KDD99	UNSW-NB15
Common features	f_1-1, f_1-2, f_1-3, f_1-5, f_1-6	f_2-1, f_2-2, f_2-3, f_2-7, f_2-8
Features that use connection flags	f_1-4, f_1-9, f_1-24, f_1-25, f_1-29, f_1-30, f_1-38, f_1-39, f_1-40, f_1-41	f_2-4, f_2-24, f_2-25, f_2-26
Features that count connections	f_1-5, f_1-6, f_1-23, f_1-24, f_1-25, f_1-26, f_1-27, f_1-28, f_1-29, f_1-30, f_1-31, f_1-32, f_1-33, f_1-34, f_1-35, f_1-36, f_1-37, f_1-38, f_1-39, f_1-40, f_1-41	f_2-31, f_2-33, f_2-34, f_2-35, f_2-36, f_2-40, f_2-41
Size-based features (transmitted bits, bytes, or packets)	f_1-5, f_1-6	f_2-5, f_2-6, f_2-7, f_2-8, f_2-12, f_2-13, f_2-14, f_2-15, f_2-27, f_2-28, f_2-30
Features that calculate time (e.g., connection duration)	f_1-1, f_1-23, f_1-28	f_2-1, f_2-10, f_2-11, f_2-18, f_2-19, f_2-24, f_2-25, f_2-26

Table 7. The parameters used for the BPNN training process.

Parameter	Value
Maximum number of epochs	1000
Error loss termination value	0.040
Learning rate	0.05
Momentum	0.01
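To show how the four parameters in Table 7 interact, the following minimal sketch trains a single sigmoid neuron by gradient descent with momentum. It is not the study's BPNN. The single-neuron model, the mean squared error used as the "error loss", the classical momentum update, and the toy samples are all assumptions made for illustration; only the four constants come from Table 7.

```python
import math

# Values from Table 7.
MAX_EPOCHS = 1000
LOSS_TARGET = 0.040
LEARNING_RATE = 0.05
MOMENTUM = 0.01


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, max_epochs=MAX_EPOCHS, loss_target=LOSS_TARGET,
          lr=LEARNING_RATE, momentum=MOMENTUM):
    """Train one sigmoid neuron online; return (w, b, epochs_run, last_loss)."""
    w, b = 0.0, 0.0
    dw_prev, db_prev = 0.0, 0.0
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        squared_error = 0.0
        for x, target in samples:
            y = sigmoid(w * x + b)
            err = y - target
            squared_error += err * err
            grad = err * y * (1.0 - y)  # d(0.5*err^2)/dz for a sigmoid unit
            # Classical momentum (assumed): new step = -lr*gradient + momentum*previous step.
            dw = -lr * grad * x + momentum * dw_prev
            db = -lr * grad + momentum * db_prev
            w, b = w + dw, b + db
            dw_prev, db_prev = dw, db
        loss = squared_error / len(samples)
        if loss <= loss_target:  # error-loss termination value reached
            break
    return w, b, epoch, loss


# Hypothetical, linearly separable toy samples: (input, target).
SAMPLES = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b, epochs, loss = train(SAMPLES)
```

Training stops at whichever comes first: the loss falling to the termination value or the epoch limit being reached.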

Table 8. The selected features for each attack class in the KDD99 and UNSW-NB15 based on the achieved $ACC$.

Dataset	Attack Class	Selected Features	No. of Nodes	$ACC$
KDD99	DoS	36: f_1-1, f_1-2, f_1-3, f_1-4, f_1-6, f_1-7, f_1-9, f_1-10, f_1-11, f_1-12, f_1-13, f_1-14, f_1-15, f_1-16, f_1-17, f_1-18, f_1-22, f_1-23, f_1-24, f_1-25, f_1-26, f_1-27, f_1-28, f_1-29, f_1-30, f_1-33, f_1-34, f_1-35, f_1-36, f_1-37, f_1-38, f_1-39, f_1-40, f_1-41	34	99.40
	Probe	30: f_1-3, f_1-5, f_1-6, f_1-7, f_1-8, f_1-10, f_1-11, f_1-12, f_1-13, f_1-14, f_1-15, f_1-16, f_1-17, f_1-18, f_1-22, f_1-24, f_1-26, f_1-27, f_1-29, f_1-30, f_1-31, f_1-32, f_1-36, f_1-37, f_1-38, f_1-39, f_1-40, f_1-41	20	92.54
	R2L	16: f_1-2, f_1-5, f_1-7, f_1-10, f_1-13, f_1-14, f_1-17, f_1-22, f_1-29, f_1-32, f_1-33, f_1-35, f_1-36, f_1-38, f_1-41	23	85.32
	U2R	24: f_1-1, f_1-2, f_1-3, f_1-4, f_1-5, f_1-7, f_1-8, f_1-11, f_1-12, f_1-16, f_1-17, f_1-18, f_1-19, f_1-24, f_1-25, f_1-28, f_1-30, f_1-31, f_1-33, f_1-34, f_1-36, f_1-37, f_1-39, f_1-41	20	94.14
UNSW-NB15	Fuzzers	18: f_2-3, f_2-6, f_2-7, f_2-9, f_2-10, f_2-11, f_2-12, f_2-15, f_2-18, f_2-20, f_2-27, f_2-31, f_2-34, f_2-35, f_2-36, f_2-39, f_2-41, f_2-42	13	90.40
	Analysis	19: f_2-1, f_2-2, f_2-6, f_2-7, f_2-9, f_2-10, f_2-11, f_2-12, f_2-13, f_2-15, f_2-18, f_2-22, f_2-25, f_2-28, f_2-34, f_2-35, f_2-36, f_2-37, f_2-39	13	86.48
	Backdoors	19: f_2-2, f_2-4, f_2-5, f_2-8, f_2-10, f_2-12, f_2-14, f_2-18, f_2-24, f_2-26, f_2-27, f_2-29, f_2-31, f_2-35, f_2-37, f_2-38, f_2-39, f_2-40, f_2-42	10	89.82
	DoS	12: f_2-1, f_2-2, f_2-7, f_2-8, f_2-10, f_2-11, f_2-19, f_2-25, f_2-26, f_2-29, f_2-38, f_2-41	24	86.57
	Exploits	29: f_2-2, f_2-3, f_2-4, f_2-5, f_2-6, f_2-7, f_2-8, f_2-9, f_2-10, f_2-11, f_2-12, f_2-13, f_2-14, f_2-16, f_2-17, f_2-18, f_2-21, f_2-22, f_2-23, f_2-26, f_2-28, f_2-29, f_2-31, f_2-32, f_2-33, f_2-34, f_2-36, f_2-37, f_2-38	42	87.80
	Generic	10: f_2-3, f_2-9, f_2-11, f_2-17, f_2-20, f_2-23, f_2-24, f_2-32, f_2-36, f_2-38	27	97.97
	Reconnaissance	18: f_2-2, f_2-4, f_2-6, f_2-8, f_2-10, f_2-16, f_2-20, f_2-21, f_2-25, f_2-26, f_2-28, f_2-31, f_2-33, f_2-36, f_2-37, f_2-38, f_2-40, f_2-42	28	89.85
	Shellcode	18: f_2-3, f_2-4, f_2-6, f_2-10, f_2-13, f_2-15, f_2-17, f_2-20, f_2-21, f_2-24, f_2-25, f_2-28, f_2-30, f_2-33, f_2-34, f_2-35, f_2-36, f_2-40	26	90.75
	Worms	29: f_2-1, f_2-2, f_2-3, f_2-5, f_2-6, f_2-7, f_2-9, f_2-10, f_2-11, f_2-12, f_2-13, f_2-14, f_2-15, f_2-16, f_2-21, f_2-22, f_2-24, f_2-25, f_2-26, f_2-27, f_2-28, f_2-29, f_2-31, f_2-32, f_2-34, f_2-36, f_2-38, f_2-40, f_2-41	39	89.22

Table 9. Feature selection frequency and ranking for the KDD99 dataset.

Features	Runs																				Rank
Features	01	02	03	04	05	06	07	08	09	10	11	12	13	14	15	16	17	18	19	20	Rank
f_1-1	✓	✓	✓		✓	✓	✓				✓	✓	✓		✓		✓	✓	✓	✓	03
f_1-2	✓	✓								✓	✓	✓	✓						✓	✓	34
f_1-3	✓	✓				✓				✓	✓	✓	✓	✓					✓	✓	21
f_1-4	✓		✓			✓	✓	✓		✓	✓	✓		✓	✓			✓		✓	09
f_1-5	✓			✓		✓	✓	✓			✓	✓								✓	34
f_1-6										✓	✓		✓						✓		39
f_1-7	✓					✓	✓				✓	✓	✓	✓			✓		✓		28
f_1-8		✓	✓			✓		✓	✓	✓	✓	✓			✓		✓		✓	✓	09
f_1-9	✓	✓	✓			✓	✓				✓	✓	✓					✓	✓		21
f_1-10	✓				✓	✓		✓		✓	✓	✓	✓	✓		✓		✓	✓	✓	04
f_1-11	✓		✓			✓				✓		✓	✓	✓					✓	✓	28
f_1-12	✓	✓				✓		✓		✓	✓	✓	✓			✓			✓	✓	12
f_1-13	✓						✓			✓	✓	✓					✓	✓	✓		34
f_1-14	✓					✓	✓			✓	✓	✓	✓				✓	✓	✓	✓	12
f_1-15	✓	✓			✓	✓		✓		✓	✓	✓					✓	✓		✓	12
f_1-16	✓				✓	✓			✓	✓	✓	✓	✓				✓		✓		21
f_1-17						✓				✓	✓		✓	✓	✓		✓		✓	✓	28
f_1-18	✓	✓				✓		✓		✓	✓	✓				✓				✓	28
f_1-19	✓	✓				✓	✓	✓		✓		✓	✓	✓				✓	✓		12
f_1-20																					40
f_1-21																					40
f_1-22	✓	✓			✓	✓	✓			✓	✓	✓	✓						✓	✓	12
f_1-23	✓	✓		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	01
f_1-24	✓	✓	✓			✓	✓	✓	✓				✓					✓	✓	✓	12
f_1-25	✓	✓			✓	✓		✓		✓	✓	✓				✓			✓	✓	12
f_1-26	✓	✓				✓	✓	✓			✓	✓	✓					✓	✓	✓	12
f_1-27	✓	✓	✓			✓	✓	✓		✓	✓					✓			✓		21
f_1-28		✓				✓			✓		✓		✓				✓	✓	✓	✓	28
f_1-29	✓	✓			✓	✓	✓	✓	✓	✓	✓	✓	✓	✓			✓	✓	✓	✓	02
f_1-30	✓	✓	✓		✓	✓	✓			✓	✓	✓		✓			✓	✓	✓		04
f_1-31	✓	✓	✓	✓			✓		✓	✓	✓	✓	✓		✓			✓	✓		04
f_1-32	✓	✓	✓			✓	✓	✓		✓	✓	✓	✓				✓	✓		✓	04
f_1-33	✓	✓	✓			✓	✓	✓		✓	✓		✓							✓	21
f_1-34		✓		✓		✓	✓				✓	✓	✓				✓		✓	✓	21
f_1-35	✓	✓				✓	✓			✓									✓	✓	38
f_1-36	✓				✓	✓	✓	✓			✓		✓			✓			✓	✓	21
f_1-37	✓	✓	✓			✓	✓		✓	✓	✓	✓					✓		✓		12
f_1-38	✓	✓		✓		✓		✓	✓		✓	✓	✓	✓				✓	✓		09
f_1-39	✓	✓			✓	✓	✓	✓										✓	✓	✓	28
f_1-40	✓	✓			✓	✓				✓			✓						✓	✓	34
f_1-41	✓	✓	✓			✓	✓	✓		✓	✓		✓				✓	✓	✓	✓	04
Count	34	28	13	05	12	34	24	20	08	28	33	28	27	11	06	07	16	19	33	28
$ACC$	94.02	97.82	94.04	95.77	96.60	97.98	97.47	97.13	98.38	96.39	97.82	95.00	96.84	97.27	97.91	97.25	96.59	97.97	98.42	98.09
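The Rank column in Table 9 appears consistent with ranking features by how many of the 20 runs selected them, with tied features sharing the rank of the first tied position (for example, five features at rank 04 are followed by rank 09). The sketch below implements that reading on toy data. The tie-handling rule is an inference from the printed ranks, and the function name `selection_ranks` and the feature labels are hypothetical.

```python
from collections import Counter


def selection_ranks(runs, features):
    """Return (frequency, rank) dicts for features selected across runs.

    Ties share the rank of the first tied position ("1, 2, 2, 4" style),
    which is assumed to be the rule behind the Rank column.
    """
    counts = Counter(f for selected in runs for f in set(selected))
    freq = {f: counts.get(f, 0) for f in features}
    # Stable sort: most frequently selected first, original order within ties.
    ordered = sorted(features, key=lambda f: -freq[f])
    ranks = {}
    for position, feature in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and freq[feature] == freq[previous]:
            ranks[feature] = ranks[previous]  # tie: reuse the earlier rank
        else:
            ranks[feature] = position
    return freq, ranks


# Hypothetical toy data: three runs over four features.
toy_runs = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
freq, ranks = selection_ranks(toy_runs, ["a", "b", "c", "d"])
```

In this toy case, "b" and "c" are each selected twice and share rank 2, so the never-selected feature "d" receives rank 4.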

Table 10. Feature selection frequency and ranking for the UNSW-NB15 dataset.

Features	Runs																				Rank
Features	01	02	03	04	05	06	07	08	09	10	11	12	13	14	15	16	17	18	19	20	Rank
f_2-1							✓			✓	✓			✓	✓					✓	29
f_2-2	✓	✓		✓	✓		✓				✓	✓		✓		✓		✓			06
f_2-3						✓	✓			✓				✓						✓	36
f_2-4			✓			✓	✓			✓			✓	✓				✓	✓	✓	13
f_2-5						✓	✓			✓		✓	✓		✓			✓		✓	17
f_2-6		✓			✓					✓		✓		✓	✓						29
f_2-7					✓		✓	✓		✓		✓	✓	✓							22
f_2-8	✓						✓	✓		✓		✓				✓			✓		22
f_2-9		✓							✓	✓	✓	✓	✓			✓		✓	✓	✓	06
f_2-10	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	01
f_2-11				✓	✓		✓		✓	✓		✓	✓	✓	✓	✓	✓		✓		03
f_2-12							✓	✓		✓	✓		✓	✓			✓				22
f_2-13								✓				✓		✓		✓			✓	✓	29
f_2-14					✓	✓	✓						✓			✓		✓	✓	✓	17
f_2-15				✓	✓		✓			✓	✓		✓	✓				✓	✓		13
f_2-16		✓		✓			✓	✓			✓		✓	✓	✓		✓		✓	✓	04
f_2-17		✓					✓			✓	✓	✓	✓			✓		✓	✓	✓	06
f_2-18				✓	✓	✓		✓		✓			✓					✓	✓	✓	13
f_2-19	✓				✓	✓	✓			✓			✓				✓				22
f_2-20																					42
f_2-21					✓								✓		✓						41
f_2-22					✓									✓				✓	✓	✓	36
f_2-23			✓		✓							✓		✓		✓					36
f_2-24	✓					✓		✓		✓			✓			✓			✓	✓	17
f_2-25					✓	✓				✓				✓						✓	36
f_2-26		✓						✓	✓	✓	✓	✓	✓	✓	✓					✓	06
f_2-27					✓		✓	✓		✓	✓		✓	✓		✓		✓	✓	✓	04
f_2-28	✓				✓		✓	✓		✓		✓	✓	✓				✓		✓	06
f_2-29	✓				✓	✓	✓	✓	✓	✓		✓				✓	✓	✓	✓	✓	02
f_2-30	✓					✓				✓					✓		✓				36
f_2-31							✓	✓		✓			✓	✓				✓	✓		22
f_2-32	✓						✓				✓	✓	✓	✓			✓		✓		17
f_2-33							✓						✓	✓	✓		✓			✓	29
f_2-34							✓			✓			✓	✓		✓		✓	✓		22
f_2-35	✓	✓						✓		✓	✓			✓	✓	✓		✓			13
f_2-36	✓					✓		✓					✓	✓	✓	✓	✓	✓		✓	06
f_2-37				✓						✓			✓	✓		✓				✓	29
f_2-38										✓		✓	✓	✓		✓	✓	✓	✓		17
f_2-39	✓			✓	✓		✓			✓	✓	✓			✓				✓	✓	06
f_2-40					✓		✓			✓	✓		✓							✓	29
f_2-41					✓		✓			✓		✓	✓						✓		29
f_2-42							✓		✓	✓	✓		✓	✓	✓						22
Count	12	08	03	08	19	12	26	15	06	31	15	18	28	27	14	17	11	18	21	23
$ACC$	84.27	84.54	84.48	86.65	81.01	86.19	90.61	85.01	92.13	90.92	85.76	92.19	90.57	89.35	91.28	91.98	92.12	86.07	89.16	84.93

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Al-Daweri, M.S.; Zainol Ariffin, K.A.; Abdullah, S.; Md. Senan, M.F.E. An Analysis of the KDD99 and UNSW-NB15 Datasets for the Intrusion Detection System. Symmetry 2020, 12, 1666. https://doi.org/10.3390/sym12101666

