Electronics
  • Article
  • Open Access

14 March 2019

Features Dimensionality Reduction Approaches for Machine Learning Based Network Intrusion Detection

1 Department of Computer Science & Engineering, University of Bridgeport, Bridgeport, CT 06604, USA
2 Department of Biomedical Engineering, University of Bridgeport, Bridgeport, CT 06604, USA
* Author to whom correspondence should be addressed.
This article belongs to the Section Computer Science & Engineering

Abstract

The security of networked systems has become a critical universal issue that influences individuals, enterprises and governments. The rate of attacks against networked systems has increased dramatically, and the tactics used by attackers continue to evolve. Intrusion detection is one of the solutions against these attacks. A common and effective approach for designing Intrusion Detection Systems (IDS) is Machine Learning. The performance of an IDS improves significantly when the features are more discriminative and representative. This study uses two feature dimensionality reduction approaches: (i) Auto-Encoder (AE), an instance of deep learning, and (ii) Principal Component Analysis (PCA). The resulting low-dimensional features from both techniques are then used to build various classifiers such as Random Forest (RF), Bayesian Network, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) for designing an IDS. The experimental findings with low-dimensional features in binary and multi-class classification show better performance in terms of Detection Rate (DR), F-Measure, False Alarm Rate (FAR), and Accuracy. This research effort reduces the CICIDS2017 dataset’s feature dimensions from 81 to 10, while maintaining a high accuracy of 99.6% in multi-class and binary classification. Furthermore, we propose a Multi-Class Combined performance metric $Combined_{Mc}$ with respect to class distribution to compare various multi-class and binary classification systems by incorporating FAR, DR, Accuracy, and class distribution parameters. In addition, we developed a uniform distribution based balancing approach to handle the imbalanced distribution of the minority class instances in the CICIDS2017 network intrusion dataset.

1. Introduction

A Network Intrusion Detection System (IDS) is a software-based application or a hardware device that is used to identify malicious behavior in the network [1,2]. Based on the detection technique, intrusion detection is classified into anomaly-based and signature-based. IDS developers employ various techniques for intrusion detection, one of which is based on machine learning. Machine learning (ML) techniques can predict and detect threats before they result in major security incidents [3]. Classifying instances into two classes is called binary classification, whereas multi-class classification refers to classifying instances into three or more classes. In this research, we adopt both. For the multi-class classification, there are 15 classes, where each class represents either normal network flow traffic or one of 14 types of attacks. For the binary case, the network flow traffic is classified as either normal or anomalous (attack) traffic.
An Artificial Neural Network (ANN) is a self-adaptive mathematical and computational model that is composed of an interconnected group of artificial neurons. There are multiple types of ANNs such as Deep Convolutional Neural Networks (DCNN), Recurrent Neural Networks (RNN) and Auto-Encoder (AE) neural networks, each of which comes with its own specific applications and levels of complexity. Deep learning is a promising machine learning-based approach that can address the challenges associated with the design of intrusion detection systems as a result of its outstanding performance in dealing with complex, large-scale data.
This study adopts Auto-Encoder (AE) and Principal Component Analysis (PCA) for dimensionality reduction. As a proof-of-concept and to verify the feature dimensionality reduction ideas, the paper used the up-to-date CICIDS2017 intrusion detection and prevention dataset [4], which consists of five separate data files. Each file represents the network traffic flow and specific types of attacks for a certain period of time. To be more specific, the dataset was collected over a total of 5 days, Monday through Friday. The traffic flow on Monday includes only benign network traffic, whereas the implemented attacks in the dataset were executed on Tuesday, Wednesday, Thursday and Friday. In this paper, we combined all of the CICIDS2017 files and fed them through the AE and PCA units for a compressed and lower dimensional representation of all the fused data. Figure 1 displays the overall idea of the proposed framework.
Figure 1. Proposed Framework.

1.1. Problem Statement

In machine learning problems, high-dimensional features lead to prolonged classification processes, whereas low-dimensional features can shorten them. Moreover, classification of network traffic data with imbalanced class distributions has posed a significant drawback on the performance attainable by most well-known classifiers, which assume relatively balanced class distributions and equal misclassification costs. The frequent occurrence and issues associated with imbalanced class distributions indicate the need for extra research efforts. Previous studies of intrusion detection systems have not dealt with classification of network traffic data with imbalanced class distributions. Furthermore, with the presence of imbalanced data, the known performance metrics may fail to provide adequate information about the performance of the classifier.

1.2. Key Contributions and Paper Organization

The key contributions of this paper include the development of a framework for machine learning-based network intrusion detection. The proposed anomaly-based intrusion detection system uses AE as well as PCA for dimensionality reduction and well-tested classifiers such as Random Forest (RF), Bayesian Network (BN), Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). In summary, the main contributions of this work are as follows:
  • We achieved effective pattern representation and dimensionality reduction of features in the CICIDS2017 dataset using AE and PCA.
  • We used the CICIDS2017 dataset to compare the efficiency of the dimensionality reduction approaches with different classification algorithms, such as Random Forest, Bayesian Network, LDA and QDA in binary and multi-class classification.
  • We developed a combined metric with respect to class distribution to compare various multi-class classification systems through incorporating the False Alarm Rate (FAR), Detection Rate (DR), Accuracy and class distribution parameters.
  • We developed a Uniform Distribution Based Balancing (UDBB) approach for imbalanced classes.
The remainder of this paper is organized as follows. An overview of the dimensionality reduction approaches selection criteria and related work is provided in Section 2. Next, in Section 3, the paper gives a brief review of the CICIDS2017 dataset, describes the attack types embedded in the dataset, and further explains the preprocessing and unity-based normalization steps. In Section 4, the paper explains in detail the dimensionality reduction approaches based on AE as well as PCA. Afterwards, the performance evaluation metrics are introduced in Section 5. Section 6 elaborates on the Uniform Distribution Based Balancing (UDBB) approach. Next, in Section 7, the paper summarizes the principal findings of the experiments and discusses the results. The challenges and limitations are discussed in Section 8. Finally, the conclusions and future directions are discussed in Section 9.

3. CICIDS2017 Dataset

The CICIDS2017 dataset consists of realistic background traffic that represents the network events produced by the abstract behavior of a total of 25 users. The users’ profiles were determined to include specific protocols such as HTTP, HTTPS, FTP, SSH and email protocols. The developers used statistical metrics such as minimum, maximum, mean and standard deviation to encapsulate the network events into a set of certain features which include:
  • The distribution of the packet size
  • The number of packets per flow
  • The size of the payload
  • The request time distribution of the protocols
  • Certain patterns in the payload
Moreover, CICIDS2017 covers various attack scenarios that represent common attack families. The attacks include Brute Force Attack, HeartBleed Attack, Botnet, DoS Attack, Distributed DoS (DDoS) Attack, Web Attack, and Infiltration Attack.
The dataset is publicly available by the authors in two formats:
  • The full packet payloads in Packet CAPture (PCAP) format
  • The corresponding profiles and labeled flows as CSV files for machine and deep learning purposes
CICIDS2017 was collected based on real traces of benign and malicious activities of the network traffic. The total number of records in the dataset is 2,830,108. The benign traffic encompasses 2,358,036 records (83.3% of the data), while the malicious records are 471,454 (16.7% of the data). CICIDS2017 is one of the few datasets that includes up-to-date attacks. Furthermore, its features are distinctive in comparison with other datasets such as UNSW-NB15 [30,31], AWID [32], GPRS [33], and CIDD-001 [34]. For this reason, CICIDS2017 was selected as the most comprehensive IDS benchmark to test and validate the proposed ideas. Table 2 highlights the characteristics and distribution of the attacks in the CICIDS2017 dataset and provides a brief description of each type of attack. CICIDS2017 is a labeled dataset with a total of 84 features, including the last column corresponding to the traffic status (class label). The features were extracted by CICFlowMeter-V3 [35]. The output of CICFlowMeter-V3 is a CSV file that includes: Flow ID (1), Source IP (2) and Destination IP (4), Time stamp (7) and Label (84). The Flow ID (1) is a four-tuple: Source IP, Source Port, Destination IP, and Destination Port. Time stamp represents the timing. To the best of our knowledge, all previous studies that used CICIDS2017 neglect Flow ID (1), Source IP (2), Destination IP (4), and Time stamp (7). In this paper, we used all of the listed CICIDS2017 features except the Flow ID (1) and Time stamp (7). Thus, in our study, the total number of used features is 82, including the Label (84). These features are listed in Table 3. The extracted traffic features are explained in [36].
Table 2. CICIDS2017 attack distribution and description.
Table 3. Listed features of network traffic in CICIDS2017.

3.1. Preprocessing

In this study, a preprocessing function is applied to the CICIDS2017 dataset that maps each IP (Internet Protocol) address to an integer representation. Both the Source IP Address (Src IP) and the Destination IP Address (Dst IP) are converted in this way. The data is then split into a training set and a testing set with a ratio of 70:30.
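The preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `ip_to_int` and `train_test_split_70_30`, the toy flow records, and the random shuffling before the split are all assumptions, since the text only states that IPs are mapped to integers and that the split ratio is 70:30.

```python
import ipaddress
import random

def ip_to_int(ip: str) -> int:
    """Map an IP address string to its integer value (hypothetical helper)."""
    return int(ipaddress.ip_address(ip))

def train_test_split_70_30(rows, seed=0):
    """Shuffle the rows and split them 70:30 into (train, test).

    Shuffling with a fixed seed is an assumption; the paper does not say
    how the instances were assigned to the two sets.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]

# Hypothetical flow records: (Src IP, Dst IP, label).
flows = ([("192.168.10.5", "8.8.8.8", "BENIGN")] * 7
         + [("172.16.0.1", "192.168.10.50", "DDoS")] * 3)

# Replace both address columns by their integer representation.
encoded = [(ip_to_int(src), ip_to_int(dst), label) for src, dst, label in flows]
train, test = train_test_split_70_30(encoded)
```

The standard-library `ipaddress` module handles both IPv4 and IPv6 strings, so no manual octet arithmetic is needed.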

3.2. Unity-Based Normalization

In this step, we use Equation (1) to re-scale the features in the dataset based on the minimum and maximum values of each feature. Some features in the original dataset vary between $[0, 1]$ while other features vary between $[0, \infty)$. Therefore, these features are normalized to restrict the range of the values between 0 and 1, which are then processed by the auto-encoder for feature reduction.
$$x_i' = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{1}$$
where $x_i$ is the value of a particular feature, $x_{min}$ is the minimum value, and $x_{max}$ is the maximum value.
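Equation (1) can be applied column by column, as in the sketch below. The function name `unity_normalize` and the handling of constant columns (mapped to 0 to avoid division by zero) are assumptions made for illustration; the source does not describe that edge case.

```python
import numpy as np

def unity_normalize(X, x_min=None, x_max=None):
    """Min-max scale each feature (column) of X to [0, 1] as in Equation (1).

    x_min / x_max may be passed in so that a test set can be scaled with the
    statistics of the training set; by default they are computed from X.
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0) if x_min is None else np.asarray(x_min, dtype=float)
    x_max = X.max(axis=0) if x_max is None else np.asarray(x_max, dtype=float)
    span = x_max - x_min
    # Assumption: a constant feature has zero span; map it to 0 instead of NaN.
    span = np.where(span == 0, 1.0, span)
    return (X - x_min) / span, x_min, x_max

# Toy example with three features on very different scales.
X_demo = [[0, 10, 5], [5, 20, 5], [10, 40, 5]]
X_scaled, col_min, col_max = unity_normalize(X_demo)
```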

4. Features Dimensionality Reduction

4.1. Auto-Encoder (AE) Based Dimensionality Reduction

In this section, we present the sparse auto-encoder learning algorithm [37,38], which is one approach to automatically learn feature reduction in unsupervised settings. Figure 2 shows the structure of the auto-encoder. The input vector $x = (x_1, x_2, \ldots, x_n)$ is first compressed to a lower dimensional hidden representation that consists of one or more hidden layers $a = (a_1, a_2, \ldots, a_m)$. The hidden representation $a$ is then mapped to reproduce the output $\hat{x} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)$. Let $j$ be the counter parameter for the neurons in the current layer $l$, and $i$ be the counter parameter for the neurons in the previous hidden layer $l-1$. The output of a neuron in the hidden layer can be represented by the following formula.
$$a_j^{(l)} = f\left(z_j^{(l)}\right) = f\left(\sum_{i=1}^{n} W_{ji}^{(l-1)} \, a_i^{(l-1)} + b_j^{(l-1)}\right) \tag{2}$$
Figure 2. The structure of an AE.
The weight matrix of the hidden layer is $W \in \mathbb{R}^{m \times n}$ and the bias is $b \in \mathbb{R}^{m}$. A sigmoid function is chosen as the activation function, such that $f(z) = \frac{1}{1 + \exp(-z)}$. Parameters $W$ and $b$ are optimized using back propagation, by minimizing the cost function $J$ for all the training instances [39], as follows:
$$J(W, b; \hat{x}, x) = \frac{1}{k}\sum_{i=1}^{k}\left(\frac{1}{2}\left\|\hat{x} - x\right\|^2\right) + \frac{\lambda}{2}\sum_{l=1}^{l_s-1}\sum_{j=1}^{m}\sum_{i=1}^{n}\left(W_{ji}^{(l)}\right)^2 \tag{3}$$
Parameter $\lambda$ is chosen to control the regularization term of all the weights in a particular layer, and $l_s$ denotes the total number of layers. To impose a sparsity constraint on the hidden units, one strategy is to add an additional term to the loss function during training that penalizes the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean $\rho$, the desired sparsity, and a Bernoulli random variable with mean $\hat{\rho}_j$, the average activation of hidden unit $j$:
$$\hat{\rho}_j = \frac{1}{k}\sum_{i=1}^{k} a_j^{(i)}\left(x^{(i)}\right) \tag{4}$$
where $a_j^{(i)}$ denotes the activation of hidden unit $j$ in the auto-encoder for the $i$-th training sample and $k$ is the number of training samples [40].
$$J_{sparse}(W, b) = J(W, b; \hat{x}, x) + \beta \sum_{j=1}^{m} KL\left(\rho \,\|\, \hat{\rho}_j\right) \tag{5}$$
This penalty has the effect of driving $\hat{\rho}_j$ close to $\rho$, because it is smallest when the average activation of each hidden unit on the training data matches the desired sparsity. The value of $\beta$ is chosen to control the weight of the sparsity penalty term.
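To make the pieces of the sparse cost concrete, the sketch below evaluates the reconstruction error, the weight-decay term, the average activations $\hat{\rho}_j$ and the KL sparsity penalty for a toy auto-encoder. It rests on several assumptions that are not taken from the paper: a single hidden layer with untied weights (the paper uses two hidden layers with tied weights), the common $\lambda/2$ form of the weight-decay term, and the function names `sparse_ae_cost` and `kl_bernoulli`. The default hyper-parameters are the values quoted in this section ($\lambda = 0.0008$, $\rho = 0.05$, $\beta = 6$). Only the cost is computed; training by back propagation is omitted.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def kl_bernoulli(rho, rho_hat):
    """Elementwise KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * np.log(rho / rho_hat)
            + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sparse_ae_cost(X, W1, b1, W2, b2, lam=0.0008, rho=0.05, beta=6.0):
    """Return (J_sparse, rho_hat) for a one-hidden-layer auto-encoder.

    Shapes: X (k, n), W1 (m, n), b1 (m,), W2 (n, m), b2 (n,).
    """
    k = X.shape[0]
    A = sigmoid(X @ W1.T + b1)        # hidden activations, shape (k, m)
    X_hat = sigmoid(A @ W2.T + b2)    # reconstruction, shape (k, n)
    # Mean over samples of half the squared reconstruction error.
    recon = np.sum(0.5 * np.sum((X_hat - X) ** 2, axis=1)) / k
    # Weight decay over both weight matrices (assumed lambda/2 form).
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Average activation of each hidden unit over the training samples.
    rho_hat = A.mean(axis=0)
    sparsity = beta * np.sum(kl_bernoulli(rho, rho_hat))
    return recon + decay + sparsity, rho_hat

# Toy usage: 20 samples, 8 inputs already scaled to [0, 1], 3 hidden units.
rng = np.random.default_rng(0)
X_demo = rng.random((20, 8))
W1 = 0.1 * rng.standard_normal((3, 8))
W2 = 0.1 * rng.standard_normal((8, 3))
cost, avg_act = sparse_ae_cost(X_demo, W1, np.zeros(3), W2, np.zeros(8))
```

With small random weights the sigmoid outputs sit near 0.5, far from $\rho = 0.05$, so the sparsity term dominates the cost until training pushes the average activations down.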
The computational complexity of executing the designed auto-encoder with a single hidden layer depends on the dimensionality of the input vector $n$ and the reduction ratio $R \in (0, 1)$ [41].
$$O\left(n \cdot (R \times n) + (R \times n) \cdot n\right) = O\left(R n^2 + R n^2\right) = O\left(n^2\right) \tag{6}$$
In this paper, a two hidden-layer sparse auto-encoder is used with sigmoid activation functions and tied weights. The input layer has 81 neurons which equals the total number of features in the CICIDS2017 dataset. The first hidden layer of the sparse auto-encoder was able to successfully reduce the dimensions to 70 features with a good error approximation. Further, the features were reduced to 64 in the second hidden layer. Once the weights are trained, the resulting sparse auto-encoder can be used to perform the classification in the final stage. The parameters of the sparse representation are set as follows: the weight decay λ = 0.0008 . The weights are multiplied by λ to prevent the weights from growing too large. The sparsity parameter ρ = 0.05 , and the sparsity penalty term β = 6 . The sparsity parameters and penalty are designed to restrict the activation of the hidden units, which reduces the dependency between the features. The algorithm is summarized in Table 4 and the design principles are presented in Table 5.
Table 4. Pseudo-code for the proposed Auto-Encoder.
Table 5. Design Principles.

4.2. Principal Component Analysis (PCA) Based Dimensionality Reduction

In this section, we present the Principal Component Analysis (PCA) algorithm. The objective of PCA is to perform dimensionality reduction. PCA finds a transformation that reduces the dimensionality of the data while accounting for as much variance as possible. PCA is one of the oldest techniques in multivariate analysis. The fundamental concept of PCA is projection. Here, the original dataset $X \in \mathbb{R}^{n}$ with $n$ columns (features) is projected into a subspace with $k$ or fewer dimensions, $X \in \mathbb{R}^{k}$ (fewer columns), while retaining the essence of the original data. The algorithm works as follows:
To reduce the feature dimensionality from $n$ dimensions to $k$ dimensions, two phases are implemented: the preprocessing phase and the dimensionality reduction phase. In the preprocessing phase (steps 1 through 4 below), the data is preprocessed to normalize its mean and variance using Equations (7) and (8). In the second phase (steps 5 through 8), which represents the reduction phase, the covariance matrix $Cov_M$, Eigen-vectors and Eigen-values are calculated from Equations (9) and (10).
  • Compute the mean of the original feature values using Equation (7), where $m$ is the number of instances in the dataset and $X^{(i)}$ are the data points.
    $$\mu = \frac{1}{m}\sum_{i=1}^{m} X^{(i)} \tag{7}$$
  • Replace $X^{(i)}$ with $X^{(i)} - \mu$.
  • Rescale each vector $X_j^{(i)}$ to have unit variance using Equation (8).
    $$\sigma_j^2 = \frac{1}{m}\sum_{i}\left(X_j^{(i)}\right)^2 \tag{8}$$
  • Replace each $X_j^{(i)}$ with $X_j^{(i)} / \sigma_j$.
  • Compute the Covariance Matrix $Cov_M$ as follows:
    $$Cov_M = \frac{1}{m}\sum_{i=1}^{m}\left(X^{(i)}\right)\left(X^{(i)}\right)^{T} \tag{9}$$
  • Calculate the Eigen-vectors and corresponding Eigen-values of $Cov_M$.
  • Sort the Eigen-vectors by decreasing Eigen-values and choose the $k$ Eigen-vectors with the largest Eigen-values to form $W$.
  • Use $W$ to transform the samples onto the new subspace using Equation (10).
    $$y = W^{T} \times X \tag{10}$$
    where $X$ is an $n \times 1$ dimensional vector representing one sample, and $y$ is the transformed $k \times 1$ dimensional sample in the new subspace.
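The eight steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function name `pca_fit_transform`, the guard for constant features, and the use of `np.linalg.eigh` (chosen here because the covariance matrix is symmetric) are assumptions.

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project the rows of X (m samples, n features) onto the top-k components.

    Returns (Y, W, eigvals): Y is (m, k), W is (n, k), and eigvals holds the
    k largest eigenvalues in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mu = X.mean(axis=0)                          # step 1, Equation (7)
    Xc = X - mu                                  # step 2
    sigma = np.sqrt((Xc ** 2).sum(axis=0) / m)   # step 3, Equation (8)
    sigma[sigma == 0] = 1.0                      # assumption: leave constant features at 0
    Xs = Xc / sigma                              # step 4
    cov = (Xs.T @ Xs) / m                        # step 5, Equation (9)
    eigvals, eigvecs = np.linalg.eigh(cov)       # step 6 (eigenvalues in ascending order)
    order = np.argsort(eigvals)[::-1][:k]        # step 7: indices of the k largest
    W = eigvecs[:, order]
    Y = Xs @ W                                   # step 8: y = W^T x for every row x
    return Y, W, eigvals[order]

# Toy usage: 50 random samples with 5 features reduced to 2 components.
rng = np.random.default_rng(0)
X_demo = rng.random((50, 5))
Y_demo, W_demo, top_vals = pca_fit_transform(X_demo, k=2)
```

Because the samples are stored as rows, the projection is written as `Xs @ W`, which applies $y = W^{T} x$ to each sample at once.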
The computational complexity of executing the designed PCA depends on the number of features P that represent each data point [42].
$$O\left(P^3\right) \tag{11}$$
According to [28], the Reduction Ratio (RR) of PCA can be defined as the ratio of the number of target dimensions to the number of original dimensions. The lower the value of RR, the higher is the efficiency of PCA. The RR of our proposed framework is equal to 10:81 which outperformed previous related work. Our final RR of 2:81 is also able to represent the data with low error and provide high accuracies.

5. Performance Evaluation Metrics

This study used various performance metrics to evaluate the performance of the proposed system, including False Alarm Rate (FAR), F-Measure [43], Detection Rate (DR), and Accuracy (Acc) as well as the processing time. The definitions of these metrics are provided below. The metrics are a function of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN).
(1) False Alarm Rate (FAR) is the proportion of normal instances that the classifier incorrectly classifies as attacks, and can be estimated through Equation (12).
$$FAR = \frac{FP}{TN + FP} \tag{12}$$
(2) Accuracy (Acc) measures the ability of the classifier to correctly classify an object as either normal or attack. The Accuracy can be defined using Equation (13).
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$
(3) Detection Rate (DR) indicates the number of attacks detected divided by the total number of attack instances in the dataset. DR can be estimated by Equation (14).
$$DR = \frac{TP}{TP + FN} \tag{14}$$
(4) The F-measure (F-M) is a score of a classifier’s accuracy and is defined as the harmonic mean of the Precision and Recall measures of the classifier. F-Measure is calculated using Equation (15).
$$F\text{-}Measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{15}$$
(5) Precision represents the number of correct positive predictions divided by the total number of positive predictions. It is considered a measure of the classifier's exactness. A low value indicates a large number of False Positives. The precision is calculated using Equation (16).
$$Precision = \frac{TP}{TP + FP} \tag{16}$$
(6) Recall is the number of True Positives divided by the sum of the True Positives and the False Negatives. Recall is considered a measure of a classifier's completeness, such that a low value of recall indicates many False Negatives [44]. Recall is estimated through Equation (17).
$$Recall = \frac{TP}{TP + FN} \tag{17}$$
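Equations (12) through (17) translate directly into code. In the sketch below, the function name `binary_metrics`, the dictionary keys, and the choice to return NaN when a denominator is zero are assumptions for illustration; the counts in the usage example are made up.

```python
import math

def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else math.nan

def binary_metrics(tp, tn, fp, fn):
    """Compute FAR, Acc, DR, Precision, Recall and F-Measure from the four counts."""
    precision = _ratio(tp, tp + fp)           # Equation (16)
    recall = _ratio(tp, tp + fn)              # Equation (17), identical to DR
    return {
        "FAR": _ratio(fp, tn + fp),                                       # Equation (12)
        "Acc": _ratio(tp + tn, tp + tn + fp + fn),                        # Equation (13)
        "DR": recall,                                                     # Equation (14)
        "Precision": precision,
        "Recall": recall,
        "F-Measure": _ratio(2 * precision * recall, precision + recall),  # Equation (15)
    }

# Hypothetical counts: 100 attack flows and 900 normal flows.
metrics = binary_metrics(tp=90, tn=880, fp=20, fn=10)
```

Returning NaN for an undefined ratio mirrors the situation where a class is never predicted: with no positive predictions, Precision has a zero denominator.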

Proposed Multi-Class Combined Performance Metric with Respect to Class Distribution

In general, the overall accuracy is used to measure the effectiveness of a classifier. Unfortunately, in presence of imbalanced data, this metric may fail to provide adequate information about the performance of the classifier. Furthermore, the method is very sensitive to the class distribution and might be misleading in some way. Hamed et al. [45] proposed a combined performance metric to compare various binary classifier systems. However, their solution neglects class distribution and can work only for binary classifications.
In this paper, we propose the multi-class combined performance metric $Combined_{Mc}$ with respect to class distribution to compare various multi-class classification systems as well as binary class systems by incorporating four metrics together: FAR (Equation (12)), Accuracy (Equation (13)), Detection Rate (Equation (14)), and class distribution (Equation (19)). The multi-class Combined performance metric can be estimated using the following equation.
$$Combined_{Mc} = \sum_{i=1}^{C} \lambda_i \left(\frac{Acc_i + DR_i}{2} - FAR_i\right) \tag{18}$$
where $C$ is the number of classes, and $\lambda_i$ is the class distribution ($dist$), which can be estimated using the following formula.
$$dist = \lambda_i = \frac{\text{Number of instances in class } i}{\text{Number of instances in the dataset}} \tag{19}$$
The result of this metric is a real value between −1 and 1, that is, $Combined_{Mc} \in [-1, +1]$, where −1 corresponds to the worst overall system performance and 1 corresponds to the best overall system performance. Table 6 illustrates the pseudo-code for calculating this proposed combined metric.
Table 6. Pseudo-code for the proposed $Combined_{Mc}$ metric calculation.
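A possible reading of Equations (18) and (19) is sketched below. The per-class values $Acc_i$, $DR_i$ and $FAR_i$ are computed here in a one-vs-rest fashion from a multi-class confusion matrix; that choice, the function name `combined_mc`, and the handling of empty classes are assumptions, since the pseudo-code of Table 6 is not reproduced in this text.

```python
def combined_mc(cm):
    """Combined multi-class metric from a confusion matrix.

    cm[i][j] is the number of class-i instances predicted as class j.
    Per-class counts are taken one-vs-rest (an assumption).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # class i predicted as something else
        fp = sum(cm[r][i] for r in range(n)) - tp  # other classes predicted as i
        tn = total - tp - fn - fp
        if tp + fn == 0:
            continue  # class absent: lambda_i = 0, so it contributes nothing
        lam = (tp + fn) / total                    # Equation (19)
        acc = (tp + tn) / total
        dr = tp / (tp + fn)
        far = fp / (tn + fp) if (tn + fp) else 0.0
        score += lam * ((acc + dr) / 2 - far)      # Equation (18)
    return score

# Hypothetical two-class confusion matrix with 10 instances per class.
demo_score = combined_mc([[8, 2], [1, 9]])
```

A perfect classifier (diagonal confusion matrix) yields 1, and a two-class classifier that gets every instance wrong yields −1, matching the stated range of the metric.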

6. Uniform Distribution Based Balancing (UDBB)

The problem of learning from skewed multi-class datasets is an important topic that arises very often in practice in classification problems. In such problems, almost all the instances are labeled as one class (called the majority, or negative class), while far fewer instances are labeled as the other class or classes (often called the minority class(es), or positive class(es)); usually the more important class(es). This section provides a glance at the Uniform Distribution Based Balancing (UDBB) technique. UDBB is based on learning and sampling probability distributions [46]. In this technique, the sampling of instances is performed following a distribution learned for each pair example of feature and class label. More specifically, the user determines the uniform distribution balancing to learn in order to re-sample new instances.
According to [44], the Imbalance Ratio (IR) can be defined as the ratio of the number of instances in the majority class to the number of instances in the minority class, as presented in Equation (20).
$$Imbalance\ Ratio = \frac{\text{Majority Class Instances}}{\text{Minority Class Instances}} \tag{20}$$
For the CICIDS2017 dataset, IR is 5:1 and the total number of classes is 15. To apply UDBB, a uniform number of instances ($I_{Resample}$) for each class is calculated from Equation (21).
$$I_{Resample} = \frac{\text{Number of Instances in the dataset}}{\text{Number of Classes in the dataset}} \tag{21}$$
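As a rough worked example, and assuming the 5:1 figure refers to the benign and attack record counts reported in Section 3 (an inference; the text does not state which counts were used), Equations (20) and (21) give approximately:

```latex
IR = \frac{2{,}358{,}036}{471{,}454} \approx 5.0,
\qquad
I_{Resample} = \frac{2{,}830{,}108}{15} \approx 188{,}674
```

Under this reading, each of the 15 classes would be resampled to roughly 188,674 instances, which keeps the overall dataset size about the same.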
The literature indicates that imbalanced class distribution is a major hurdle. If the IR value in the data is high, classifiers will be lower in accuracy and reliability; i.e., they do not truly reflect the classes accurately. Furthermore, imbalanced class distributions are an inevitable problem in real network traffic due to the large size of traffic and low frequency of certain types of anomalies. One of the recent attempts to address this problem appears in [47]. The authors used sampling approaches to combat imbalanced class distributions for network intrusion detection.
Previous studies that used CICIDS2017 used the files for Tuesday through Friday. In this paper, we use these along with the Monday data by merging all files into a single combined file. The motivation behind this step is to acquire a large volume of data, in both size and number of instances, that is skewed towards normal traffic and contains up-to-date attack patterns. Table 7 presents pseudo-code for the UDBB technique. In addition, our study compares the imbalanced case (with the original distribution of CICIDS2017) with the balanced class distribution (after applying the uniform distribution-based balancing approach).
Table 7. UDBB pseudo-code.
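Since the pseudo-code of Table 7 is not reproduced in this text, the following is only one plausible sketch of a uniform-distribution-based balancing step, not the authors' procedure. It assumes that classes larger than $I_{Resample}$ are randomly under-sampled, and that smaller classes are topped up with synthetic instances whose features are drawn independently from a uniform distribution between the per-class minimum and maximum of each feature. The function name `udbb_resample` and the integer rounding of $I_{Resample}$ are also assumptions.

```python
import numpy as np

def udbb_resample(X, y, seed=0):
    """Return (X_bal, y_bal) with the same number of instances in every class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    target = len(y) // len(classes)   # I_Resample from Equation (21), rounded down
    X_parts, y_parts = [], []
    for c in classes:
        Xc = X[y == c]
        if len(Xc) >= target:
            # Large class: keep a random subset chosen without replacement.
            keep = rng.choice(len(Xc), size=target, replace=False)
            Xn = Xc[keep]
        else:
            # Small class: keep the originals and add synthetic rows, sampling
            # each feature from Uniform(min, max) of that class (assumed reading).
            extra = rng.uniform(Xc.min(axis=0), Xc.max(axis=0),
                                size=(target - len(Xc), X.shape[1]))
            Xn = np.vstack([Xc, extra])
        X_parts.append(Xn)
        y_parts.append(np.full(target, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Toy usage: 90 instances of class 0 and 10 of class 1 become 50 and 50.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.random((100, 3))
y_demo = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = udbb_resample(X_demo, y_demo)
```

Sampling each feature independently ignores correlations between features, so this sketch is best read as an illustration of the resampling target rather than a faithful generator of realistic flows.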

7. Results and Discussion

In this section, we present the principal findings of the proposed framework. Extensive simulations have been performed.

7.1. Preliminary Assumptions and Requirements

All the simulations were carried out using an Intel Core i7 at 3.30 GHz with 32 GB RAM, running Windows 10. Our main hypothesis is that a reduced feature dimension representation in machine learning-based IDS will reduce the time and memory complexity compared to the original feature dimensions, while still maintaining high performance (not negatively impacting the achieved accuracy). A second hypothesis is that the proposed balancing technique improves the data representation of imbalanced classes and thus improves the classification performance compared to the original class distributions. The results highlight the advantages of feature dimensionality reduction on CICIDS2017 as well as the effectiveness of the balancing approach, in support of both hypotheses.
From the research efforts in this work, we were able to reduce the dimensionality of the features in CICIDS2017 from 81 features to 10 features while maintaining a high accuracy in multi-class and binary class classification using the Random Forest classifier. The findings are discussed in following subsections.

7.2. Binary Class Classification

The study evaluates the performance of binary classification in terms of Acc, FAR, DR and F-M. Table 8 and Table 9 display the summary of the results obtained. Table 8 highlights the results of the dimensionality reduction of the features in CICIDS2017 from 81 features to 10 features obtained using PCA, whereas Table 9 displays the results of the dimensionality reduction of the features in CICIDS2017 from 81 features to 59 features using AE.
Table 8. Performance evaluation of the proposed framework in binary classification using PCA.
Table 9. Performance evaluation of the proposed framework in binary classification using AE.
The DR metric revealed that $(PCA\text{-}RF)_{Bc\text{-}10}$ is able to detect 98.8% of the attacks. In the same manner, $(PCA\text{-}RF)_{Bc\text{-}10}$ achieved an F-Measure of 0.997. Moreover, $(AE\text{-}RF)_{Bc\text{-}59}$ is able to detect 98.5% of the attacks.
Figure 3 highlights the detection rates achieved with dimensionality reduction using PCA, whereas Figure 4 shows the detection rates achieved using the feature set reduced by AE. From Figure 3 and Figure 4, it is apparent that Random Forest, QDA and Bayesian Network reported significantly higher detection rates than LDA for the reduced feature dimensionality of CICIDS2017 using the PCA approach. The classification results across the different classifiers indicate that the reconstructed feature representation was good enough to achieve an overall accuracy of 98.5% with 59 features in binary classification using Random Forest with AE.
Figure 3. Binary Class Classification: Detection Rate in terms of number of components using PCA.
Figure 4. Binary Class Classification: Detection Rate in terms of number of features using Auto-Encoder.

7.3. Multi-Class Classification

The study used the Acc, F-M, FPR, TPR, Precision, Recall, and the Combined multi-class metrics to evaluate the performance of multi-class classification. Table 10 and Table 11 display the summary of the results obtained for the dimensionality reduction of the features for CICIDS2017 from 81 to 10 using PCA, and from 81 to 59 using AE, respectively.
Table 10. Performance evaluation of the proposed framework in multi-class classification using PCA.
Table 11. Performance evaluation of the proposed framework in multi-class classification using AE.
Figure 5 presents the resulting accuracies in terms of the number of principal components. What is striking about the resulting accuracies in Figure 5 is that the Random Forest classifier shows a consistently high accuracy for reduced features from 81 through 10. In contrast, the resulting accuracies of LDA and QDA oscillate. For QDA, the accuracy ranges between 66% with 10 features and 96.7% with 60 features. For LDA with 10 and 40 features, the accuracy is 85% and 96.6%, respectively.
Figure 5. Multi Class Classification: Accuracy in terms of number of components using PCA.
The results of the AE dimensionality reduction approach are displayed in Figure 6. The observed accuracy for Random Forest is notably higher than for the LDA, QDA and Bayesian Network classifiers. Furthermore, what stands out in Figure 6 is the increase of the resulting accuracy for LDA for the reduced dimensionality from 81 through 59 features. Here, the AE reconstructed a new and reduced feature representation pattern that reflects the original data with minimum error. Unlike feature selection techniques, where the selected features are a subset of the original set of features that can be identified precisely, AE generated a new feature pattern with reduced dimensions.
Figure 6. Multi Class Classification: Accuracy in terms of number of features using AE.
A detailed analysis summary of the proposed framework in terms of False Positive Rate (FPR), True Positive Rate (TPR), Precision and Recall is tabulated in Table 12 and Table 13. Table 12 depicts the results with 10 features (before applying UDBB), while Table 13 shows the results using 10 features (after applying UDBB). The weighted average results for all the attacks are presented in bold.
Table 12. Performance evaluation before applying UDBB.
Table 13. Performance evaluation after applying UDBB.
The results confirmed that the proposed framework with the reduced feature dimensionality achieved a maximum precision value of 0.996 and an FPR of 0.010, confirming the efficiency and effectiveness of the intrusion detection process. However, ( P C A R F ) M c 10 is unable to detect the HeartBleed attacks (noted as NAN in Table 12). In this Table, the Recall and Precision values for HeartBleed and WebAttack:SQL are 0.00, 0.000 and 0.000, 0.000, respectively. A justification of such outcome could be due to the fact that the number of instances of HeartBleed and WebAttack:SQL originally embedded in CICIDS2017 is equal to 11 and 21, respectively. This is expected, since the total number of HeartBleed instances in the original dataset is 11 instances. Thus, these instances were miss-classified by the classifier. To resolve this issue and to assure that the achieved accuracy is reflected due to the effective reduction approach, this paper applies the uniform distribution-based balancing technique to overcome the imbalanced class distributions of certain attacks in CICIDS2017. Table 14 shows the performance before and after applying the UDBB approach. As observed, ( P C A R F ) M c 10 achieved 99.6% and 98.8% before and after applying UDBB, respectively. In the same manner, ( P C A Q D A ) M c 10 achieved 85.6% and 98.9% before and after applying UDBB, respectively. The highest achieved F-M was obtained by ( P C A Q D A ) M c 10 . However, the highest C M ( M c ) achieved was 98.6% by ( P C A R F ) M c 10 .
Table 14. Performance evaluation of $(PCA\text{-}X)_{Mc\text{-}10}$.
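To make the idea of balancing toward a uniform class distribution concrete, the following is a minimal sketch that resamples every class to the same number of instances. It is plain random resampling swapped in for illustration, not the UDBB procedure itself, whose details follow [46,47]. The function name `balance_uniform`, the target size of 50, and the toy class counts (other than the 11 and 21 instances mentioned above) are assumptions.

```python
import numpy as np

def balance_uniform(X, y, n_per_class, seed=0):
    """Resample so that every class contributes exactly n_per_class rows.

    Illustrative stand-in only: small classes are oversampled with
    replacement, large classes are undersampled without replacement.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    chosen = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        # Replacement is needed only when the class is smaller than the target.
        replace = members.size < n_per_class
        chosen.append(rng.choice(members, size=n_per_class, replace=replace))
    idx = np.concatenate(chosen)
    return X[idx], y[idx]

# Toy labels: a large benign class plus two rare attack classes (11 and 21 rows).
y = np.array(["BENIGN"] * 200 + ["HeartBleed"] * 11 + ["WebAttack:SQL"] * 21)
X = np.arange(y.size * 3).reshape(y.size, 3)

X_bal, y_bal = balance_uniform(X, y, n_per_class=50)
labels, counts = np.unique(y_bal, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

After resampling, each class appears equally often, so a classifier can no longer reach high accuracy by ignoring the rare attack classes.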
The performance of $(PCA\text{-}X)_{Bc\text{-}10}$ and $(PCA\text{-}X)_{Mc\text{-}10}$, where X denotes the classifier, in terms of the time to build and test the model is presented in Table 15. The lowest test times were achieved by LDA: 2.96 s for multi-class and 5.56 s for binary classification.
Table 15. Time to build and test the models.
Here, the Random Forest classifier, which has the best detection performance, comes with the highest overhead in terms of the time to build and test the model. This is expected: Random Forest combines many decision trees into a single model, and the dataset used in this work has over 2.5 million instances in total. The worst-case time complexity of Random Forest is estimated using Equation (22) [48].
$$O(M K N^{2} \log N) \tag{22}$$
where K is the number of trees, M is the number of variables used in each split, and N is the number of training samples.
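As a rough illustration of how this bound scales, the sketch below evaluates the expression $M K N^{2} \log N$ up to a constant factor and compares two training set sizes. The function name and the specific values of N, K, and M are hypothetical and are not the settings used in the experiments.

```python
import math

def rf_worst_case_cost(n_samples, n_trees, n_split_vars):
    """Worst-case Random Forest build cost M * K * N^2 * log N (constant factor omitted)."""
    return n_split_vars * n_trees * n_samples ** 2 * math.log(n_samples)

# Hypothetical settings, for illustration only.
base = rf_worst_case_cost(n_samples=1_000_000, n_trees=100, n_split_vars=3)
doubled = rf_worst_case_cost(n_samples=2_000_000, n_trees=100, n_split_vars=3)
more_trees = rf_worst_case_cost(n_samples=1_000_000, n_trees=200, n_split_vars=3)

ratio = doubled / base            # slightly more than 4x: quadratic term times a log factor
tree_ratio = more_trees / base    # exactly linear in the number of trees
print(f"doubling N scales the bound by {ratio:.2f}x")
print(f"doubling K scales the bound by {tree_ratio:.2f}x")
```

The bound grows linearly in the number of trees but more than quadratically in the number of training samples, so the dataset size dominates the worst-case build cost.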
Moreover, a visualization of the dataset with two PCA components before and after applying the distribution-based balancing approach is displayed in Figure 7 and Figure 8.
Figure 7. 2D Visualization of PCA on CICIDS2017 with original distribution.
Figure 8. 2D Visualization of PCA on CICIDS2017 with UDBB.
These visualizations show how the instances of the CICIDS2017 dataset are set apart. As displayed in Figure 8, instances of the same type are positioned (clustered) together in groups, which is a significant improvement over the PCA visualization before applying UDBB. Here, the normal instances are clearly clustered in their own group, and the same holds for the other instance types.
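A two-component projection of this kind can be sketched with a few lines of NumPy. The example below is a minimal illustration that computes PCA through the singular value decomposition of the centered data. The synthetic two-cluster data stands in for the flow features and is an assumption, as is the helper name `pca_project`.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the first n_components principal directions."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Synthetic stand-in for two traffic classes with five features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 5)),
    rng.normal(loc=6.0, scale=1.0, size=(50, 5)),
])

Z = pca_project(X, n_components=2)
print(Z.shape)
```

Plotting the two columns of `Z` against each other, colored by class label, gives a 2D scatter view in which well-separated classes appear as distinct groups.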
The confusion matrix for $(PCA\text{-}RF)_{Mc\text{-}10}$ is shown in Figure 9. The value for HeartBleed is reported as NAN (Not a Number), which results from operations whose numerical value is undefined. The classifier $(PCA\text{-}RF)_{Mc\text{-}10}$ fails to classify HeartBleed attacks. In contrast, after applying the UDBB technique, $(PCA\text{-}RF)_{Mc\text{-}10}$ is able to detect 100% of HeartBleed attacks, as indicated by the confusion matrix in Figure 10.
Figure 9. Confusion Matrix for $(PCA\text{-}RF)_{Mc\text{-}10}$ with original class distribution.
Figure 10. Confusion Matrix for $(PCA\text{-}RF)_{Mc\text{-}10}$ with UDBB.
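One common way for an undefined value to arise is a class that the classifier never predicts: its precision is then 0/0. The sketch below illustrates this with a toy confusion matrix. The matrix values are invented, and the convention that rows are true classes and columns are predicted classes is an assumption.

```python
import numpy as np

def per_class_precision_recall(cm):
    """Per-class precision and recall from a confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    A class that is never predicted yields precision 0/0 = NaN.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)   # how often each class was predicted
    actual = cm.sum(axis=1)      # how often each class truly occurs
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / predicted
        recall = tp / actual
    return precision, recall

# Toy matrix: the third class has 3 true instances and is never predicted.
cm = [
    [95, 5, 0],
    [4, 96, 0],
    [3, 0, 0],
]
precision, recall = per_class_precision_recall(cm)
print(precision, recall)
```

For the third class, recall is zero because none of its instances are recovered, while precision is undefined because the denominator (the number of predictions for that class) is zero.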
A comparison between the proposed framework and related work is highlighted in Table 16. The authors in [17,49,50] reported the accuracy. Our proposed framework outperforms previous studies in terms of F-Measure and accuracy.
Table 16. A comparison of the proposed framework and previous studies.

8. Challenges and Limitations

Although this study has successfully demonstrated the significance of feature dimensionality reduction techniques, which led to better results in terms of several performance metrics as well as classification speed for an IDS, it has certain limitations and challenges, which are summarized as follows.

8.1. Fault Tolerance

Fault tolerance enables a system to continue operating properly in the event of failures or faults within any of its components. Fault tolerance can be achieved through several techniques. One aspect of fault tolerance in our system is the ability of the designed approach to detect a large set of well-known attacks. Our models have been trained to detect 14 up-to-date and well-known types of attacks. Furthermore, fault tolerance can be achieved by adopting the majority voting technique [52]. The trained Random Forest, Bayesian Network, and LDA models can be used in a majority voting-based intrusion detection system that provides fault tolerance. Moreover, the deployment of distributed intrusion detection systems in the network can enable fault tolerance.
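A majority voting scheme of this kind could be sketched as follows. This is a minimal illustration, not an implementation from this work: the per-model predictions are invented, and the tie-breaking rule (the label seen first wins) is an assumption.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per instance.

    predictions: list of equal-length label lists, one list per model.
    Ties are broken in favor of the label encountered first, i.e. the
    earliest model in the list.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Invented outputs of three trained models for three traffic instances.
rf_pred = ["BENIGN", "DoS", "PortScan"]
bn_pred = ["BENIGN", "BENIGN", "PortScan"]
lda_pred = ["DoS", "DoS", "BENIGN"]

combined = majority_vote([rf_pred, bn_pred, lda_pred])
print(combined)
```

With three voters, a single faulty or misled model cannot change the final decision on its own, which is the fault tolerance property the voting scheme is meant to provide.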

8.2. Adaptation to Non-Stationary Traffic/Nonlinear Models

The AE has the ability to represent models that are linear and nonlinear. Moreover, once the model is trained, it can be used for non-stationary traffic. We intend to further extend our work in the future with an online anomaly-based intrusion detection system.

8.3. Model Resilience

As presented in Table 12 and Table 13, the achieved FP rates are 0.010 and 0.001, respectively, which may reflect a built-in attack resiliency. Moreover, our models were trained in an offline manner. This ensures that an adversary cannot inject misclassified instances during the training phase. By contrast, such a case could occur with online-trained models. Therefore, it is essential for the machine learning system employed in intrusion detection to be resilient to adversarial attacks [53]. An approach to quantify the resilience of machine learning classifiers was introduced in [53]. The association of these factors will be investigated in future studies.

8.4. Ease of Dataset Acquisition/Model Building

The data used for our IDS model was acquired from the CICIDS2017 dataset which is a publicly available dataset provided by the Canadian Institute for Cybersecurity [35,36]. The dataset is open source and available for download and sharing.

8.5. Quality of Experience

According to [54], the Quality of Experience is used to measure and express, preferably as numerical values, the experience and perception of the users with a service or application software. The current research was not specifically designed to evaluate factors related to Quality of Experience. Future directions of this research may include such investigations.

9. Conclusion and Future Work

The aim of this research was to examine the use of auto-encoder and PCA for dimensionality reduction, together with several classifiers, towards designing an efficient network intrusion detection system on the CICIDS2017 dataset. The experimental analysis confirmed the significance of the feature dimensionality reduction techniques, which led to better results in terms of several performance metrics as well as classification speed. These findings highlight the potential usefulness of auto-encoder and PCA in dimensionality reduction for IDS. From our experiments, we found that PCA is superior, faster, and more interpretable, and that it can reduce the dimensionality of the data to as few as two components. The long training time and limited computational resources formed a barrier to reducing the dimensionality further than a 59-feature representation with the AE approach. This study suggests that AE can be used when the data necessitates a highly non-linear feature representation.
The Random Forest classifier builds a large number of decision trees by randomly selecting a subset of training samples and a subset of variables for splitting at each tree node. This makes the classifier less sensitive to both the quality of the training instances and the overfitting issue. Moreover, Random Forest is robust and stable when classifying high-dimensional and correlated data. These properties help explain why Random Forest yielded better results than the other classifiers [55].
As exemplified by the obtained results, the PCA approach is able to preserve important information in CICIDS2017 while efficiently reducing the feature dimensions of the dataset and providing a reasonable visualization model of the data. Features such as Subflow Fwd Bytes, Flow Duration, Flow Inter arrival time (IAT), PSH Flag Count, SYN Flag Count, Average Packet Size, Total Len Fwd Pck, Active Mean and Min, ACK Flag Count, and Init_Win_bytes_fwd are observed to be the discriminating features embedded in CICIDS2017 [4]. In this study, PCA was very efficient and produced better results than AE. In comparison with AE, the PCA approach is restricted to a linear mapping, whereas the AE can have a nonlinear encoder/decoder architecture.
As a future direction, this research will serve as a base for further studies and investigations towards developing efficient IDSs from various intrusion detection datasets. Furthermore, the trained models could be extended to implement an IDS for online anomaly-based detection.

Author Contributions

Supervision, M.F. and A.A. (Abdelshakour Abuzneid); Writing—original draft, R.A.; Writing—review & editing, M.F., A.A. (Abdelshakour Abuzneid) and R.A.; Data Preprocessing, R.A. and A.A. (Ali Alessa); Software, R.A. and H.M.; Methodology, R.A.; Project Administration, A.A. (Abdelshakour Abuzneid) and M.F.

Funding

This research was funded by the University of Bridgeport Seed Money Grant UB-SMG-2018.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IDS: Intrusion Detection System
CICIDS2017: Canadian Institute for Cybersecurity Intrusion Detection System 2017 dataset
AE: Auto-Encoder
PCA: Principal Component Analysis
LDA: Linear Discriminant Analysis
QDA: Quadratic Discriminant Analysis
UDBB: Uniform Distribution-Based Balancing
KNN: K-Nearest Neighbors
RF: Random Forest
SVM: Support Vector Machine
XGBoost: eXtreme Gradient Boosting
MLP: Multi Layer Perceptron
FFC: Forward Feature Construction
BFE: Backward Feature Elimination
Acc: Accuracy
FAR: False Alarm Rate
F-M: F-Measure
MCC: Matthews Correlation Coefficient
DR: Detection Rate
RR: Reduction Rate
$(AE\text{-}RF)_{Mc}$: Auto-Encoder-Random Forest-Multi-Class
$(AE\text{-}LDA)_{Mc}$: Auto-Encoder-Linear Discriminant Analysis-Multi-Class
$(AE\text{-}QDA)_{Mc}$: Auto-Encoder-Quadratic Discriminant Analysis-Multi-Class
$(AE\text{-}BN)_{Mc}$: Auto-Encoder-Bayesian Network-Multi-Class
$(AE\text{-}RF)_{Bc}$: Auto-Encoder-Random Forest-Binary-Class
$(AE\text{-}LDA)_{Bc}$: Auto-Encoder-Linear Discriminant Analysis-Binary-Class
$(AE\text{-}QDA)_{Bc}$: Auto-Encoder-Quadratic Discriminant Analysis-Binary-Class
$(AE\text{-}BN)_{Bc}$: Auto-Encoder-Bayesian Network-Binary-Class
$(PCA\text{-}RF)_{Mc}$: Principal Component Analysis-Random Forest-Multi-Class
$(PCA\text{-}LDA)_{Mc}$: Principal Component Analysis-Linear Discriminant Analysis-Multi-Class
$(PCA\text{-}QDA)_{Mc}$: Principal Component Analysis-Quadratic Discriminant Analysis-Multi-Class
$(PCA\text{-}BN)_{Mc}$: Principal Component Analysis-Bayesian Network-Multi-Class
$(PCA\text{-}RF)_{Bc}$: Principal Component Analysis-Random Forest-Binary-Class
$(PCA\text{-}LDA)_{Bc}$: Principal Component Analysis-Linear Discriminant Analysis-Binary-Class
$(PCA\text{-}QDA)_{Bc}$: Principal Component Analysis-Quadratic Discriminant Analysis-Binary-Class
$(PCA\text{-}BN)_{Bc}$: Principal Component Analysis-Bayesian Network-Binary-Class
$CM_{(Bc)}$: Combined Metrics for Binary Class
$CM_{(Mc)}$: Combined Metrics for Multi-Class

References

  1. Albanese, M.; Erbacher, R.F.; Jajodia, S.; Molinaro, C.; Persia, F.; Picariello, A.; Sperlì, G.; Subrahmanian, V. Recognizing unexplained behavior in network traffic. In Network Science and Cybersecurity; Springer: Berlin, Germany, 2014; pp. 39–62.
  2. Abdulhammed, R.; Faezipour, M.; Elleithy, K. Intrusion Detection in Self organizing Network: A Survey. In Intrusion Detection and Prevention for Mobile Ecosystems; Kambourakis, G., Shabtai, A., Kolias, C., Damopoulos, D., Eds.; CRC Press Taylor & Francis Group: New York, NY, USA, 2017; Chapter 13; pp. 393–449.
  3. Lee, C.H.; Su, Y.Y.; Lin, Y.C.; Lee, S.J. Machine learning based network intrusion detection. In Proceedings of the 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA), Beijing, China, 8–11 September 2017; pp. 79–83.
  4. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the Fourth International Conference on Information Systems Security and Privacy, ICISSP, Funchal, Madeira, Portugal, 22–24 January 2018.
  5. Sorzano, C.O.S.; Vargas, J.; Montano, A.P. A survey of dimensionality reduction techniques. arXiv, 2014; arXiv:1403.2877.
  6. Fodor, I.K. A Survey of Dimension Reduction Techniques; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory: Livermore, CA, USA, 2002; Volume 9, pp. 1–18.
  7. Rosaria, S.; Adae, I.; Aaron, H.; Michael, B. Seven Techniques for Dimensionality Reduction; KNIME: Zurich, Switzerland, 2014.
  8. Van Der Maaten, L.; Postma, E.; Van den Herik, J. Dimensionality reduction: A comparative review. J. Mach. Learn. Res. 2009, 10, 66–71.
  9. Bertens, P. Rank Ordered Autoencoders. arXiv, 2016; arXiv:1605.01749.
  10. Vijayan, R.; Devaraj, D.; Kannapiran, B. Intrusion detection system for wireless mesh network using multiple support vector machine classifiers with genetic-algorithm-based feature selection. Comput. Secur. 2018, 77, 304–314.
  11. Radford, B.J.; Richardson, B.D. Sequence Aggregation Rules for Anomaly Detection in Computer Network Traffic. arXiv, 2018; arXiv:1805.03735.
  12. Lavrova, D.; Semyanov, P.; Shtyrkina, A.; Zegzhda, P. Wavelet-analysis of network traffic time-series for detection of attacks on digital production infrastructure. SHS Web Conf. EDP Sci. 2018, 44, 00052.
  13. Watson, G. A Comparison of Header and Deep Packet Features When Detecting Network Intrusions; Technical Report; University of Maryland: College Park, MD, USA, 2018.
  14. Aksu, D.; Üstebay, S.; Aydin, M.A.; Atmaca, T. Intrusion Detection with Comparative Analysis of Supervised Learning Techniques and Fisher Score Feature Selection Algorithm. In International Symposium on Computer and Information Sciences; Springer: Berlin, Germany, 2018; pp. 141–149.
  15. Marir, N.; Wang, H.; Feng, G.; Li, B.; Jia, M. Distributed Abnormal Behavior Detection Approach based on Deep Belief Network and Ensemble SVM using Spark. IEEE Access 2018.
  16. Spark, A. PySpark 2.4.0 Documentation. 2018. Available online: https://spark.apache.org/docs/latest/api/python/index.html (accessed on 10 November 2018).
  17. Bansal, A. DDR Scheme and LSTM RNN Algorithm for Building an Efficient IDS. Master’s Thesis, Thapar Institute of Engineering and Technology, Punjab, India, 2018.
  18. Chen, T.; He, T.; Benesty, M. Xgboost: Extreme Gradient Boosting. R Package Version 0.4-2. 2015, pp. 1–4. Available online: http://cran.fhcrc.org/web/packages/xgboost/vignettes/xgboost.pdf (accessed on 11 March 2019).
  19. Hothorn, T.; Hornik, K.; Zeileis, A. Ctree: Conditional Inference Trees. The Comprehensive R Archive Network. 2015. Available online: https://cran.r-project.org/web/packages/partykit/vignettes/ctree.pdf (accessed on 23 January 2019).
  20. Aminanto, M.E.; Choi, R.; Tanuwidjaja, H.C.; Yoo, P.D.; Kim, K. Deep abstraction and weighted feature selection for Wi-Fi impersonation detection. IEEE Trans. Inf. Forensics Secur. 2018, 13, 621–636.
  21. Zhu, J.; Ming, Y.; Song, Y.; Wang, S. Mechanism of situation element acquisition based on deep auto-encoder network in wireless sensor networks. Int. J. Distrib. Sens. Netw. 2017, 13.
  22. Al-Qatf, M.; Lasheng, Y.; Alhabib, M.; Al-Sabahi, K. Deep Learning Approach Combining Sparse Autoencoder with SVM for Network Intrusion Detection. IEEE Access 2018, 6, 52843–52856.
  23. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. Nsl-Kdd Dataset. 2012. Available online: http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDD-dataset.html (accessed on 28 February 2016).
  24. Bay, S.D.; Kibler, D.; Pazzani, M.J.; Smyth, P. The UCI KDD archive of large data sets for data mining research and experimentation. ACM SIGKDD Explor. Newsl. 2000, 2, 81–85.
  25. Javaid, A.; Niyaz, Q.; Sun, W.; Alam, M. A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Cotonou, Benin, 24 May 2016; pp. 21–26.
  26. Min, E.; Long, J.; Liu, Q.; Cui, J.; Cai, Z.; Ma, J. SU-IDS: A Semi-supervised and Unsupervised Framework for Network Intrusion Detection. In International Conference on Cloud Computing and Security; Springer: Cham, Switzerland, 2018; pp. 322–334.
  27. Xia, D.; Yang, S.; Li, C. Intrusion Detection System Based on Principal Component Analysis and Grey Neural Networks. In Proceedings of the 2010 Second International Conference on Networks Security, Wireless Communications and Trusted Computing, Wuhan, Hubei, China, 24–25 April 2010; Volume 2, pp. 142–145.
  28. Vasan, K.K.; Surendiran, B. Dimensionality reduction using Principal Component Analysis for network intrusion detection. Perspect. Sci. 2016, 8, 510–512.
  29. Shiravi, A.; Shiravi, H.; Tavallaee, M.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 2012, 31, 357–374.
  30. Aminanto, M.E.; Kim, K. Improving Detection of Wi-Fi Impersonation by Fully Unsupervised Deep Learning. In Proceedings of the Information Security Applications: 18th International Workshop (WISA 2017), Jeju Island, Korea, 24–26 August 2017.
  31. Aminanto, M.E.; Kim, K. Detecting Active Attacks in WiFi Network by Semi-supervised Deep Learning. In Proceedings of the Conference on Information Security and Cryptography 2017 Winter, Sochi, Russian Federation, 8–10 September 2017.
  32. Kolias, C.; Kambourakis, G.; Stavrou, A.; Gritzalis, S. Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset. IEEE Commun. Surv. Tutor. 2016, 18, 184–208.
  33. Vilela, D.W.; Ed’Wilson, T.F.; Shinoda, A.A.; de Souza Araujo, N.V.; de Oliveira, R.; Nascimento, V.E. A dataset for evaluating intrusion detection systems in IEEE 802.11 wireless networks. In Proceedings of the 2014 IEEE Colombian Conference on Communications and Computing (COLCOM), Bogota, Colombia, 4–6 June 2014; pp. 1–5.
  34. Ring, M.; Wunderlich, S.; Grüdl, D.; Landes, D.; Hotho, A. Flow-based benchmark data sets for intrusion detection. In Proceedings of the 16th European Conference on Cyber Warfare and Security, Dublin, Ireland, 29–30 June 2017; pp. 361–369.
  35. Canadian Institute of Cybersecurity, University of New Brunswick. CICFlowMeter. 2017. Available online: https://www.unb.ca/cic/research/applications.html#CICFlowMeter (accessed on 23 January 2019).
  36. CIC. Canadian Institute of Cybersecurity. List of Extracted Traffic Features by CICFlowMeter-V3. 2017. Available online: https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 23 January 2019).
  37. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv, 2013; arXiv:1312.6114.
  38. Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. arXiv, 2014; arXiv:1401.4082.
  39. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, QLD, Australia, 2 December 2014; p. 4.
  40. Makhzani, A. Unsupervised Representation Learning with Autoencoders. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2018.
  41. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An ensemble of autoencoders for online network intrusion detection. arXiv, 2018; arXiv:1802.09089.
  42. Johnstone, I.M.; Lu, A.Y. Sparse principal components analysis. arXiv, 2009; arXiv:0901.4392.
  43. Espíndola, R.; Ebecken, N. On extending f-measure and g-mean metrics to multi-class problems. WIT Trans. Inf. Commun. Technol. 2005, 35.
  44. García, V.; Sánchez, J.S.; Mollineda, R.A. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl.-Based Syst. 2012, 25, 13–21.
  45. Hamed, T.; Dara, R.; Kremer, S.C. Network intrusion detection system based on recursive feature addition and bigram technique. Comput. Secur. 2018, 73, 137–155.
  46. Bermejo, P.; Gámez, J.A.; Puerta, J.M. Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets. Expert Syst. Appl. 2011, 38, 2072–2080.
  47. Abdulhammed, R.; Faezipour, M.; Abuzneid, A.; AbuMallouh, A. Deep and Machine Learning Approaches for Anomaly-Based Intrusion Detection of Imbalanced Network Traffic. IEEE Sens. Lett. 2019, 3, 7101404.
  48. Louppe, G. Understanding Random Forests: From Theory to Practice. Ph.D. Thesis, University of Liège, Belgium, 2014.
  49. Aksu, D.; Aydin, M.A. Detecting Port Scan Attempts with Comparative Analysis of Deep Learning and Support Vector Machine Algorithms. In Proceedings of the 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), Ankara, Turkey, 3–4 December 2018; pp. 77–80.
  50. Ustebay, S.; Turgut, Z.; Aydin, M.A. Intrusion Detection System with Recursive Feature Elimination by Using Random Forest and Deep Learning Classifier. In Proceedings of the 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), Ankara, Turkey, 3–4 December 2018; pp. 71–76.
  51. Bansal, A.; Kaur, S. Extreme Gradient Boosting Based Tuning for Classification in Intrusion Detection Systems. In International Conference on Advances in Computing and Data Sciences; Springer: Singapore, 2018; pp. 372–380.
  52. Kaur, P.; Rattan, D.; Bhardwaj, A.K. An analysis of mechanisms for making ids fault tolerant. Int. J. Comput. Appl. 2010, 1, 22–25.
  53. Viegas, E.; Santin, A.; Neves, N.; Bessani, A.; Abreu, V. A Resilient Stream Learning Intrusion Detection Mechanism for Real-time Analysis of Network Traffic. In Proceedings of the GLOBECOM 2017—2017 IEEE Global Communications Conference, Singapore, 4–8 December 2017; pp. 1–6.
  54. Al-Shehri, S.M.; Loskot, P.; Numanoglu, T.; Mert, M. Common Metrics for Analyzing, Developing and Managing Telecommunication Networks. arXiv, 2017; arXiv:1707.03290.
  55. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
