A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks

The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks.


Introduction
Networking has become ubiquitous in people's lives and work; hence, network security bears increasing importance for network users and operators. Network intrusion detection is a network security mechanism designed to detect, prevent and repel unauthorized access to a communication or computer network. IDSs play a crucial role in maintaining safe and secure networks. In recent years, vast amounts of network data have been generated due to the application of new network technologies and equipment, which has led to declining detection rates. The intrusion detection process is complicated due to the dynamic nature of networks and the power available for processing high volumes of data from distrusted network systems, making it difficult to achieve high detection accuracy at rapid speeds [1].
Many researchers have proposed innovative approaches to intrusion detection in recent years. These methods, based on detecting behaviour and access resource type, are divided into four categories. The first uses statistical analysis to detect attack types based the relationships between nodes in a network, such as with Bayesian [2], decision tree and fuzzy logic models. These techniques build models of normal network data and calculate the probability that a given sample deviates from these models. If the probability that a sample value's deviation is greater than threshold, then the sample is classified as a type of attack [3]. The second category is an anomaly detection approach in which most methods require a set of standard normal datasets to train a classifier and determine whether new samples fit the model. Examples of such methods include SC, self-organizing map (SOM) and unsupervised support vector machine approaches [4]. The third category employs classification techniques to detect attack types by taking advantage of machine learning algorithms such as SVM [5], RFs [6], genetic algorithms (GA) and artificial neural networks (ANN), which can prioritize solutions to certain problems. The last category includes hybrid and ensemble detection methods that integrate the advantages of various methods in order to increase detection accuracy [7]. These approaches include bagging, AdaBoost [8], and the Particle Swarm Optimization (PSO)-K-means ensemble approach [9]. The PSO-k-means method can achieve an optimal number of clusters, as well as increasing detection rates while decreasing false positive rate, and can be successful in detecting network intrusion. In addition, the SVM-KNN-PSO ensemble method proposed by [10] can obtain the best results. However this work is based on binary classification methods, which can distinguish between only two states.
Recently, deep learning has become a popular topic of research, and methods based on deep learning have successfully been widely applied in image identification and speech recognition. In recent literature, deep learning has been used in network security detection. The self-taught learning (STL) model, based on deep learning techniques, was proposed for network intrusion detection. This model utilized a sparse auto-encoder (SAE) to learn dataset features; the NSL-KDD dataset was used to test its detection performance [11]. Deep belief networks (DBN) are used to initialise the parameters of DNN through a pre-training process. Initialising a DNN with probability-based feature vectors is used to detect vehicular networks and significantly improves detection rates [12]. Fiore et al. [13] proposed a novel model for network anomaly detection based on a semi-supervised discriminative restricted Boltzmann machine (DRBM), which decouples training and generalizes the neural networks. In a previous hybrid model, Salama et al. [14] proposed a combination of a DBN and SVM. The DBN is used to reduce the dimensionality of the input dataset and the SVM is used to classify attack types.
For complicated network attacks, some research has focused on rule-based expert detection, which utilizes pre-designed processing rules to detect attacks [15]; however, for very large datasets, rule-based expert approaches become less capable. Additionally, wireless sensor networks (WSN), a new post-Internet network architecture, are widely and increasingly used in a variety of smart environment applications. Adding these to conventional and mobile networks, and taking into consideration the diverse intrusion methods available to malicious agents, the technical requirements of intrusion detection are more complex than ever before [16]. Attacks on WSNs are easier to carry out than those on wired networks because the distribution of WSNs is inherently limited, and because of the multi-hop communication, bandwidth and battery power used therein. Therefore, designing an effective IDS for WSNs is very important. Due to this new environment, new attacks have been devised (e.g., black hole, sink hole, abnormal transmission and packet dropping attacks). The attacks are divided by mechanism into signature, anomaly and hybrid intrusion detection methods. The anomaly detection method, for instance, is widely used for security in WSNs [17].
However, intrusion detection has been insufficient in several ways. First, the learning capacity of traditional detection approaches that sum features of the raw data, map them into vectors, and then feed them to a classifier is limited. When network structure is complicated, learning efficiency further decreases. Second, this primary method only partially represents one or two levels of information; it is insufficient for identifying additional attack types. Third, in real network datasets, the types of network intrusion are similar to those in normal datasets, which confine classifiers from having enough information with which to categorize them. Next, the intrusion actions behave unpredictably, which causes IDS to make costly errors in detecting intrusions. Therefore, it is necessary to find an effective intrusion detection method [18]. Finally, the variety of network types has generated large-scale data with high-dimensional structures, for which traditional intrusion detection approaches are unsuitable [19].
Literature concerning fields in which many-layered learners represent network dataset features is quite limited. Lin et al. [20] proposed combining the k-means and k-Nearest Neighbour (KNN) algorithms to select dataset features and to categories network attack types. Samples are reconstructed into clusters and a two-dimensional vector dataset is generated with different features. Then, the training dataset is divided into subsets by the KNN algorithm. This approach more efficiently detects normal data and Denial of service (DoS) attacks with high accuracy, but it does not perform well in detecting user-to-root (U2R) and remote-to-local (R2L) attacks, which means that the selected features represent these attack types poorly and the proposed model can only learn limited patterns.
Taking the above into consideration, this paper proposes the SCDNN model, which uses spectral clustering to capture raw data features and divide the dataset into subsets. Each subset is then fed to a DNN. These DNNs can learn information about different characteristics of the subset. The test datasets have previously been divided by training dataset features into subsets. Finally, the test subsets are fed to the trained DNN for intrusion detection. Because the SC algorithm can extract similar features and acquire more information from a dataset, the DNN can obtain sufficient information via prior learning, which enables the DNN to familiarize itself with as many of the rules and features of network attack types as possible. Deep learning is believed to be able to extract features from massive and complex datasets; hence, it can solve non-linear problems with complex and large-scale data (e.g., it has been applied in weather forecasting) [21,22]. Experimental results based on the KDD-CUP99 dataset and NLS-KDD datasets show that an SCDNN generates better accuracy and more robust results than other well-known algorithms, and excels in parallel computing [23][24][25][26].
The contribution of this study is as follows: First, we adopt SC to capture the features of complex network datasets, for attack types similar to normal access, using clustering to find features that divide the dataset into subsets with different cluster centres. Second, taking advantage of multi-layer DNNs, attack types can be learned at high, abstract levels of network data and complex network intrusion rules can be found. The intrusion process is becoming more variable and easily disguised as normal network activity. This situation is particularly common in large datasets and complicates the relationship between distributed processing protocols. Knowledge of categories in a dataset is sufficient for the SCDNN model to be able to learn. Finally, our experimental results indicate that the accuracy, detection and false alarm rates of SCDNN are better than those of conventional methods.
The rest of the paper is organized as follows. The related literature concerning IDS is reviewed in Section 2. Section 3 presents a detailed definition of the proposed approach and describes how the algorithm works. Section 4 describes the experimental dataset and illustrates the data preparation, evaluation criteria, results and discussions of experiments. Finally, the conclusions and suggestions for future work are provided in Section 5.

Related Works
In this section, the spectral clustering and DNN algorithms are briefly introduced. To overcome the limitations of traditional auto-encoders, this study adapts a SAE and denoising auto-encoders (DAEs) for the DNN. IDSs are a classification problem in which dataset features are very important, because the learner acquires knowledge and patterns based on the characteristics of the data. Additionally, the level of feature representation determines a learner's performance.

Spectral Clustering Algorithm
Spectral clustering uses the top eigenvectors of a matrix derived from the input data and transforms the clustering problem into a graph cut problem. The graph cutting approach clusters data points by attribute such that densely packed points are in same cluster, whereas the sparse are in different clusters. The minimum cut formula is written as: cut(x, y) = ∑ i∈A,j∈B W ij (1)

DNN Algorithm
The essence of the deep neural network is to learn useful features and construct multi-layer networks from vast amounts of training data. Forecast accuracy is improved using DNNs, allowing more information regarding the raw dataset to be obtained.
DNN has deep architectures-including multiple hidden layers-and each hidden layer alone conducts a non-linear transformation from the previous layer to the next [29,30]. Through deep learning, proposed by Hinton et al. [31], a DNN is trained according to two main pre-trained, fine-tuned procedures [32,33].

Auto-Encoders
An auto-encoder is a type of unsupervised three-layer neural network whose output target is input data [34], which is described in Figure 1. The auto-encoder includes both encoder and decoder networks. The encoder network transforms input data from a high-dimensional space to code in a low-dimensional space, and the decoder network remodels this input from the previous step. The encoder network is defined as an encoding function denoted by f encoder . This function details the encoding process: where x m is a data point and h m is the encoding vector obtained from x m .

Decoder
The decoder network is defined as a reconstruction function denoted as f decoder and is described as follows: where x m is the decoding vector obtained from h m . There are specific algorithms for several encoding and reconstruction functions, including:

Sparse Auto-Encoder (SAE)
The default auto-encoder attempts to learn feature approximation, and the function of the auto-encoder of x is close to the raw input data of x. It is seems that the identity function is able to learn most features of the input data. However, the number of units in each layer of the neural network is constrained, with x as the minimum original size. When the number of hidden units increases, runtime efficiency decreases for the analysis of input data features, although the auto-encoder will still learn the features of the raw data. From the above analysis, a sparsity coder can constrain the input data with useful identification information [35]. Let a (2) j (x) represent the activation of the jth hidden units by the auto-encoder from the input x, so that the average activation of the jth unit in the hidden layer is defined as follows [36]:ρ [a (2) j (x (i) )] Averaged over the input set, letρ j represent the degree of sparsity, where ρ is a sparsity parameter. Letρ j = ρ, which is a small value close to 0; typically, the value of ρ is set to 0.05. To satisfy the condition that the activation is near 0, a penalty term is added to optimise the deviation ofρ j from ρ. The optimisation formula is defined as follows: in which the number of hidden-layer units is s 2 and j denotes hidden units in the network. The Kullback-Leibler (KL) divergence is written as follows [37]: where the KL divergence between a Bernoulli random variable with a mean of ρ andρ j is defined as below: Ifρ j equals ρ, then the value of Equation (9) is equal to 0, which means that the KL divergence reaches the minimum value and close to 0, so the overall cost function is defined as follows: where J(w, b) is the squared error cost function and defined as: The first term of the function J(W, b) is an average sum of squared error over m samples and outputs. The second term is a weight decay term which can decrease the magnitude of each weight in a neural network and can effectively prevent overfitting. From Equation (11), β is a coefficient to adjust the weight of the sparsity penalty term. The termρ j depends on the neural network parameters W and b, and it is optimised using a derivative checking method [38]. The stochastic gradient descent (SGD) approach is adopted to optimise W and b, and can be written as follows: in Equations (13) and (14), ε is the learning rate; squared errors are obtained by training over examples and calculating the averageρ j . The best values of W and b are obtained by back propagation with SGD.

Denoising Auto-Encoders (DAEs)
There is a way to capture something useful about the hidden units with input data. DAEs are used to discover more robust hidden-layer features and can effectively prevent the networks from simply learning features. The DAE is a stochastic auto-encoder that tries to represent information about the input data that can only be learned by capturing the statistical attributes between the encoded and input data. The DAEs can be represented in a variety of ways, including from stochastic operator, bottom-up-information and top-down-generative model perspectives. In this paper, the DAEs learned feature representations by adjusting partial corruption from the input pattern [39]. A fixed number of components in each input x(i) are chosen randomly and their value set to zero; this distribution is called qD(·), then the joint distribution of input and output is defined as follows: in which the encoding of X is a stochastic mapping by the distribution χ ∼ qD(χ|X). If the condition f θ (x) = y · q 0 (X) is satisfied, then δ f θ (χ) (y) indicates the empirical distribution associated with the input data and set to 0. χ can be determined by y. In order to obtain the best reconstructed version of X with consideration between χ and y, let θ be a parameter of the joint distribution of Equation (15), and calculate the minimum error of the cost function of the input matrix X from y as follows: where [L(X, X )] is the error function depending on input x and the reconstructed X . The best cost function value is then obtained using SGD. To encode optimised denoising codes, there is a trick in encoder process. Let y be a hidden representation obtained with the function sigmoid(Wχ + b), where χ is a generated corrupted version of the input data and the parameters of the function are denoted as θ = {W, b}. Each corrupted code contains some elements of input X randomly set to 0. Then, the final corrupted version can be minimised in Equation (16).

Pre-Training
N auto-encoders can be stacked to pre-train an N-hidden-layer DNN. When given an input dataset, the input layer and the first hidden layer of the DNN are treated as the encoder network of the first auto-encoder. Next, the first auto-encoder is trained by minimising its reconstruction error. The trained parameter set of the encoder network is used to initialise the first hidden layer of the DNN. Then, the first and second hidden layers of the DNN are regarded as the encoder network of the second auto-encoder. Accordingly, the second hidden layer of the DNN is initialised by the second trained auto-encoder. This process proceeds until the Nth auto-encoder is trained to initialise the final hidden layer of the DNN. Thus, all hidden layers of the DNN are stacked in an auto-encoder in each training N times, and are regarded as pre-trained. This pre-training process is proven to be significantly better than random initialisation of the DNN and conducive to achieving generalisation in classification [40,41].

Fine-Tuning
Fine-tuning is a supervised process that improves the performance of a DNN. The network is retrained, training data are labelled, and errors calculated by the difference between real and predicted values are back-propagated using SGD for all multi-layer networks. SGD randomly selects dataset samples and iteratively updates the gradient direction with weight parameters. The best gradient direction is obtained with a minimum loss function. The advantage of SGD is that it converges faster than the traditional gradient descent methods do, and does not consider the entire dataset, making it suitable for complex neural networks. The SGD equation is defined below: where E is the loss function, y is the real label and t is the network output. The gradient of weight parameter w is obtained by the derivative of the error equation.
With the gradient of w ij , the updated SGD equation is defined as: where η is the step size and is greater than 0, and h is the number of hidden layers in the DNN [42]. This process is tuned and optimised by the weight and threshold based on the correctly labelled data in the DNNs. In this way, the DNN can learn important knowledge for its final output and direct the parameters of the entire network to perform correct classifications [43].

The Proposed Approach of SCDNN
Several classification engines (BPNN, SVM, RF, Bayes and DNN) perform well given the advantages of their algorithms, which can efficiently handle complex classification problems; hence, these models can be successfully applied to intrusion detection. They usually perform poorly when facing the complex randomness and camouflage of network intrusion dataflow. Therefore, in this section, the proposed approach is employed to solve the above problems. First, the training data subsets divide the training process and calculate centre points by SC from each training point. Second, each training data subset is trained by the corresponding DNNs, where the scale of DNNs is the same as the number of clusters. In this way, the DNNs have learned different characteristics from each subset. Third, the testing data subsets are divided into test datasets by SC, which uses the previous cluster centres in its first step, and these subsets are applied to detect intrusion attack types by pre-trained DNNs. Finally, the output of every DNN is aggregated for the final result of the intrusion detection classifiers.

SCDNN
The SCDNN model includes three steps, and the general task in each step is described as follows and shown in Figure 2.
These testing sub datasets are applied to detect intrusion attack types. Step 1: The dataset is divided into two components: training and testing datasets. The training dataset is clustered by SC and this output is regarded as the training subset labelled Training Datasets 1-k. The SC centres from the training dataset clustering process are stored to serve as initialisation cluster centre for generating the testing dataset clusters. Because intrusion data features indicate similar attributes of each type in the raw dataset, points in the training dataset with similar features are aligned into groups and regarded as the same subset. In order to improve SCDNN performance, different cluster numbers and values of sigma are considered. The number of clusters ranged from 2 to 6 and sigma from 0.1 to 1.0. The samples are assigned to one cluster by similarity. The minimum distance from a data point to each cluster centre is measured by Algorithm 1 and each point is assigned to a cluster. The training subsets generated by clusters are given as input to the DNNs. In order to train the different subsets, the number of DNNs is equal to the number of data subsets. The architecture of each DNN consists of five layers: two hidden layers, one input layer, one softmax layer and one output layer. In this paper, the two hidden layers learn features from each training subset and the top layer is a five-dimensional output vector. Each training subset which are generated from the kth cluster centre by SC algorithm in clustering process, regarded as input data to feed into the kth DNN, respectively. These trained sub-DNN models are marked as sub-DNN 1 through sub-DNN k.
Step 2: The testing dataset, which has been divided from the raw dataset, is used to generate k datasets. The previous cluster centres obtained from SC cluster in Step 1, are regarded as the initialization cluster centres of the SC algorithm in this step. The test dataset which are divided by SC cluster, are regarded testing subsets. These subsets are denoted as Test 1 through Test k.
Step 3: The k testing data subsets are fed into k sub-DNNs, which were completed by the k training data subsets in Step 1. The output of each sub-DNN is integrated as the final output and employed to analyse positive detection rates. Then, a confusion matrix is used to analyse the detection performance of the five classifications: Normal, DoS, Probe, U2R and R2L.

The SCDNN Algorithm
The SCDNN algorithm adopts a deep learning approach to categorise network attack types and fine-tune its weights and thresholds using backpropagation. The input vectors map low-dimensional space with DAEs and SAE to discover patterns in the data. The SCDNN algorithm is detailed in Algorithm 2.

Algorithm 2: SCDNN
Input: dataset, cluster number, number of hidden-layer nodes HLN, number of hidden layers HL. Output: Final prediction results /*Note the symbols of "/ * " and " * /" represent comments in this algorithm.*/ 1 Divide the raw dataset into two components: a training dataset and a testing dataset. /*get the largest matrix eigenvectors and training data subsets*/ 2 Obtain the cluster centres and SC results using Algorithm 1. Here, the clustering results are regarded as training data subsets. /*Train each DNN with each training data subset*/ 3 The learning rate, denoising and sparsity parameters are set and the weight and bias are randomly initialised. 4 The HL are set two hidden layers, HLN is set 40 nodes of the first hidden layer and 20 nodes of second hidden layer. 5 Compute the sparsity cost function J sparse (W, b) = J(W, b) + β ∑ s 2 j=1 KL(ρ ρ j ). 6 Parameter weights and bias are updated as Train k sub-DNNs corresponding to the training data subsets. 8 Fine-tune the sub-DNNs by using backpropagation to train them. 9 The final structure of the trained sub-DNNs is obtained and they are labelled with each training data subset. 10 Divide the testing dataset into subsets with SC. Cluster centre parameters from the training data clusters are used. 11 The testing data subsets are used to test corresponding sub-DNNs, based on each corresponding cluster centre between the testing and training data subsets. /*aggregate each prediction result*/ The eigenvectors of the point matrix and training dataset are generated by the SC output in Lines 1-2, the k DNNs are trained in Lines 3-9, the testing data subsets are obtained by calculating distance with the NJW algorithm in Line 10, the testing data subsets are used to test the DNNs, and the final results are predicted by the aggregation in Lines 11-13.

Experimental Results
These experiments examine and compare SCDNNs with other detection models: SVMs, BPNNs, RFs and Bayes. Six datasets from the KDD-CUP99 and NSL-KDD are used to evaluate the performance of all models. Then, the parameters, number of cluster and DNN weights are discussed and analysed.

Evaluation Methods
In this study, accuracy, recall, error rate (ER) and specificity are used to evaluate the performance of the detection models. The formulas of the above criteria are calculated as follows [44]: A true positive (TP) is a case that correctly distinguishes network attack type, a true negative (TN) shows normal network data classified correctly as normal, a false negative (FN) denotes a case in which an attack was classified as normal dataflow, and a false positive (FP) means that the a normal case was classified as an attack. The accuracy rate shows the overall correct detection accuracy of the dataset, ER refers to the robustness of the classifier, recall indicates the degree of correctly detected attack types of all cases classified as attacks, and specificity shows the percentage of correctly classified normal data. In the above, higher accuracy and recall and lower ER indicate good performance.

The Dataset
To evaluate the performance of our proposed approach, a series of experiments on the KDD-CUP-99 [31] and NLS-KDD datasets [22] were conducted. The 10% knowledge KDD-CUP-99 datasets are used in these experiments, as it is most comprehensive dataset that is widely used as a benchmark for the performance intrusion detection network models. Each record contains 41-dimensional feature data that of various continuous, discrete and symbolic type. This is unsuitable for network data records because it includes four attack categories (DoS, probe, R2L and U2R) [45].
In order to solve the above problem, pre-trained data are processed in three steps. First, the values of three symbolic features are mapped as a series of numeric values ranging from 1 to N, where N is the total number of symbols for each feature. Second, the KDD'99 dataset includes five classes, normal, DoS, probe, U2R, and R2L, which were mapped onto the numeric values 1-5, respectively [46]. Finally, because of the range of values of field columns of src_bytes and dst_bytes are quite different, in order to find more patterns and more easily handle the proposed method, the range of values of these two fields are mapped onto 0-5000. After data pre-processing, the total number of dataset features was transformed to 65.
The NLS-KDD dataset includes the KDDTrain+, KDDTest+ and KDDTest-21 subsets, which are used for this experiment because the KDD-CUP-99 datasets contain more frequent and harmful attack records. The NLS-KDD dataset has removed surplus records and is more suitable to evaluating the real-world performance of an intrusion detection algorithm. In order to fairly compare the capability of each algorithm, six datasets were randomly generated from the two KDD-CUP-99 and NSL-KDD datasets, which reduces the amount of data [47]. These are labelled Datasets 1 through 6. Probe, U2R and R2L attack classes occur with low frequency in the raw dataset, so these three types were manually distributed, whereas normal and DoS records were randomly selected based on three scales with 1%, 5% and 10% of total in the KDD-CUP-99 dataset, labelled Dataset 1-3, respectively [48].
For the NLS-KDD dataset, the three data subsets are generated from two training and two testing sets and designated Datasets 4-6. The training sets randomly selected 20% and 50% of the raw dataset, and the testing sets are KDDTest+ and KDDTest-21, respectively. Table 1 describes the six new datasets in detail. The training data of the six datasets include 24 attack types, and the testing data contain 38 attack types. Therefore, the test dataset includes specific attack types which are not presented in the training dataset, which is closer to the actual condition of network intrusion. In this way, we can evaluate the ability of each model to find new attack types. The six new datasets are used to evaluate the performance of the SCDNN and to compare it to the other detection methods previously mentioned. In addition, these experiments evaluate model performance for one of the novel network architectures arising in today's networking environment, such as ad hoc wireless networks, WSNs and wireless mesh networks. These networks have developed rapidly and are easier to attack than wired networks. In order to determine the detection performance of the proposed model in WSNs, the following experiments were carried out [49].

SCDNN Experiment I
In this section, the number of clusters k in SCDNN is evaluated on the six datasets, because the values of k and σ (a parameter of SC), are different in each dataset. k and σ have a serious impact on the precision of SCDNN results. Next, the testing datasets are used to compare the performance of the remaining five models.

Efficiency of Varied Cluster Numbers and Values of σ
The SCDNN algorithm is composed of three steps. First is clustering the training dataset; the numbers of clusters k ranges from 2 to 6, and the value range of σ in the SC algorithm is 0.1 to 1.0. The training data subsets are divided based on the values of k and σ. Different parameters impact the experiment results. In order to obtain the best efficiency, the average detection accuracy rate of the SCDNN is determined for the six datasets. These results are shown in Figure 3. Figure 3a-f shows testing results from the six datasets obtained from different values of k and σ. For almost all datasets, SCDNN performs best when σ is between 0.4 and 0.5; likewise, the optimal value of k is between 2 and 5. In order to clearly explain the testing selection process, the best parameters for the five attack types in each dataset are shown in Figure 4.
SCDNN prediction accuracy of the five network intrusion types with optimum σ values are highly volatile with increasing values of k in all datasets. A high k value can efficiently improve SCDNN accuracy; more clusters can be divided into smaller data subsets, which damages the integrity of the input data features and leads to poor SCDNN performance. U2R and R2L attacks comprise a small proportion of the dataset, which prevents the classifier from learning enough information and causes unbalanced dataset categories. Cluster number can determine how well the particular features of underrepresented attack types are mined; therefore, it is necessary to strikes a balance between the number of cluster and efficiency. The optimal values of k for each dataset (that is, those with the best SCDNN accuracy) are presented in Table 2.    Table 2 show that accuracy rates are volatile when k is less than 4, while great stability can be obtained when k is greater than 5. The best results occur when k is set to 2 in Dataset 1, with an overall accuracy of 91.97% and high DoS, probe and R2L detection rates. This implies that the clustering process can be classified into normal and attack types, which helps the SCDNN maintain efficient network intrusion categories. In the same way, the best results occur when k is 4, 5, 3 and 3 in Datasets 2 to 5, respectively. The total accuracy is 44.55% in the Dataset 6 when k is set to 5.

Figures 3 and 4 and
Accuracy is low across attack types because attack features are similar to those of normal data and are hard to detect. U2R and R2L attack detection results in low accuracy in all datasets except Dataset 1, probably because there are few such records in the dataset and their features are similar to those of DoS and probe attacks, which provides the DNN with less information and causes increased detection rate errors.

Results and Comparisons
In this section, the fusion matrix and evaluation criteria are calculated for the SCDNN and the four traditional detection methods for each of the six datasets. The optimal cluster number k and σ value are fed into each DNN described in Section 3. The DNNs have 65 input dimensions and five output dimensions, because the original dataset is a 65-dimensional vector and the goal is to distinguish five intrusion attack types. There are two hidden layers, the first has 40 neurons and the second has 20. The softmax layer has five dimensions. 10% validation is used to check for overfitting during the training process. The results of the above experiment are shown in Table 3 and Figure 5. The records are unbalanced in each of the six datasets; Normal and DoS cases are well-represented, whereas U2R and R2L occur rarely, because the latter two cases require complex intrusion actions generally carried out by an advanced user, resulting in more covert intrusion that is difficult to detect.
From Table 3 and Figure 5, considering overall accuracy, the SCDNN performs better than the other four methods and has the lowest error rates. Moreover, the proposed method shows especially good performance for the sparse U2R and R2L attack types in all datasets, and produces a higher accuracy rate than the other methods. The best accuracy rate for Dataset 1 is 98.21%, obtained by SVM. The SCDNN achieves 97.21% accuracy for normal data, proving that the SVM has better learned the features of the data than other models. All methods are effective for intrusion detection with this dataset, except for the RF method, which has low precision and 0% detection accuracy for U2R and R2L attack types. For Dataset 2, the SCDNN attains higher precision for normal, DoS and probe traffic, with rates of 98.42%, 97.20% and 70.64%, respectively, and lower precision for U2R and R2L traffic than the BPNN. In this experiment the BPNN results in higher recall than other methods for the sparse attack types, which illustrates that the BPNN, with simple layer, has obtained the expected accuracy in a specific dataset, though its performance is unstable. The RF algorithm performs well for the more common traffic, but obtained 0% accuracy in detecting cases of U2R and R2L attacks. It may be that the construction of the trees used by this method can effectively classify most data, but some sparse attack types belong to false classes. In Dataset 3, the SCDNN performs best on DoS and probe intrusions, with precision rates of 97.23% and 64.96%, respectively. This result indicates that the proposed method is robust and captures most patterns in the different datasets; in addition, the Bayes model has the highest rates, 7.02% and 7.46%, respectively, for the U2R data among all methods. The traditional machine learning models, RF and SVM, produce good results for normal and DoS data. BPNNs, RFs and the SCDNN show good and effective detection rates for normal, DoS and probe data in Datasets 4 and 5; the RF has the highest accuracies, 99.63% and 99.69%, of detecting normal traffic in the two datasets, though the SCDNN produced higher accuracy for DoS data and is effective in detecting sparse attack types. The RF, SVM and Bayes methods performed poorly in classifying rare attack types, with accuracy rates close to 0%.
Overall precision of all methods is quite low for Dataset 6, although the SCDNN performs better than other methods in overall accuracy and recall. Specifically, the RF has the highest accuracy rate, 99.72%, for normal data; the precision of the BPNN for probe traffic is 88.67%, significantly better than the second highest rate, 52.66%, produced by the SCDNN. One potential explanation for this difference is that the BPNN learns enough patterns from the dataset to detect this kind of attack type. In summary, the SCDNN is good at detecting probe intrusions and the sparse attack types, U2R and R2L. It obtains high overall accuracy and recall; however, the SVM, BP, RF and Bayes methods can serve for detecting normal traffic and DoS data. Especially, the RF is excellent in capturing the normal class, producing the highest accuracy rates in Datasets 3-6. These results show that DNNs are robust and capable of detecting sparse attack classes and can locate new attack types, which further proves that SCDNN are more suitable for intrusion detection.
The architecture of the proposed model is a hybrid of spectral clustering technology and deep neural networks. SC clusters samples by their features, an unsupervised process that can classify unknown attack type. Therefore, clustering can find patterns hidden in the samples based their attributes. In Experiment 1, SC is used to divide the dataset; initial groups are obtained using sample attributes. In this way, each sub-DNN is trained with special datasets obtained in the clustering process, and have found more information than they would otherwise, because the DNN model has more layers than the other, shallower models and used SAE to resample the raw dataset. The question of the number of hidden layer to use in the DNN is more complicated, as the DAEs increases computational overhead and finds more robust features in hidden layers. As shown in Table 1, the five attacks types are imbalanced. The last three attack types are the rarest. Their features are obtained by the data subsets generated by the clustering stage. This can help the SCDNN learn more features of sparse attack types. According to Table 3, the SCDNN performs well in detecting sparse attacks types: probe, U2R and R2L.

SCDNN Experiment II with a WSN Dataset
In order to test the performance of SCDNN for maintaining the security of WSNs, in this experiment the all-in-one (ns-2) network simulator platform is used to generate a test dataset. Because WSNs are widely used in industry and their multi-hop, distributed nature differs from that of exit networks, external intrusion in WSNs has become a popular topic of research in academia. Various kinds of security attack cases increase with the number of sensor nodes in a WSN. Therefore, it is important to adopt the proposed method to detect intrusion attacks in WSNs.
In this experiment, there are 10 to 50 sensor nodes to simulate attacker datasets; the ad hoc on-demand distance vector (AODV) protocol is assigned in each nodes. The sensor nodes transport protocol with a constrained bit rate and generated at a random time. The simulated platform generates a dataset which include protocol messages and data messages. For convenience, the dataset is adapted for the proposed algorithm; synthetic data are labelled with security attack types based on known major attack types on the AODV protocol and are shown in Table 4 [50,51]. Table 4. Basic attacker types with packet routing protocols on the ad hoc on-demand distance vector (AODV) protocol in wireless sensor networks (WSNs).

Attack Name Attack Description Attack Type
Active Reply The route reply is forged with abnormal support to reply. 1 Route drop The routing packets are dropped with some specific address.

Modify Sequence
The number of target sequences increases with largest maximal values. 3 Rushing Rushing of routing messages.

Data Interruption
A data packet is used to drop the route. 5 Route Modification The route is modified in Routing Table Entries. 6 Change Hop The route cost in routing tables entries is altered. 7 There are seven attack types that can be categorized based on consideration of their protocol messages in term of message type, transfer time and the number of hops. In this experiment, the detection methods are run 500 times each and the average detection accuracy if given as the final result. The output results are shown in Table 5. The detection accuracy at each sensor node scale is shown in Figure 6 and Table 5. The SCDNN can detect all seven attack types with high detection rates. For the 5% attack scale, the proposed model can find almost all attackers. Detection rates decrease with increasing the size of the attacker to 10% in the network. Moreover, detection accuracy is impacted for different values of σ. The reason for this change may be that when the attack process attempts to modify router information, the resulting increase of transport packets in WSNs makes detection performance more unstable. . Detection accuracy with five sensor node scales with 95% confidence interval. Accuracy is shown for (a) a 5% attacker and (b) a 10% attacker, 500 times each.

Discussion
In order to evaluate the performance of the SCDNN and compare it with the experimental results of the other four models in this section, receiver operating curves (ROC) are obtained for all models [52]. The ROC function is widely used to indicate an algorithm's discriminative capability in this field. The overall category performance of each model, the true positive rate (TPR) from Equation (11) and the false positive rate (FPR) (calculated as 1-Specificity from Equation (13)) are used to generate the ROC in Figure 7. In this section, there are five network data types to be detected, an example of multiclass classification. Handling five classes is complex, with the possibility of n positive results and n 2 + n errors to consider. In this situation, multiclass ROC and area under curve (AUC) methods are used to plot ROC curves and calculate each AUC [53]. The compared AUC value results for these ROCs in the six dataset are shown in Table 6.  Figure 7 show that the SCDNN has the largest AUC of the five models for the six datasets. This indicates that the SCDNN performed better than other models and can obtain higher detection rates in networks. Overall accuracy is used to generate the histograms and compare the results from the six datasets, as shown in Figure 8. This is a more detailed evaluation of classification performance for all five types (one normal and four attacks). In real intrusion data, there is an unbalanced distribution among the four attack cases, and the scale of the data is larger [54]. Models in literature are difficult to categorise, so it is important for intrusion detection models to detect sparse attack cases. The results shown in Figures 7 and 8 indicate that the SCDNN produces higher accuracy than other methods in the six datasets; thus, the proposed algorithm performs well on datasets with a range of distributions. In particular, the Bayes algorithm has best recall of 92.57% and the SCDNN method obtains best accuracy of 91.97% in Dataset 1. In Dataset 2, the BPNN has a best recall, of 92.88%, higher than that of the SCDNN method. The reason for this disparity may be that the percentage of attack cases is larger in the training dataset, which leads to the Bayesian and BPNN approaches developing larger probabilities for these active cases. From the above experiment, we see that the SCDNN algorithm is not only good at detecting normal dataflow, as well as DoS and Probe attacks, but also obtained higher accuracy for sparse attack types, U2R and R2L, in the six datasets. The SCDNN model is a suitable approach for intrusion detection in complex networks.
From the above discussion, the problem of selection number clusters is a general problem for all clustering methods [7]. In this paper, the proposed method combined the SC clustering technique and DNN ensemble algorithm; the parameters of the number k of clusters and σ were obtained from experiment results of six datasets. For other domain datasets, according the distribution of datasets, the range of k is set 2 through n and the range of σ is set 0.4 through 0.6 can be detected the effectiveness of the proposed method, in which n is number of categories [28,55]. Otherwise, for a large data set, the time complexity should be a major consideration because it spends more time for processing the training and detecting intrusion network. The novel optimized algorithm is considered for parallel computing and cloud computing.

Conclusions
In security technology, low-frequency attack events are challenging to predict and can severely threaten networks. As such, this paper puts forward a novel approach that takes advantage of spectral clustering and deep neural networks to detect attack types. In the first stage, network features are captured by clusters and divided into k subsets in a bid to discover more knowledge and patterns from similar clusters. In the second stage, highly abstract features are obtained by deep learning networks from the subsets produced during the clustering process.
Finally, testing subsets are used to detect attacks. This is an efficient way to improve detection rate accuracy. Experimental results show that the SCDNN performs better than SVM, BPNN, RF and Bayesian methods, with the best accuracy rates over the six datasets derived from the KDDCUP99 and NSL-KDD. Additionally, the proposed algorithm is more capable of classifying sparse attack cases and effectively improves detection accuracy in real security systems. However, the limitation of the SCDNN is that its weight parameters and the thresholds of each DNN layer need to be optimised, and the k and σ parameters of the clusters are determined empirically, not through mathematical theory, which deserves further study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: