Article

Unknown Security Attack Detection of Industrial Control System by Deep Learning

School of Software Technology, Dalian University of Technology, Dalian 116620, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(16), 2872; https://doi.org/10.3390/math10162872
Submission received: 14 June 2022 / Revised: 26 July 2022 / Accepted: 8 August 2022 / Published: 11 August 2022
(This article belongs to the Special Issue AI Algorithm Design and Application)

Abstract

With the rapid development of network technologies, the network security of industrial control systems has aroused widespread concern. As a defense mechanism, an ideal intrusion detection system (IDS) can effectively detect abnormal behaviors in a system without affecting the performance of the industrial control system (ICS). Many deep learning methods are used to build an IDS, but they rely on massive numbers of labeled samples of various attack types for model training. However, network traffic is imbalanced, and it is difficult for researchers to obtain sufficient attack samples. In addition, attack variants are numerous, and constructing all possible attack types in advance is impossible. In order to overcome these challenges and improve the performance of an IDS, this paper presents a novel intrusion detection approach which integrates a one-dimensional convolutional autoencoder (1DCAE) and support vector data description (SVDD) for the first time. In a conventional two-stage training process, 1DCAE fails to retain the key features for intrusion detection and SVDD has to add restrictions, so a joint optimization solution is introduced. A three-stage optimization process is proposed to obtain better performance. Experiments on the benchmark intrusion detection dataset NSL-KDD show that the proposed method can effectively detect various unknown attacks, learning from normal traffic only. Compared with recent state-of-the-art intrusion detection baselines, the proposed method is improved in most metrics.

1. Introduction

The early industrial control system (ICS) was a relatively independent and physically isolated system that was not connected to the external Internet environment, and its stability and functionality were the main concerns. However, with the rapid development of smart technology and network technology, the isolated environment of the ICS has gradually become a networked and open system. Due to this change in the network environment, the inherent security vulnerabilities of the ICS are constantly being exploited, and security threats are growing. Successful intrusions into the ICS can cause system downtime and data leakage, which affect the overall stability of the system and can even pose a significant threat to public security.
The problem of ICS network security has prompted extensive research in industry and academia. First, protection technologies from the field of traditional information security, such as industrial firewalls, were adopted by researchers as security mechanisms for the ICS and adapted to its traffic characteristics. However, attacks that can bypass the firewall remain a serious threat. For example, viruses can reach the supervisory control and data acquisition (SCADA) system through non-network devices (such as USB flash drives) and directly invade the central controller. Moreover, such protection mechanisms cannot handle internal attacks.
As a security protection technology, intrusion detection can effectively overcome the shortcomings of traditional protection technologies, e.g., industrial firewalls, and acts as a second line of defense for the ICS. Without affecting the normal operation of the ICS, it determines whether an intrusion or abnormal operation has occurred by integrating and analyzing host or traffic information in the ICS and detecting internal and external attacks, ensuring the safety and stability of the system. Intrusion detection can be categorized into signature-based and anomaly-based methods [1]. The signature-based method constructs a signature library by learning confirmed attack rules and then matches the sample under test against the signature library. However, the signature-based method requires the intervention of human experts and cannot detect unknown attacks. Anomaly-based detection methods, which detect potentially abnormal behavior by learning the characteristics of normal system behavior, have therefore drawn increasing attention.
In recent years, artificial intelligence methods have been employed for intrusion detection [2] to automatically identify anomalous behavior in ICS networks. However, existing machine-learning-based methods are inefficient when processing large-scale, high-dimensional data, producing false positives and achieving low detection accuracy. Deep learning based on artificial neural networks utilizes multiple hidden layers to perform a variety of non-linear data transformations and extracts specific features with a variety of network structures, e.g., the recurrent neural network [3] (RNN) or convolutional neural network [4] (CNN), which provides better learning ability. Several studies have applied deep learning methods in various environments to improve the security of ICS networks [5]. However, there are still some challenges for the intrusion detection models of industrial control networks. First, the proportions of normal and attack samples available to researchers are often imbalanced. Although network attacks against ICSs occur at all times, the application of traditional security protection technologies reduces the proportion of attacks that can invade the system and cause traffic changes. Compared with attack traffic, normal traffic still accounts for the vast majority, and the classification results of deep learning tend to favor categories with larger proportions. In addition, the types of attacks available to researchers are limited. Attackers can constantly change the way they attack, but security personnel cannot anticipate all types of attacks in advance; that is, there will always be unexpected and unknown attacks, and it is difficult to accurately classify unknown attacks using supervised deep learning. Secondly, labeling all data samples manually is impractical, and most data samples available to researchers are unlabeled. Finally, the collected data often contain a lot of information that is redundant and useless for intrusion detection [6], which affects the performance and results of intrusion detection. These problems result in a high false positive rate for most existing methods.
In order to solve the aforementioned challenge of data sources in ICS intrusion detection and improve the security of ICS networks, we propose a new intrusion detection model based on the one-dimensional convolutional autoencoder and SVDD [7] (1DCAE-SVDD) to identify attack behavior in ICS. Specifically, the deep one-dimensional convolutional autoencoder (1DCAE) filters the redundant features to acquire a low-dimensional representation of the original data that retains the key features for intrusion detection. Based on the extracted features, the traditional anomaly detection classifier SVDD is utilized to detect attack behavior. The major contributions of this work are summarized as follows:
  • This work proposes a joint optimization framework combining 1DCAE and SVDD to ensure the relevance of feature representation and intrusion detection and remove restrictions of deep SVDD.
  • Unlike the existing work, this paper proposes a three-stage optimization procedure to obtain better detection accuracy of 1DCAE-SVDD.
  • In the whole training process, only normal traffic samples are utilized in an unsupervised manner to solve the challenge of the lack of attack samples in the actual industrial control network.
  • The ablation experiments and comparative experiments on an industrial control network benchmark dataset demonstrate that the proposed method can effectively detect unknown attacks.

2. Related Work

Machine learning technology can learn the characteristic patterns of network traffic without human intervention and has been applied in many intrusion detection scenarios. Ajaeiya et al. [8] proposed a lightweight flow-based IDS in a software-defined network (SDN) environment that extracts and aggregates a set of features to train a supervised classification model. Doshi et al. [9] employed a variety of machine learning algorithms to detect the distributed denial of service (DDoS) attacks in Internet of Things (IoT). They exploited specific network behaviors in feature selection and achieved high accuracy DDoS detection. Rathore et al. [10] proposed a real-time IDS for the high-speed environment, which utilizes FSR and BER techniques to select nine features from the intrusion dataset and employs a decision-tree-based classification model. Jiang et al. [11] combined particle swarm optimization (PSO) and Xgboost. They constructed a classification model based on Xgboost, and the optimal structure of Xgboost was adaptively searched by PSO.
Recently, deep learning technology has been widely used in intrusion detection research. Yang et al. [12] introduced an intrusion detection method based on an improved convolutional neural network (ICNN) for wireless network environments. They used a CNN to autonomously extract high-level intrusion traffic features from the characterized and preprocessed original network traffic and optimized the model with the stochastic gradient descent algorithm. Nguyen et al. [13] proposed a NID algorithm which employs a genetic algorithm (GA) and fuzzy C-means clustering (FCM) for selecting feature subsets. The model applies a bagging (BG) classifier and a CNN model: the deep feature representation extracted by the CNN is fed into the subsequent BG classifier to identify anomalies. Li et al. [14] designed an intrusion detection approach based on a multi-convolutional neural network (multi-CNN) fusion method using flow data visualization technology. The scheme divides the feature data into four correlated parts and converts the one-dimensional feature data into a grayscale graph; the best-performing result among the four subsets is selected.
Considering that the information traces generated by attacks are usually time-dependent, a plethora of studies have developed recurrent neural network (RNN) IDS models. Yin et al. [15] first proposed an intrusion detection approach using recurrent neural networks (RNN-IDS), providing a new research direction in intrusion detection. Tang et al. [16] applied a gated recurrent unit recurrent neural network (GRU-RNN) for intrusion detection systems in SDN. In order to solve the security problem existing in processing big data in industrial Internet of Things, Zhou et al. [17] proposed variational long short-term memory (VLSTM), which employs LSTM cells to extract low-dimensional feature representations from raw data. Then, the refined features are fed to a lightweight classification network to identify attacks in an industrial network. Su et al. [18] proposed a traffic intrusion detection model, BAT. The model consists of bidirectional long short-term memory (BLSTM), an attention mechanism and multiple convolutional layers. The attention mechanism is utilized to monitor the network traffic flow consisting of packet vectors generated by the BLSTM model to extract the key features for network intrusion detection. Additionally, the multiple convolutional layers were applied to capture the local features of traffic data.
Redundant information in raw data features can result in poor performance in deep learning. As a result, some feature selection technologies must be applied to the original dataset. In recent years, some studies have focused on autoencoders for constructing IDS models, which can automatically extract key features for intrusion detection. Wang et al. [19] proposed a hybrid model based on a deep autoencoder and a convolutional neural network for malware detection in Android applications (APPs). They reconstructed the original features of the APP and employed an autoencoder for pretraining the CNN. Azmin et al. [20] introduced an intrusion detection scheme combining a variational Laplace autoencoder (VLAE) and a deep neural network. They enhanced the existing VLAE by feeding class labels to the autoencoder and named it the conditional variational Laplace autoencoder (CVLAE). The model applies the CVLAE to learn latent variable representations of the traffic’s raw features and trains a DNN as the classifier. Tang et al. [21] developed an intrusion detection model called SAAE-DNN combining a stacked autoencoder (SAE), an attention mechanism and a deep neural network (DNN). They applied the attention mechanism to weigh the redundant features extracted by the SAE from raw traffic data, and the DNN is used for classification. Another set of previous studies integrated the autoencoder and a conventional classifier to benefit from the advantages of both shallow and deep learning methods. Binbusayyis et al. [22] combined the one-class support vector machine (OCSVM) with a one-dimensional convolutional autoencoder. The scheme applies the one-dimensional convolutional autoencoder for unsupervised feature learning. The experiments on the NSL-KDD dataset demonstrate that their IDS scheme can improve detection accuracy and reduce training time. Ji et al. [23] proposed an anomaly detection scheme applying an asymmetric convolutional autoencoder for feature learning and a random forest for classification. Hu et al. [6] proposed a one-class intrusion detection scheme (DO-IDS) to detect attacks in industrial networks. The scheme applies an autoencoder with LSTM to map data from the original input space to a feature space and utilizes support vector data description to identify attacks. In addition, some studies utilize reconstruction errors for intrusion detection. The idea is that the learned latent space is forced to capture the generality of the input in order to minimize the reconstruction error; anomalies are difficult to reconstruct from this feature representation space and therefore yield larger reconstruction errors than normal samples [24]. For instance, Meidan et al. [25] proposed a novel network-based anomaly detection method for IoT environments (N-BaIoT). They extracted behavioral features of the network and applied a deep autoencoder to detect abnormal behavior from compromised IoT devices. A summary of the reviewed papers is shown in Table 1.

3. Background

3.1. Autoencoder

The autoencoder (AE) proposed by [26] is an unsupervised deep learning technique. The basic structure of an AE consists of an encoding network and a decoding network built from multiple layers of neural networks. The encoding network transforms the high-dimensional input vector into a representation in a latent space, realizing dimensional compression or feature extraction. The decoding network reconstructs the input vector as the output from the latent space vector. The AE is used to obtain a robust feature expression by comparing the input and output differences. By setting fewer neurons in the bottleneck layer than in the input layer, the neural network is forced to learn a low-dimensional representation of the original data to achieve dimension reduction. Between the input layer and the bottleneck layer is the encoding network. Given the m-dimensional input vector $X = (x_1, x_2, x_3, \dots, x_m)$, the task of the encoding network is to calculate the n-dimensional representation $H = (h_1, h_2, h_3, \dots, h_n)$ in the latent space as the high-level features of the input vector. The formula is as follows:
$H = \phi_{en}(WX + b)$    (1)
where $\phi_{en}$ is the encoding network with parameters W and b. Between the bottleneck layer and the output layer is the decoding network. Given the n-dimensional representation $H = (h_1, h_2, h_3, \dots, h_n)$, the decoding network reconstructs the input vector, formulated as follows:
$\hat{X} = \phi_{de}(W'H + b')$    (2)
where $\phi_{de}$ is the decoding network. The above equations represent nonlinear functions. The weights and biases are the inherent parameters of the AE, which are changed through the training process. Then, the loss function is calculated from the difference between input and output. The parameters of the whole network are trained by back-propagation to make the input and output as equal as possible. Therefore, the entire AE approximately learns an identity function $\phi_{AE}(x) = \phi_{de}(\phi_{en}(x)) = \hat{x}$, where $\hat{x}$ is similar to the original input x.
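As a concrete illustration of the encoder/decoder structure and the reconstruction objective described above, the following is a minimal PyTorch sketch of a fully connected AE; the layer widths, learning rate and batch size are illustrative assumptions rather than the settings used in this paper.

```python
import torch
import torch.nn as nn

class SimpleAE(nn.Module):
    """Minimal fully connected autoencoder: the encoder compresses the
    m-dimensional input to an n-dimensional latent vector H, and the
    decoder reconstructs X_hat from H."""
    def __init__(self, in_dim=122, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.encoder(x)        # H = phi_en(WX + b), Equation (1)
        x_hat = self.decoder(h)    # X_hat = phi_de(W'H + b'), Equation (2)
        return x_hat, h

# one illustrative training step on a random batch of normalized vectors
model = SimpleAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(128, 122)
x_hat, _ = model(x)
loss = nn.functional.mse_loss(x_hat, x)   # reconstruction error
optimizer.zero_grad()
loss.backward()
optimizer.step()
```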

3.2. Support Vector Data Description

Inspired by the SVM, SVDD [7] is a typical one-class classification method, which separates abnormal data by learning a hypersphere boundary with minimum volume. Specifically, the data sampled in the input space are mapped to a feature space by a mapping function; the mapped data are more compact than in the input space, which makes it easier to construct the hypersphere boundary. The mapping function can be linear or nonlinear. The original data generally have nonlinear relations, and linear methods fail to form a spherical structure in the feature space, so a nonlinear mapping is generally used. Then, the smallest hypersphere (represented by the center of the sphere c and the radius R) is learned in the feature space to contain the majority of the mapped points. Normal samples fall inside the hypersphere, and outliers fall outside. The process is described as follows:
Given the normal dataset $D_n = \{X_1, X_2, \dots, X_n\}$, where $X_i \in \mathbb{R}^m$, $i = 1, 2, \dots, n$, the SVDD optimization objective is given by:
$\min_{R, c, \xi} \; R^2 + \eta \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad \|\phi(X_i) - c\|^2 \le R^2 + \xi_i, \; \xi_i \ge 0, \; i = 1, 2, \dots, n$    (3)
where R is the radius of the sphere and c is its center. The $\xi_i$ are slack variables, and $\eta$ is a trade-off between the sphere volume and the penalties for modeling errors. Introducing the Lagrange function leads to the dual problem, as follows:
$\min_{\alpha} \; \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j K(X_i, X_j) - \sum_{i=1}^{n} \alpha_i K(X_i, X_i) \quad \text{s.t.} \quad 0 \le \alpha_i \le \eta, \; \sum_{i=1}^{n} \alpha_i = 1$    (4)
where $\alpha_i$ represents the Lagrange coefficient of $X_i$, and $K(\cdot,\cdot)$ is the kernel function employed to obtain the inner products, satisfying the Mercer theorem. After solving the dual problem, the Lagrange coefficients of all samples can be obtained. Among all training samples, the samples whose Lagrange coefficients satisfy $0 < \alpha_i < \eta$ are called support vectors (SVs). Assuming the sample $X_k$ is a support vector, the center and radius of the hypersphere are calculated as follows:
$c = \sum_{i=1}^{n} \alpha_i \phi(X_i)$    (5)
$R = \sqrt{K(X_k, X_k) - 2\sum_{i=1}^{n} \alpha_i K(X_k, X_i) + \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j K(X_i, X_j)}$    (6)
where $X_k$ is a support vector and $K(X_i, X_j) = \langle \phi(X_i), \phi(X_j) \rangle$. For a test point $X_{test}$, if the distance between the sample and the hypersphere center c is greater than the radius R calculated by (6), it is deemed anomalous.
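For illustration, the following is a short NumPy sketch of the kernel SVDD decision rule implied by (5) and (6), assuming the Lagrange coefficients and support vectors have already been obtained from a quadratic programming solver; the RBF kernel, the gamma value and the made-up data are our own illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svdd_sq_distance(x, X_sv, alpha, gamma=0.1):
    """Squared distance from phi(x) to the hypersphere center c, expressed
    entirely through kernel evaluations (c = sum_i alpha_i * phi(X_i))."""
    k_xx = rbf_kernel(x, x, gamma)
    k_xc = sum(a * rbf_kernel(x, xi, gamma) for a, xi in zip(alpha, X_sv))
    k_cc = sum(ai * aj * rbf_kernel(xi, xj, gamma)
               for ai, xi in zip(alpha, X_sv) for aj, xj in zip(alpha, X_sv))
    return k_xx - 2 * k_xc + k_cc

# illustrative usage with made-up support vectors and coefficients
X_sv = np.random.rand(5, 122)
alpha = np.full(5, 1.0 / 5)                        # coefficients sum to 1
R2 = svdd_sq_distance(X_sv[0], X_sv, alpha)        # R^2 from one support vector
x_test = np.random.rand(122)
is_anomaly = svdd_sq_distance(x_test, X_sv, alpha) > R2
```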

4. Methodology

In this paper, we propose a novel deep learning model for the intrusion detection task of an industrial control network, as shown in Figure 1. The proposed model is composed of two parts. The 1DCAE is utilized to extract the characteristics of industrial normal traffic flow, and SVDD is applied as the intrusion detection classifier. The features extracted via the encoder of 1DCAE are transferred to subsequent SVDD to learn a hypersphere, and the decoder of 1DCAE handles these representations to reconstruct the input. The next sections describe the motivation and technical details of the two parts, respectively, and propose objective functions. Then, the complete detection process and training process of the model are presented.

4.1. Autoencoder for Feature Learning and Motivation Analysis

In recent years, AE has been introduced as a solution for intrusion detection in various environments [5]. The variants of the autoencoder have been applied in many studies. This work applies AE to extract key features. Therefore, this section first carries out the analysis on typical variants of the autoencoder and the selection of hidden layers, and explains the motivation for selecting 1DCAE.
The sparse AE [27] assumes that high-dimensional sparse features are good features, which is feasible in other fields. However, this work combines deep learning with a traditional classifier, and traditional classifiers usually achieve poor efficiency and accuracy when processing massive high-dimensional data, so high-dimensional features play a negative role in the proposed model. The stacked AE obtains a deep feature expression of the original input by training multiple AE structures in stages. However, during these multiple rounds of training, the AE excessively pursues dimension reduction and does not consider the subsequent processing of the extracted features, so it cannot ensure that all the key information for intrusion detection is retained. The denoising AE adds noise by randomly setting some input features to 0 in order to extract more robust features. However, some attack samples are highly similar to normal samples; eliminating some features may further narrow the distance between the two kinds of samples, or even remove the distinguishing information entirely. The variational AE [28] has been a widely recognized generative model in recent years. The main task of the AE in this paper is to extract key features, whereas the variational AE aims to learn a latent distribution, which is irrelevant to the purpose of this paper. According to extensive experiments, there is little difference in the final result between a traditional AE structure and a variational AE. In order to simplify the model structure, the VAE structure is not used in this paper. An asymmetric structure is a common implementation method, which is not analyzed as a major variant in this paper.
In order to extract specific features, we also need to select a specific hidden layer network structure. The RNN (recurrent neural network) and its variants are widely used in many studies of industrial intrusion detection [5]. After experimental analysis, we consider the application of the RNN to be based on the following assumptions. Firstly, the information traces recorded by an attack are considered time-dependent. Some attacks, such as DoS (denial of service), can generate multiple connection records at different times, and there are temporal correlations between these records. In addition, the industrial control network is composed of specific protocols, each of which specifies a particular packet sending and receiving sequence; i.e., the data packets of the network traffic have temporal characteristics. This paper utilizes the network traffic flow features of normal samples, which have no temporal correlation. Although the RNN is widely used in intrusion detection research [5], due to the above considerations, the proposed model does not integrate an RNN structure. Some studies use CNNs to extract local spatial correlations between features [5], which is also based on a certain assumption: that adjacent features are correlated. For common datasets, which divide features into several categories, the adjacent features within each category are often semantically related. For 2D CNNs, the traffic features are converted into a 2D structure, and the spatial features are then extracted with a 2D convolution kernel. However, some completely unrelated features may appear in the same region after being converted into a two-dimensional picture; extracting the correlations of these features introduces redundant features that mislead the neural network and subsequent classifiers. One-dimensional convolution does not cause this problem: the one-dimensional convolution kernel slides from front to back along the feature vector, and the features covered by each kernel usually belong to the same category of features and have a potential correlation. The 1DCAE thus extracts more valuable feature representations by considering the relationships between features that are meaningful for intrusion detection. Considering the above reasons, we adopted the 1DCAE to extract the key features for intrusion detection. The specific model structure and calculation process are described below.
Notably, the structure of the AE has a great impact on the performance of the subsequent classifier. In this work, a series of experiments were carried out, and the AE network structure shown in the figure was finally determined. Each cube in the encoder part represents a combination of convolution, batch normalization, activation and pooling. The number of filters is halved in each layer from front to back, and the size of the convolution kernels remains $1 \times 3$. This structure eliminates redundant features and effectively reduces the dimensions of the feature map. Each convolution is followed by a max pooling layer with a size of $1 \times 2$ to obtain the basic characteristics of the network traffic and improve the feature expression ability. The feature map is then flattened into vector form. The fully connected layer of the encoder integrates features and further reduces dimensions to enhance the generalization ability of the model. The decoder first uses a fully connected layer to recover the dimensions of the flattened vector, and then applies the deconvolution and up-sampling corresponding to the encoder to recover the dimensions of the input data. A possible realization of this structure is sketched below.
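The following PyTorch sketch reproduces the layer shapes listed in Table 2 (three Conv1d blocks with 16, 8 and 4 filters, kernel size 3, 1 × 2 max pooling, a 32-unit bottleneck, and a mirrored transposed-convolution decoder). The exact strides, paddings and the omission of batch normalization are our assumptions, so it should be read as an approximation of the architecture rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class OneDCAE(nn.Module):
    """Sketch of the 1D convolutional autoencoder: three conv blocks with
    16/8/4 filters (kernel size 3) and 1x2 max pooling, a 32-unit bottleneck,
    and a mirrored transposed-convolution decoder (cf. Table 2)."""
    def __init__(self):
        super().__init__()
        self.encoder_conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.LeakyReLU(),  # [16, 122]
            nn.MaxPool1d(2),                                             # [16, 61]
            nn.Conv1d(16, 8, kernel_size=3, padding=1), nn.LeakyReLU(),  # [8, 61]
            nn.MaxPool1d(2),                                             # [8, 30]
            nn.Conv1d(8, 4, kernel_size=3, padding=1), nn.LeakyReLU(),   # [4, 30]
            nn.MaxPool1d(2),                                             # [4, 15]
        )
        self.fc_enc = nn.Linear(4 * 15, 32)   # bottleneck: 60 -> 32
        self.fc_dec = nn.Linear(32, 4 * 15)   # 32 -> 60, reshaped to [4, 15]
        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose1d(4, 4, 3, stride=1, padding=1), nn.LeakyReLU(),                    # [4, 15]
            nn.ConvTranspose1d(4, 8, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(),  # [8, 30]
            nn.ConvTranspose1d(8, 16, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(), # [16, 60]
            nn.ConvTranspose1d(16, 1, 3, stride=2, output_padding=1), nn.LeakyReLU(),            # [1, 122]
        )

    def encode(self, x):
        z = self.encoder_conv(x)
        return self.fc_enc(z.flatten(1))

    def forward(self, x):
        h = self.encode(x)
        z = self.fc_dec(h).view(-1, 4, 15)
        return self.decoder_conv(z), h

model = OneDCAE()
x = torch.rand(8, 1, 122)        # batch of 122-dimensional traffic vectors
x_hat, h = model(x)
print(x_hat.shape, h.shape)      # torch.Size([8, 1, 122]) torch.Size([8, 32])
```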
The deep learning architecture of 1DCAE consists of multiple hidden layers. The calculation process of obtaining hidden feature representations at each layer can be defined recursively, expressed as follows:
$H_l = f(H_{l-1} W_l + b_l)$    (7)
where $l$ is the layer index and $f$ is the activation function. $H_0$ represents the original input vector X, and $H_l$ is the output of layer $l$; the output of the last layer is the output of the AE.
This work applies the mean squared error (MSE) between the original network traffic input vector and the reconstructed network traffic output vector as the reconstruction error of the 1DCAE. Given training samples $X_{set} = \{X_1, X_2, \dots, X_n\}$, the objective function of the 1DCAE can be defined as:
$\min \; \frac{1}{n}\sum_{i=1}^{n} \|\hat{X_i} - X_i\|^2$    (8)
where $\hat{X_i}$ denotes the output of the AE, $\hat{X_i} = \phi_{AE}(X_i)$.

4.2. The Objective Function of SVDD in the Model

As mentioned above, SVDD aims to learn a hypersphere containing as much of the training data as possible to distinguish normal and abnormal patterns. In order to utilize the advantages of deep learning in feature mapping, this work applies the DSVDD objective function proposed by Ruff [29]. Given the input vector X, let $\phi_m(X; W)$ represent the mapped data point. In order to minimize the volume of the hypersphere, the objective of DSVDD can be defined as:
$\min_{R, W} \; R^2 + \frac{\beta}{n}\sum_{i=1}^{n} \max\{0, \|\phi_m(X_i; W) - c\|^2 - R^2\} + \frac{\lambda}{2}\sum_{l=1}^{L} \|W_l\|_F^2$    (9)
where the second term is a penalty for points lying outside the hypersphere and the last term is a weight regularizer. W represents the parameters of the mapping network, $\beta$ controls the trade-off between the sphere volume and the penalty term, and $\lambda$ weights the regularizer. We fix c as the mean of the latent representations obtained from an initial forward pass on some training data instances [29].
According to the code provided by Ruff, R is selected by a percentile division over the distances from all data points to the sphere center. This selection is similar to a sampling operation for which no gradient is computed; therefore, the first term $R^2$ in the objective function does not affect the back-propagation process. We propose a simplified version of the objective function as:
$\min_{W} \; \frac{\beta}{n}\sum_{i=1}^{n} \max\{0, \|\phi_m(X_i; W) - c\|^2 - R^2\} + \frac{\lambda}{2}\sum_{l=1}^{L} \|W_l\|_F^2$    (10)
We only use normal network samples. If the assumption that most of the data samples in the training set are normal samples is true, a simplified objective of SVDD can be defined as:
$\min_{W} \; \frac{1}{n}\sum_{i=1}^{n} \|\phi_m(X_i; W) - c\|^2 + \frac{\lambda}{2}\sum_{l=1}^{L} \|W_l\|_F^2$    (11)
where the first term penalizes the distance of every mapped point $\phi_m(X_i; W)$ from the center c.
The two objective functions were each used to train the same network model. Experimental results show that the latter usually obtains higher accuracy. However, the former has its own advantage: after training, it automatically yields a decision boundary, whereas the latter only yields a set of anomaly scores, and the decision boundary has to be set manually according to experiments and prior information about the test set. To obtain the best performance of the model, the last objective function (11) was used in the subsequent work of this paper.
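As a minimal sketch of how objective (11) can be implemented, the PyTorch snippet below computes the mean squared distance of an encoded batch from a fixed center c and initializes c from an initial forward pass; the small-epsilon adjustment that keeps c away from zero and the use of optimizer weight decay in place of the explicit Frobenius regularizer are assumptions borrowed from common DSVDD implementations, not necessarily the authors' exact code.

```python
import torch

def svdd_loss(encoder, x, c):
    """Simplified DSVDD objective (11): mean squared distance of the
    mapped points phi_en(x) from the fixed hypersphere center c."""
    z = encoder(x)
    return torch.mean(torch.sum((z - c) ** 2, dim=1))

def init_center(encoder, loader, eps=0.1):
    """Fix c as the mean latent representation from an initial forward pass,
    pushing near-zero components away from zero to avoid a trivial solution."""
    with torch.no_grad():
        c = torch.cat([encoder(x) for x in loader]).mean(dim=0)
    c[(c.abs() < eps) & (c < 0)] = -eps
    c[(c.abs() < eps) & (c >= 0)] = eps
    return c
```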

4.3. Deep 1DCAE-SVDD Model

Notably, the ability of DSVDD to obtain good classification results depends on the proper training of the neural network [30]. Therefore, there are many strict restrictions on the training process and network structure of DSVDD. First, the sphere center vector c cannot be set to 0 and cannot be a free variable, which may mislead the neural network into the trivial solution $\phi_m(X; W) = 0$ with $W = 0$. In addition, the neurons of the neural network cannot have bias terms; otherwise, the neural network converges to $\phi_m(X; W) = \text{constant}$ with $W = 0$ [30]. For similar reasons, the neural network cannot use bounded activation functions. If these limitations are not met, all data samples may be mapped to the same point; consequently, the learned R equals 0 and SVDD fails to separate normal and abnormal points, which is called hypersphere collapse. The cause of these limitations is that the optimization process only aims to minimize the volume of the hypersphere, without considering the retention of the data information necessary for the classification task.
The training process of the AE is similar to that of DSVDD. By specifying the dimensions of the bottleneck layer, AE can achieve a satisfactory trade-off between dimension reduction and feature extraction. However, AE can not guarantee that the bottleneck layer contains the necessary information for subsequent tasks [17]. Some information may not be important for reconstructing input, but can play an important role in subsequent tasks. According to the objective function and model structure, AE only focuses on reducing dimensions and reconstructing input, without prior information of subsequent intrusion detection tasks.
To solve the common problems of AE and DSVDD in intrusion detection applications, we designed a joint optimization of AE and DSVDD, called 1DCAE-SVDD. The main scheme is to construct a neural network with a joint optimization objective combining the two objective functions. One objective compacts the mapped data points as close to the center as possible while keeping the information necessary for the intrusion detection task in the bottleneck layer. The other objective preserves the original information by reconstructing the network input. If the encoding network were to map all data samples to the same point, the decoding network would fail to reconstruct the different input samples, resulting in a large reconstruction error, which effectively avoids hypersphere collapse. Given original samples X, the 1DCAE-SVDD optimization objective is defined as:
$\min \; \frac{\delta}{n}\sum_{i=1}^{n} \|\phi_{en}(X_i; W_{en}) - c\|^2 + \frac{\eta}{n}\sum_{i=1}^{n} \|\phi_{AE}(X_i; W_{AE}) - X_i\|^2 + \frac{\lambda}{2}\sum_{l=1}^{L} \|W_l\|_F^2$    (12)
where $\phi_{en}$ is the encoding network of the AE with parameters $W_{en}$, and $\phi_{AE}$ is the full AE network with parameters $W_{AE}$. $\delta$ and $\eta$ control the trade-off between the distance from a mapped point to the center c and the reconstruction error.
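A minimal PyTorch sketch of the joint objective (12) is given below; it assumes a model whose forward pass returns the reconstruction and the latent code (as in the 1DCAE sketch above), the weights delta and eta are placeholders, and the Frobenius regularizer is again assumed to be handled by the optimizer's weight decay.

```python
import torch
import torch.nn.functional as F

def joint_loss(model, x, c, delta=1.0, eta=1.0):
    """Joint 1DCAE-SVDD objective (12): delta * SVDD distance term
    plus eta * reconstruction error."""
    x_hat, h = model(x)                                   # reconstruction and latent code
    svdd_term = torch.mean(torch.sum((h - c) ** 2, dim=1))
    recon_term = F.mse_loss(x_hat, x)
    return delta * svdd_term + eta * recon_term
```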

4.4. Intrusion Detection with 1DCAE-SVDD

In this work, the distance between the mapped point of an original sample and the sphere center in the latent space is defined as the anomaly score of the network traffic sample. For a given test point $X = (x_1, x_2, \dots, x_m)$, the anomaly score of X can be expressed as:
$s(X) = \|\phi_{en}(X; W_{en}) - c\|^2$    (13)
where $\phi_{en}$ is the encoding network and c is the center of the hypersphere, which is a trainable parameter. The higher the value of $s(X)$, the more likely the network traffic sample X is an attack. The purpose of the proposed method is to construct a hypersphere containing most data sample points in the latent space. The hypersphere is defined by the sphere's center and radius, and the radius R is the classification boundary on the anomaly scores. All the training samples are normal samples, so theoretically all normal samples could be included if the radius R were set to the maximum anomaly score. However, the training samples contain many outliers, which would bias this choice: the hypersphere formed by using the maximum distance as the radius R has a lot of free space, leading to attacks being missed in the test set. Therefore, we think a proper balance between accuracy and sphere volume should be considered, and we set R as the 93rd percentile of the anomaly scores of the training set. The anomaly score calculated by (13) is compared with the radius R; if it is greater than R, the mapped point falls outside the sphere and is considered attack traffic, and vice versa, as shown in lines 21–25 of Algorithm 1.
Algorithm 1 The working procedure of 1DCAE-SVDD
Require:  Training set $X_{set} = \{X_1, X_2, \dots, X_n\}$,
       training epochs $aeEpoch$, $svddEpoch$, $jointEpoch$,
       learning rates $\mu_{AE}$, $\mu_{svdd}$, $\mu_{joint}$
  1:  Initialize $\phi_{en}$, $\phi_{de}$ with the Kaiming algorithm
  2:  for $iEpoch < aeEpoch$ do
  3:       $h = \phi_{en}(X)$
  4:       $\hat{X} = \phi_{de}(h)$
  5:      Calculate $L_{AE}$ with (8)
  6:       $\phi_{AE} \leftarrow \phi_{AE} - \mu_{AE}\nabla_{\phi_{AE}} L_{AE}$
  7:  end for
  8:  Initialize the center c
  9:  for $iEpoch < svddEpoch$ do
10:       $h = \phi_{en}(X)$
11:      Calculate $L_{SVDD}$ with (11)
12:       $\phi_{en} \leftarrow \phi_{en} - \mu_{svdd}\nabla_{\phi_{en}} L_{SVDD}$
13:  end for
14:  for $iEpoch < jointEpoch$ do
15:       $h = \phi_{en}(X)$
16:       $\hat{X} = \phi_{de}(h)$
17:      Calculate $L_{joint}$ with (12)
18:       $\phi_{AE} \leftarrow \phi_{AE} - \mu_{joint}\nabla_{\phi_{AE}} L_{joint}$
19:  end for
20:  return $\phi_{AE}$ and the center of the sphere c
21:  Calculate the anomaly score $s(X_i)$ of each sample in $X_{set}$ with (13)
22:   $R \leftarrow$ 93rd percentile of the anomaly scores of all training samples
23:  Extract the latent feature $H_{test}$ of each test sample with (1)
24:  Obtain the anomaly score $s(X_{test})$ of the mapped point with (13)
25:  Compare $s(X_{test})$ with R
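Lines 21–25 of Algorithm 1 can be written compactly in PyTorch as follows; this is an illustrative sketch that assumes an encoder callable returning latent vectors (e.g., the encode method of the 1DCAE sketch) and uses torch.quantile for the 93rd-percentile radius.

```python
import torch

@torch.no_grad()
def anomaly_scores(encoder, x, c):
    """Anomaly score (13): squared distance of each encoded sample from center c."""
    z = encoder(x)
    return torch.sum((z - c) ** 2, dim=1)

@torch.no_grad()
def detect(encoder, x_train, x_test, c, percentile=93.0):
    """Set R to the 93rd percentile of training anomaly scores (line 22) and
    flag test samples whose score exceeds R as attacks (lines 23-25)."""
    train_scores = anomaly_scores(encoder, x_train, c)
    R = torch.quantile(train_scores, percentile / 100.0)
    test_scores = anomaly_scores(encoder, x_test, c)
    return test_scores > R        # True -> predicted attack
```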

4.5. Training Optimization

This section introduces the training process of the model. Only normal network traffic samples are utilized in the entire training process. In order to solve the problems of AE and DSVDD, this work introduces a joint optimization solution. However, there are some problems with applying the joint optimization framework.
The experiments show that if the joint optimization method is used from the initial stage of training, the loss decreases more slowly than in two-stage training, and the detection accuracy is not significantly improved after 100 epochs. We believe that, at the initial stage of training, the loss of a single objective decreases rapidly and the network parameters change dramatically. Multi-objective learning affects the training of the encoding network simultaneously: one objective has a greater impact and interferes with the learning process of the other, so neither objective is achieved well instead of the model benefiting from multiple objectives. Theoretically, the influences of AE and DSVDD on the encoding network can be balanced by adjusting the hyperparameters $\delta$ and $\eta$. However, the actual influence of the reconstruction error depends on both the hyperparameters and the weights of the decoding network; since the decoding network's weights change sharply at the beginning of training, the hyperparameters cannot effectively control the influence of the reconstruction error on the encoding network. In order to obtain better detection accuracy of 1DCAE-SVDD, the following three-stage optimization procedure was designed.
First, the 1DCAE is pre-trained on normal samples. Specifically, for each data sample, the reconstruction $\hat{X}$ is obtained by (1) and (2), and the reconstruction objective is calculated by (8) to update the weights of the encoding and decoding networks (lines 1–7). Then, we extract the encoding network of the AE and combine it with the SVDD classifier (lines 8–13). For all data samples of each batch, the latent representation is obtained by (1), the mapped point is processed by SVDD, and the hypersphere loss is calculated by (11); only the encoding network weights are updated in this second stage. Next, in the joint training phase (lines 14–19), the mapped points are calculated by applying (1). Each point is used to reconstruct the input vector and to construct the hypersphere simultaneously, and the joint objective loss obtained by (12) updates all the trainable parameters of the model. The working procedure of 1DCAE-SVDD is shown in Algorithm 1, and a sketch of this three-stage loop is given below.
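For concreteness, a PyTorch sketch of the three-stage procedure follows. It assumes a model like the 1DCAE sketch in Section 4.1 (forward returning the reconstruction and latent code, with encoder modules named encoder_conv and fc_enc); the epoch counts, learning rates, delta, eta and weight decay are placeholder values, not the settings used in the experiments.

```python
import torch
import torch.nn.functional as F

def train_1dcae_svdd(model, loader, ae_epochs=50, svdd_epochs=50, joint_epochs=50,
                     delta=1.0, eta=1.0, weight_decay=1e-6):
    """Three-stage optimization: (1) pre-train the autoencoder with MSE,
    (2) train only the encoder with the SVDD distance loss (11),
    (3) jointly optimize all parameters with the combined objective (12)."""
    # Stage 1: autoencoder pre-training on normal traffic only
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(ae_epochs):
        for x in loader:
            x_hat, _ = model(x)
            loss = F.mse_loss(x_hat, x)
            opt.zero_grad(); loss.backward(); opt.step()

    # Fix the center c as the mean latent representation (initial forward pass)
    with torch.no_grad():
        c = torch.cat([model(x)[1] for x in loader]).mean(dim=0)

    # Stage 2: update only the encoder with the SVDD objective (11)
    enc_params = list(model.encoder_conv.parameters()) + list(model.fc_enc.parameters())
    opt = torch.optim.Adam(enc_params, lr=1e-3, weight_decay=weight_decay)
    for _ in range(svdd_epochs):
        for x in loader:
            _, h = model(x)
            loss = torch.mean(torch.sum((h - c) ** 2, dim=1))
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 3: joint optimization of all trainable parameters with (12)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(joint_epochs):
        for x in loader:
            x_hat, h = model(x)
            loss = (delta * torch.mean(torch.sum((h - c) ** 2, dim=1))
                    + eta * F.mse_loss(x_hat, x))
            opt.zero_grad(); loss.backward(); opt.step()
    return model, c
```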

5. Experiments and Analysis

We conducted extensive experiments to evaluate the intrusion detection performance of the proposed model. All the experiments were implemented with PyTorch, scikit-learn and Python 3, and the pytorch-ignite library was applied to calculate the various metrics. In addition, we developed the experiments in the Colab development environment, so others can reproduce our results at no cost.

5.1. Dataset

In order to conduct a fair comparison with other studies, the benchmark dataset of NSL-KDD [31] was used. NSL-KDD generated in 2009 is an improved version of the KDDCUP99 dataset and is widely used to evaluate the performances of intrusion detection algorithms. NSL-KDD may not perfectly represent the real traffic of an industrial control network, as there is the problem of outdated records. However, most literature in the field of intrusion detection still takes NSL-KDD as the main research object [32]. In order to allow researchers to compare the results of this work with others, we used NSL-KDD to evaluate the proposed method.
The dataset contains 125,973 training samples and 22,544 testing samples, and it solves the inherent redundant-record problem of KDDCUP99. We utilized two subsets of the NSL-KDD dataset, KDDTrain+ and KDDTest+, to train and evaluate the proposed model. Although the dataset contains four attack types and one normal traffic type, in our experiment we relabeled them into two categories, normal and abnormal. The KDDTrain+ dataset contains 67,343 normal traffic records and 58,630 attack samples; in this work, we only used the normal traffic samples of KDDTrain+ to train the model. Similarly, KDDTest+ includes 9711 normal traffic records and 12,833 abnormal records. We did not conduct any division of the test dataset, to ensure the authenticity and credibility of the evaluation results.
Each record in NSL-KDD consists of 1 class label and 41 features characterizing the network traffic flow, comprising 38 numeric features and 3 nominal features. In order to construct the numeric input matrix for the proposed model, we applied one-hot encoding to transform the nonnumeric features. The protocol, service and flag features have 3, 70 and 11 attribute values, respectively. After transformation, the input vector has 122 dimensions.
Some numeric features in NSL-KDD, e.g., duration with range [0, 58329], have a very large gap between the maximum and minimum values, which is a burden on the training process. To eliminate this effect, min–max normalization was used to limit the values to the range [0, 1]. A sketch of this preprocessing is given below.
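The following scikit-learn/pandas sketch illustrates the preprocessing just described (one-hot encoding of the three nominal features and min–max scaling to [0, 1]); the column names and the assumption that labels are handled separately are ours, so it should be read as an outline rather than the exact pipeline used in the paper.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Assumed layout: the dataframes contain only the 41 feature columns (labels
# handled separately); the nominal features are protocol_type, service, flag.
nominal = ["protocol_type", "service", "flag"]

def preprocess(train_df, test_df):
    """One-hot encode the 3 nominal features and min-max scale everything to
    [0, 1], fitting the scaler on the training split only."""
    full = pd.concat([train_df, test_df], keys=["train", "test"])
    full = pd.get_dummies(full, columns=nominal)       # 41 -> 122 features
    train, test = full.loc["train"], full.loc["test"]
    scaler = MinMaxScaler().fit(train)
    return scaler.transform(train), scaler.transform(test)
```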

5.2. Evaluation Metrics

We evaluate the performance of all the experiments in terms of accuracy, precision, recall, F1-score and FPR. To calculate these metrics, we first introduce the following terms: TP denotes the number of attack samples correctly classified, TN denotes the number of normal samples correctly classified, FP denotes the number of normal samples incorrectly classified as attacks, and FN denotes the number of attack samples incorrectly classified as normal records. Based on these terms, the metrics are defined as follows, with a short computation sketch after the definitions.
  • Accuracy: the ratio of correct predictions to the entire test set, defined as follows:
    $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$
  • Precision: the percentage of correctly predicted attack samples over all predicted attack samples, defined as follows:
    $Precision = \frac{TP}{TP + FP}$
  • Recall: the percentage of correctly predicted attack samples over all attack samples, defined as follows:
    $Recall = \frac{TP}{TP + FN}$
  • F1: the harmonic mean of precision and recall, defined as follows:
    $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$
  • FPR: the ratio of normal samples predicted as attacks to all normal samples in the test dataset, defined as follows:
    $FPR = \frac{FP}{FP + TN}$
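These definitions translate directly into code; the small Python helper below computes all five metrics from the confusion-matrix counts, and the example call uses made-up counts purely for illustration.

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, F1 and FPR from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return accuracy, precision, recall, f1, fpr

# illustrative counts only, not results from the paper
print(metrics(tp=90, tn=80, fp=10, fn=20))
```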

5.3. Ablation Analysis

In order to understand the contributions of the different components to the whole system, three variants of the proposed approach were tested.
  • DSVDD: This variant was designed by replacing the 1DCAE component with DAE to analyze the contribution of the 1DCAE to network traffic feature learning.
  • 1DCAE: This version was constructed by removing the SVDD classifier to demonstrate the superiority of SVDD in intrusion detection. We used the reconstruction error as the anomaly score for every sample.
  • 1DCAE+SVDD: This version was employed to reveal the effect of the three-stage training algorithm on the detection result. In this scheme, the 1DCAE and SVDD were trained simultaneously from the beginning without pre-training.
For a fair comparison, all the experiments were implemented with the same parameters and environment. The detailed parameter list of the proposed model is shown in Table 2. For DSVDD, the detection result depends on the feature extraction process, i.e., the structure of the AE. We conducted extensive experiments on the AE of DSVDD; the best detection performance was acquired when the numbers of nodes in the layers of the AE were set to (122, 128, 64, 32, 64, 128, 122). For the 1DCAE experiment, some studies [22] replaced the classifier employed in their work with others, e.g., a softmax layer. However, this work only utilized normal samples in the training phase, so cross-entropy could not be applied as the training loss; the reconstruction error was therefore used for classification. In addition, all three variants are score-based methods, which require a threshold to separate normal and attack data. To obtain the best result for every method, we calculated the receiver operating characteristic (ROC) curve on the test dataset.
Table 3 shows the comparison results. We can observe that DSVDD induced the lowest accuracy. This illustrates that the DAE cannot extract effective feature representations for intrusion detection from raw features, and SVDD cannot achieve better classification with the learned features. Similarly, the performance of 1DCAE without SVDD shows a more obvious decline. This clearly demonstrates that the SVDD classification method is an important component that significantly improves classification accuracy. To explore the performance of the joint training optimization algorithm, we jointly trained the 1DCAE+SVDD under different training epochs of the 1DCAE, and the F1-score results are shown in Figure 2. We can see that the F1-score of joint training has barely budged. This is because applying the joint objective contributes little to balancing the reconstruction and classification.

5.4. Performance Analysis

This set of experiments studied the effects of hyperparameters on intrusion detection performance, such as the number of convolutional layers, fully connected layers, filters and neurons, exploring the best design of 1DCAE for SVDD. In all experiments, the learning rate was set to 0.001, and the batch size was 128.
Firstly, a range of experiments was conducted to study the performance of the proposed model with different numbers of convolutional layers. The number of filters in the first convolutional layer was set to 16 and halved for each subsequent convolutional layer, and the decoder employs a structure symmetric to the encoder. Table 4 clearly reveals the impact of the number of convolutional layers on the performance. The three-layer convolutional encoder delivers slightly higher accuracy than two layers. Although the improvement is not large, the convolution and pooling operations are not time-consuming; as a result, in the further experiments, the 1DCAE network employed a three-layer structure.
Then, a series of experiments were designed to determine the number of fully connected layers. Considering the experimental results of Section 5.3, the number of layers was set to [1, 3], and the numbers of neurons were (128, 64, 32), (64, 32) and (32). Table 5, summarizing the corresponding results, indicates that one layer displays the highest F1-score. In addition, multiple fully connected layers led to higher computing costs.
Next, the literature suggests that the number of filters has a notable effect on performance [33]. In this set of experiments, the number of filters in the first layer was set to 32, 16 or 8, and was halved for each subsequent layer. As shown in Table 6, the F1-score reached 93.41% when the first layer contained 16 filters.

5.5. Comparative Analysis

To demonstrate the intrusion detection performance of the proposed 1DCAE-SVDD, we compared it with recent and relevant studies from the intrusion detection literature in terms of accuracy, precision, recall, F1-score and FPR. We do not compare it with conventional methods, e.g., the local outlier factor (LOF) and PCA, because of their poor performance in intrusion detection. In addition, since the proposed method benefits from deep learning, we mainly focus on deep methods for intrusion detection. Some studies mix the training set and test set of a dataset and repartition them; for NSL-KDD, this reduces the difficulty of classification, since some unknown attacks in the test set do not appear in the training set, so we do not compare our method with such methods. The results reported in the published papers are compared, and the comparison is displayed in Table 7, where the best experimental metrics are shown in bold. We can observe in the table that the proposed method achieved an F1-score of 93.53%, 92.74% accuracy, 94.91% precision and 92.19% recall. Accuracy reflects the model's overall classification ability, and the proposed method achieved the highest classification accuracy. The F1-score is one of the most important metrics for an intrusion detection model.
To sum up, it can be concluded that the excellent results of the proposed model demonstrate that it can efficiently detect attack behavior for intrusion detection.

6. Discussion

In this article, we proposed a new intrusion detection model to solve the problem of imbalanced samples in industrial control systems. The model consists of a 1DCAE and SVDD. The 1DCAE extracts the hidden features of the original network traffic data and maps the original instances into the feature space, and SVDD constructs a hypersphere in the feature space containing the mapped points. Because, in a two-stage training process, the 1DCAE fails to retain the key features for intrusion detection and SVDD has to add restrictions, a joint optimization solution is used. To the best of our knowledge, the proposed method is the first to integrate 1DCAE and SVDD. A three-stage optimization process was proposed to solve the problem of multi-objective interference. The effectiveness of the algorithm was demonstrated on the open intrusion detection benchmark dataset NSL-KDD in terms of accuracy, FPR, recall, precision and F1-score. Ablation experiments showed that the 1DCAE structure can effectively extract the key features and that SVDD has higher detection performance than reconstruction-based methods, and the three-stage optimization method performs better than joint optimization alone. The comparative experiments show that, although the false positive rate is not optimal, the accuracy and F1-score of the proposed method are better than those of many state-of-the-art models. In conclusion, the proposed method shows considerable advantages in intrusion detection and may serve as a reference for practitioners. This work is related to an actual industrial control network project, and we plan to deploy the detection model in a cloud–edge production environment. In the future, it may be necessary to consider federated learning and distributed training methods.

Author Contributions

Conceptualization, J.W. and P.L.; methodology, P.L.; software, R.A.; validation, W.K., J.W. and R.A.; formal analysis, P.L.; investigation, P.L.; resources, J.W.; data curation, W.K.; writing—original draft preparation, P.L.; writing—review and editing, W.K.; visualization, R.A.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by National Key Research and Development Project (Key Technologies and Applications of Security and Trusted Industrial Control System 2020YFB2009500).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors thank all anonymous reviewers and editors for their work to improve this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zarpelão, B.B.; Miani, R.S.; Kawakani, C.T.; de Alvarenga, S.C. A survey of intrusion detection in Internet of Things. J. Netw. Comput. Appl. 2017, 84, 25–37. [Google Scholar] [CrossRef]
  2. Soni, V.D. Challenges and Solution for Artificial Intelligence in Cybersecurity of the USA (June 10, 2020). Available online: https://ssrn.com/abstract=3624487 (accessed on 13 June 2022). [CrossRef]
  3. Mikolov, T.; Karafiát, M.; Burget, L.; Cernockỳ, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the Interspeech, Makuhari, Japan, 26–30 September 2010; Volume 2, pp. 1045–1048. [Google Scholar]
  4. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
  5. Lee, S.W.; Mohammed, H.; Mohammadi, M.; Rashidi, S.; Rahmani, A.M.; Masdari, M.; Hosseinzadeh, M. Towards secure intrusion detection systems using deep learning techniques: Comprehensive analysis and review. J. Netw. Comput. Appl. 2021, 187, 103111. [Google Scholar] [CrossRef]
  6. Hu, B.; Bi, Y.; Zhi, M.; Zhang, K.; Yan, F.; Zhang, Q.; Liu, Z. A Deep One-Class Intrusion Detection Scheme in Software-Defined Industrial Networks. IEEE Trans. Ind. Inform. 2021, 18, 4286–4296. [Google Scholar] [CrossRef]
  7. Tax, D.M.; Duin, R.P. Support vector data description. Mach. Learn. 2004, 54, 45–66. [Google Scholar] [CrossRef]
  8. Ajaeiya, G.A.; Adalian, N.; Elhajj, I.H.; Kayssi, A.; Chehab, A. Flow-based intrusion detection system for SDN. In Proceedings of the 2017 IEEE Symposium on Computers and Communications (ISCC), Heraklion, Greece, 3–6 July 2017; pp. 787–793. [Google Scholar]
  9. Doshi, R.; Apthorpe, N.; Feamster, N. Machine learning ddos detection for consumer internet of things devices. In Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 24 May 2018; pp. 29–35. [Google Scholar]
  10. Rathore, M.M.; Saeed, F.; Rehman, A.; Paul, A.; Daniel, A. Intrusion detection using decision tree model in high-speed environment. In Proceedings of the 2018 International Conference on Soft-computing and Network Security (ICSNS), Coimbatore, India, 14–16 February 2018; pp. 1–4. [Google Scholar]
  11. Jiang, H.; He, Z.; Ye, G.; Zhang, H. Network intrusion detection based on PSO-XGBoost model. IEEE Access 2020, 8, 58392–58401. [Google Scholar] [CrossRef]
  12. Yang, H.; Wang, F. Wireless network intrusion detection based on improved convolutional neural network. IEEE Access 2019, 7, 64366–64374. [Google Scholar] [CrossRef]
  13. Nguyen, M.T.; Kim, K. Genetic convolutional neural network for intrusion detection systems. Future Gener. Comput. Syst. 2020, 113, 418–427. [Google Scholar] [CrossRef]
  14. Li, Y.; Xu, Y.; Liu, Z.; Hou, H.; Zheng, Y.; Xin, Y.; Zhao, Y.; Cui, L. Robust detection for network intrusion of industrial IoT based on multi-CNN fusion. Measurement 2020, 154, 107450. [Google Scholar] [CrossRef]
  15. Yin, C.; Zhu, Y.; Fei, J.; He, X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 2017, 5, 21954–21961. [Google Scholar] [CrossRef]
  16. Tang, T.A.; Mhamdi, L.; McLernon, D.; Zaidi, S.A.R.; Ghogho, M. Deep recurrent neural network for intrusion detection in sdn-based networks. In Proceedings of the 2018 4th IEEE Conference on Network Softwarization and Workshops (NetSoft), Montreal, QC, Canada, 25–29 June 2018; pp. 202–206. [Google Scholar]
  17. Zhou, X.; Hu, Y.; Liang, W.; Ma, J.; Jin, Q. Variational LSTM enhanced anomaly detection for industrial big data. IEEE Trans. Ind. Inform. 2020, 17, 3469–3477. [Google Scholar] [CrossRef]
  18. Su, T.; Sun, H.; Zhu, J.; Wang, S.; Li, Y. BAT: Deep learning methods on network intrusion detection using NSL-KDD dataset. IEEE Access 2020, 8, 29575–29585. [Google Scholar] [CrossRef]
  19. Wang, W.; Zhao, M.; Wang, J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient Intell. Humaniz. Comput. 2019, 10, 3035–3043. [Google Scholar] [CrossRef]
  20. Azmin, S.; Islam, A.M.A.A. Network Intrusion Detection System based on Conditional Variational Laplace Auto Encoder. In Proceedings of the 7th International Conference on Networking, Systems and Security, Dhaka, Bangladesh, 22–24 December 2020; pp. 82–88. [Google Scholar]
  21. Tang, C.; Luktarhan, N.; Zhao, Y. SAAE-DNN: Deep learning method on intrusion detection. Symmetry 2020, 12, 1695. [Google Scholar] [CrossRef]
  22. Binbusayyis, A.; Vaiyapuri, T. Unsupervised deep learning approach for network intrusion detection combining convolutional autoencoder and one-class SVM. Appl. Intell. 2021, 51, 7094–7108. [Google Scholar] [CrossRef]
  23. Ji, S.; Ye, K.; Xu, C.Z. A Network Intrusion Detection Approach Based on Asymmetric Convolutional Autoencoder. In Proceedings of the International Conference on Cloud Computing, Virtual, 18–20 September 2020; pp. 126–140. [Google Scholar]
  24. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
  25. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-baiot—Network-based detection of iot botnet attacks using deep autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22. [Google Scholar] [CrossRef]
  26. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  27. Ng, A. Sparse autoencoder. CS294A Lect. Notes 2011, 72, 1–19. [Google Scholar]
  28. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  29. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep one-class classification. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402. [Google Scholar]
  30. Zhang, Z.; Deng, X. Anomaly detection using improved deep SVDD model with data structure preservation. Pattern Recognit. Lett. 2021, 148, 1–6. [Google Scholar] [CrossRef]
  31. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
  32. Chou, D.; Jiang, M. A survey on data-driven network intrusion detection. ACM Comput. Surv. (CSUR) 2021, 54, 1–36. [Google Scholar] [CrossRef]
  33. Agrawal, A.; Mittal, N. Using CNN for facial expression recognition: A study of the effects of kernel size and number of filters on accuracy. Vis. Comput. 2020, 36, 405–412. [Google Scholar] [CrossRef]
  34. Yang, Y.; Zheng, K.; Wu, C.; Yang, Y. Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors 2019, 19, 2528. [Google Scholar] [CrossRef]
  35. Yang, Y.; Zheng, K.; Wu, B.; Yang, Y.; Wang, X. Network intrusion detection based on supervised adversarial variational auto-encoder with regularization. IEEE Access 2020, 8, 42169–42184. [Google Scholar] [CrossRef]
  36. Vinayakumar, R.; Alazab, M.; Soman, K.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep learning approach for intelligent intrusion detection system. IEEE Access 2019, 7, 41525–41550. [Google Scholar] [CrossRef]
  37. Yang, Y.; Zheng, K.; Wu, C.; Niu, X.; Yang, Y. Building an effective intrusion detection system using the modified density peak clustering algorithm and deep belief networks. Appl. Sci. 2019, 9, 238. [Google Scholar] [CrossRef]
  38. Yu, Y.; Bian, N. An intrusion detection method using few-shot learning. IEEE Access 2020, 8, 49730–49740. [Google Scholar] [CrossRef]
Figure 1. The architecture of the proposed 1DCAE-SVDD.
Figure 2. F1-score varies with training epochs for the 1DCAE.
Table 1. Summary of reviewed papers in the field.

Scheme | Year | Classifiers | Feature Extractions | Datasets
Nguyen et al. [13] | 2020 | BG, CNN | GA, FCM | NSL-KDD
Li et al. [14] | 2020 | MLP | multi-CNN fusion | NSL-KDD
Zhou et al. [17] | 2020 | MLP | VLSTM | UNSW-NB15
Su et al. [18] | 2020 | Softmax | CNN, BLSTM, Attention | NSL-KDD
Azmin et al. [20] | 2020 | MLP | Conditional Variational Laplace AutoEncoder | NSL-KDD
Tang et al. [21] | 2020 | DNN | stacked autoencoder, attention | NSL-KDD
Binbusayyis et al. [22] | 2021 | one-class SVM | convolutional autoencoder | NSL-KDD, UNSW-NB15
Hu et al. [6] | 2021 | support vector data description | autoencoder | NSL-KDD
Table 2. Detailed parameter list for the proposed model.

Layer | Output Shape | Param
Conv1d + lrelu | [16, 122] | 64
Maxpool1d | [16, 61] | 0
Conv1d + lrelu | [8, 61] | 392
Maxpool1d | [8, 30] | 0
Conv1d + lrelu | [4, 30] | 100
Maxpool1d | [4, 15] | 0
Linear | [32] | 1952
Linear | [60] | 1980
ConvTransposed1d + lrelu | [4, 15] | 52
ConvTransposed1d + lrelu | [8, 30] | 104
ConvTransposed1d + lrelu | [16, 60] | 400
ConvTransposed1d + lrelu | [1, 122] | 49
Table 3. Ablation analysis on NSL-KDD for different variants.

Method | Accuracy | FPR | Recall | Precision | F1-Score
DSVDD | 91.00 | 16.70 | 96.83 | 88.45 | 92.45
1DCAE | 91.18 | 14.6 | 95.56 | 89.63 | 92.50
1DCAE+SVDD | 91.33 | 12.98 | 94.59 | 90.59 | 92.55
Proposed | 92.74 | 6.52 | 92.19 | 94.91 | 93.53
Table 4. Performance analysis on NSL-KDD for different numbers of convolutional layers.

Layers | Accuracy | FPR | Recall | Precision | F1-Score
1 | 87.17 | 14.01 | 88.07 | 89.25 | 88.66
2 | 90.77 | 17.95 | 97.38 | 87.75 | 92.32
3 | 92.25 | 13.5 | 96.60 | 90.43 | 93.41
Table 5. Performance analysis on NSL-KDD for different numbers of fully connected layers.

Neurons per Layer | Accuracy | FPR | Recall | Precision | F1-Score
(32) | 92.25 | 13.5 | 96.60 | 90.43 | 93.41
(64, 32) | 91.77 | 7.8 | 91.45 | 93.93 | 92.67
(128, 64, 32) | 91.09 | 15.73 | 96.25 | 88.99 | 92.48
Table 6. Performance analysis on NSL-KDD for different numbers of filters.

Filters in First Layer | Accuracy | FPR | Recall | Precision | F1-Score
32 | 90.95 | 18.15 | 97.84 | 87.68 | 92.49
16 | 92.25 | 13.5 | 96.60 | 90.43 | 93.41
8 | 90.25 | 12.94 | 92.68 | 90.44 | 91.54
Table 7. Comparative analysis with some state-of-the-art methods on NSL-KDD.

Method | Accuracy | FPR | Recall | Precision | F1-Score
ICVAE-DNN [34] | 85.95 | 2.74 | 77.43 | 97.38 | 86.27
SAVAER [35] | 89.36 | 4.70 | 95.98 | 84.86 | 90.08
DNN with one layer [36] | 80.1 | - | 96.90 | 73.63 | 80.7
MDPCA+DBN [37] | 82.08 | 2.62 | 70.51 | 97.25 | 81.75
SAAE-DNN [21] | 87.74 | - | 84.18 | 86.47 | 85.3
Few-shot learning [38] | 92.34 | 7.21 | 92.25 | 92.30 | 92.26
CAE+OCSVM [22] | 91.58 | 2.43 | 97.11 | 88.98 | 92.87
Proposed 1DCAE-SVDD | 92.74 | 6.52 | 92.19 | 94.91 | 93.53
- denotes that the metric is not available from the literature. The best experimental metrics are shown in bold.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

