A Network Traffic Intrusion Detection Method for Industrial Control Systems Based on Deep Learning

Kai Jin; Lei Zhang; Yujie Zhang; Duo Sun; Xiaoyuan Zheng

doi:10.3390/electronics12204329

School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China

* Author to whom correspondence should be addressed.

Electronics 2023, 12(20), 4329; https://doi.org/10.3390/electronics12204329

This article belongs to the Section Computer Science & Engineering

Abstract

Current mainstream intrusion detection models often have a high false negative rate, which significantly limits the practicability of intrusion detection systems (IDSs). To address this issue, we propose an intrusion detection model based on a multi-scale one-dimensional convolutional neural network module (MS1DCNN), an efficient channel attention module (ECA), and two bidirectional long short-term memory modules (BiLSTMs). The proposed hybrid MS1DCNN-ECA-BiLSTM model uses the MS1DCNN module to extract features of different granularities from the input data and uses the ECA module to increase the weight of important features. The model then carries out sequence learning through two BiLSTM layers. We use the dung beetle optimizer (DBO) to tune the model’s hyperparameters and obtain better classification results. Additionally, we use the synthetic minority oversampling technique (SMOTE) to supplement minority-class samples and reduce the false negative rate on those classes. We train and test the model using real network data from a water storage industrial control system. In the multi-classification experiment, the model’s accuracy was 97.04%, the precision was 97.17%, and the false negative rate was 2.95%; in the binary classification experiment, the accuracy and false negative rate were 99.30% and 0.7%, respectively. The model scores higher than the other mainstream methods we compared it with. This study provides a new algorithm for intrusion detection in industrial control systems.

Keywords: industrial control system; intrusion detection; CNN; LSTM; parameter optimization

1. Introduction

1.1. Background

The industrial control system (ICS) is an essential component of industrial development [1]. It is integrated into various industrial infrastructures, such as electricity, energy, chemicals, and oil and gas, to ensure the smooth operation of industrial processes and promote the integrated development of various industries [2]. However, with the digital age and the continuous development of Industry 4.0 [3,4], the number of attacks on industrial control networks has increased every year. These attacks have affected various systems, including manufacturing, transportation, electric power, and telecommunications [5]. Many industrial control security incidents have occurred recently, causing significant losses for many enterprises [6]. Such attacks not only paralyze and destroy industrial equipment but also significantly impact industrial production and disrupt the everyday lives of residents, so it is crucial to enhance the security of ICSs. The attacks are constantly evolving, are hidden in the regular communication traffic between devices, and are difficult to find. In response, intrusion detection systems (IDSs) have emerged [7]. An IDS is mainly used to identify malicious behavior in the network, thus protecting the security and integrity of the system, and it is an essential part of an ICS.

1.2. Related Work

The security of industrial control system networks has been a focus of researchers worldwide, with the IDS being a key area of research. In recent years, scholars have combined intrusion detection technology with current mainstream algorithms to explore new research directions. They have successfully applied techniques such as machine learning, data mining, and deep learning to intrusion detection in industrial control networks, and the rapid development of machine learning and deep learning has produced many related models. Table 1 shows some commonly used intrusion detection methods.

Table 1. Comparison of related methods.

Reducing the number of features can improve the efficiency of intrusion detection. P. L. Indrasiri et al. used principal component analysis (PCA) to reduce the features of a mixed dataset to 30. PCA is a common statistical algorithm that is often used to reduce the feature dimension. It transforms the highly correlated attributes in the dataset into linearly uncorrelated attributes and then creates a feature subspace [8]. Similarly, Furqan Rustam et al. used a feature selection method that combines PCA and singular value decomposition (SVD). Adding SVD improves the feature selection and greatly reduces the redundancy of the data [9]. Ankit Thakkar et al. designed a feature selection method based on the standard deviation and the difference between the mean and the median. They fused these two statistics into a combined feature rank and recursively added features to the feature subset in rank order until the accuracy was no longer better than that of the previously derived feature subset [10].

Researchers often use recurrent architectures for intrusion detection, including the recurrent neural network (RNN), gated recurrent unit (GRU), and long short-term memory (LSTM). Kasongo proposed an intrusion detection framework based on machine learning [11]. In that work, XGBoost was used for feature selection to reduce the feature dimension of the datasets, and multiple recurrent neural networks were evaluated within the framework. An RNN keeps a temporary memory of the previous state and applies it to the current calculation, which is beneficial for learning the association between features. The combination of XGBoost with the RNN and its variants scored well on UNSW-NB15 and NSL-KDD. Similarly, Earum Mushtaq et al. proposed a hybrid model composed of an autoencoder (AE) and LSTM [12]. LSTM relies on gating mechanisms to capture and manage long-term dependencies. Compared with a plain RNN, LSTM handles long sequences better and mitigates the vanishing-gradient problem. Furqan Rustam et al. designed an ensemble deep voting classifier that includes LSTM, RNN, and GRU [13]. The final prediction is made by voting over the individual model predictions, which can effectively improve the accuracy. To make the model more lightweight, Ling et al. designed a simpler RNN module called UGR [14]. Their model uses skip connections to improve the accuracy and reduce the time cost, making it more suitable for practical detection. Vanlalruata Hnamte et al. designed a hybrid deep learning model, DCNNBiLSTM [15]. They used convolutional neural networks (CNNs) and BiLSTM to extract the spatial and temporal dependencies of features, which improved the robustness of the model.

1.3. Main Contributions

The essential contributions of this research are as follows:

We develop an efficient detection model named MS1DCNN-ECA-BiLSTM, which is mainly used to identify attack traffic in an ICS network.
We design the MS1DCNN module to extract features of different granularities and use the ECA module to assign weights to the features, which enhances the perception ability of the model.
We use the DBO algorithm to search for the best values of the main hyperparameters of the model; the optimized model has higher accuracy.
To improve the detection of minority classes, we use SMOTE to oversample the unbalanced data, which improves the robustness of the model.

2. Materials and Methods

2.1. Applicability of the Method

In intrusion detection, traditional machine learning methods focus on the importance of individual features, looking for the features that seem to have the greatest influence on the final result, while ignoring the correlation between features and the time steps of the data, which may lead to a higher false negative rate [16]. For an ICS, once attack traffic hidden in normal network traffic is missed, attackers may exploit the system’s vulnerabilities to invade the network, steal sensitive information, or damage critical infrastructure, thus harming the organization. This defeats the purpose of intrusion detection. CNN and LSTM are well suited to complex intrusion detection problems [17]. They can learn feature representations from large-scale data and have high accuracy and generalization ability, which effectively reduces the false negative rate. As the existing work above shows, learning the characteristics of the input data better is the key to improving accuracy, and it is the core of this study. When dealing with network data, 1DCNN offers good performance with low computational overhead [18]. To improve the feature extraction ability of 1DCNN, we add a multi-scale structure (MS1DCNN) and efficient channel attention (ECA). The former captures features at different scales through parallel convolution layers with different kernel sizes, while the latter increases the weight of important features; their effects are shown in the subsequent experiments. LSTM can handle variable-length sequences: it reads the sequence step by step and keeps and updates a hidden state at each time step, so it adapts to input sequences of different lengths. This enables LSTM to flexibly handle network traffic or event sequences of varying lengths in intrusion detection. BiLSTM is a variant of LSTM that uses two-way context to obtain past and future information simultaneously and fuse it into the feature representation. With the addition of BiLSTM, the model can better capture long-range dependencies in the sequence and produce more accurate feature representations, and therefore more precise classification.

With the continuous development of deep learning models, the depth and complexity of the model are also increasing. The setting of various model parameters significantly affects the effectiveness of intrusion detection [19], making parameter optimization a key factor to improve the accuracy of intrusion detection [20].

This article uses a model structure combining MS1DCNN, ECA, and BiLSTM as the basis and uses the dung beetle optimization algorithm to optimize the model’s hyperparameters. The relevant theories of these methods are introduced below.

2.2. One-Dimensional Convolutional Neural Networks

In binary and multivariate classification, a CNN is extensively used for local feature extraction [21]. A CNN with a filter can extract features from the input and help to learn the time dependence in multivariate time series input data. 1DCNN can effectively learn the characteristics of one-dimensional data. As shown in Figure 1, 1DCNN operates through a combination of convolutional layers, activation functions, pooling layers, and fully connected layers [22]. This architecture allows the 1DCNN to extract features from the input signal at different levels of abstraction, making it suitable for analyzing complex time series data.

Figure 1. Typical 1DCNN architecture.

Activation functions are used to introduce non-linear relationships to the outputs of convolution operations, thereby enhancing the network’s ability to classify data. Common activation functions include sigmoid, tanh, and ReLU.

The pooling layer is another essential operation in convolutional neural networks that extracts critical features from the convolutional layer’s output, improving the extracted features’ robustness. Its role is to reduce the dimensionality of the previous layer of data and remove redundant information. This reduces the number of parameters and calculations in the network, which is beneficial to suppress overfitting and improve the model’s generalization ability.

The fully connected layer is typically located at the end of the CNN and serves as the “classifier” in convolutional neural networks. Softmax is usually applied to the output layer of a multi-classification problem to ensure that the output values range from 0 to 1 and that all output values can be summed to 1. The output value represents the probability of the output, and the one with the highest probability is selected as the final classification result.
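As a minimal sketch of the softmax step described above, the following Python snippet converts raw output scores into probabilities and picks the most probable class. The six scores are hypothetical values chosen for illustration and are not outputs of the model in this paper.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that are positive and sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Return the index of the class with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical scores for six traffic classes (label 0 = normal traffic).
scores = [2.0, 0.5, -1.0, 0.1, 3.2, 0.0]
probabilities = softmax(scores)
predicted = predict_class(scores)
```

Because the exponential is monotonic, the predicted class is always the one with the largest raw score; softmax only adds a probabilistic interpretation on top of that ranking.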

2.3. Bidirectional Long Short-Term Memory

LSTM is a critical branch of the RNN that improves upon the limitations of the RNN in processing time series data [23]. The key to LSTM is the memory block, as shown in Figure 2. It mainly consists of three gates (forget gate, input gate, output gate) and a memory cell unit.

Figure 2. LSTM cell structure.
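For reference, the three gates and the memory cell are commonly written as follows. This is the standard textbook formulation rather than a set of equations taken from this paper; the symbols (input x_t, hidden state h_t, cell state c_t, weights W, biases b, logistic sigmoid σ, element-wise product ⊙) are notational assumptions.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell update is what lets information persist over many time steps: when f_t is close to 1 and i_t is close to 0, c_t is carried forward almost unchanged.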

However, an LSTM based on a single-chain structure can only learn the forward sequential relationship. BiLSTM addresses this limitation, in which the network learns too little global information from the sequence and ignores the relationship between earlier and later time steps. As shown in Figure 3, its two-way structure enables it to learn both the forward and the reverse timing relationships of the sequence and thus learn the characteristics of the input data thoroughly. The forward LSTM layer processes the input sequence from the beginning to the end, while the backward LSTM layer processes the input sequence from the end to the beginning. The outputs of both layers are then concatenated to form the final output.

Figure 3. BiLSTM learning flow chart.
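The two-direction scan and the concatenation can be illustrated with a deliberately simplified sketch. To keep it short, the recurrent cell below is a plain scalar tanh cell standing in for a full LSTM cell, and the weights and input values are hypothetical; only the forward pass, backward pass, and realignment logic reflect the BiLSTM structure described above.

```python
import math

def run_direction(sequence, w_in, w_rec):
    """Scan a scalar sequence with a simple tanh recurrent cell (stand-in for an LSTM cell)."""
    state = 0.0
    states = []
    for x in sequence:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional(sequence, fwd_weights=(0.8, 0.5), bwd_weights=(0.6, 0.3)):
    """Pair the forward and backward states at every time step."""
    forward = run_direction(sequence, *fwd_weights)
    # The backward pass reads the sequence end to start; reversing its output
    # realigns each backward state with the time step it belongs to.
    backward = run_direction(sequence[::-1], *bwd_weights)[::-1]
    return list(zip(forward, backward))

features = [0.2, -0.4, 0.9, 0.1]  # hypothetical per-step feature values
outputs = bidirectional(features)
```

At the first time step the forward state has seen only the first element, while the backward state has already seen the whole sequence; at the last step the roles are swapped. This is why the concatenated output carries both past and future context.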

2.4. Efficient Channel Attention Module

Rather than continuing to stack CNN layers, research in recent years has focused on enhancing the essential features extracted by the CNN. ECA is an efficient channel attention mechanism that has recently received extensive attention [24]. ECA uses a local cross-channel interaction strategy and an adaptively selected one-dimensional convolution kernel size to avoid the negative impact of the dimensionality reduction used by SENet and similar methods. As shown in Figure 4, the ECA module first applies global average pooling to each channel independently, then performs a fast one-dimensional convolution of size k across the pooled channel descriptors, and finally uses the sigmoid function to generate the channel weights.

Figure 4. ECA structure.

The ECA attention mechanism calculates the attention weight of each channel by introducing a 1D convolution operation in the channel dimension. This convolution operation can be regarded as a local receptive field operation, which only considers the correlation of each channel in the local range, not the global range. In this way, the ECA attention mechanism can reduce computational complexity and improve computational efficiency.
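The ECA computation can be sketched in a few lines of plain Python. The kernel-size rule follows the adaptive formula given in the ECA paper [24] with its default settings (gamma = 2, b = 1); the feature map and the three kernel weights below are hypothetical, since in a real network the kernel is learned during training.

```python
import math

def adaptive_kernel_size(channels, gamma=2, b=1):
    """Nearest odd kernel size for a given channel count (rule from the ECA paper)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca(feature_map, kernel_weights):
    """feature_map: list of channels, each a list of values along the sequence axis."""
    pad = len(kernel_weights) // 2
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    # 2) 1D convolution across neighbouring channels; zero padding keeps the channel count.
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[c + j] for j, w in enumerate(kernel_weights))
        for c in range(len(pooled))
    ]
    # 3) Sigmoid turns each value into a channel weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # 4) Rescale every channel by its weight.
    scaled = [[w * v for v in ch] for w, ch in zip(weights, feature_map)]
    return scaled, weights

# Hypothetical feature map with 4 channels and 3 positions per channel.
feature_map = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -3.0], [4.0, 4.0, 4.0]]
scaled, weights = eca(feature_map, kernel_weights=[0.2, 0.5, 0.3])
```

Each channel weight depends only on the channel itself and its k - 1 nearest neighbours, which is the local cross-channel interaction that keeps the module cheap.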

2.5. Dung Beetle Optimizer

Optimization problems are a research focus in many kinds of systems, and the choice of optimization algorithm is directly related to the performance of the final model. The most important index of intrusion detection is accuracy, which determines whether the model can really protect an ICS, so we use a single-objective optimization algorithm to optimize the model. This paper uses a swarm intelligence optimization algorithm, the dung beetle optimizer (DBO) [25]. DBO optimizes parameters by simulating the dung beetle’s behaviors of ball rolling, dancing, foraging, stealing, and breeding. In [25], DBO scores higher than the compared algorithms in convergence speed, accuracy, and stability. DBO defines five different updating rules that use information from different time periods to help find high-quality solutions and explore the search space thoroughly. We chose DBO because its stronger search ability helps it avoid falling into local optima.

2.6. Data Preprocessing Method

A. Normalization

Data normalization is one of the preprocessing steps commonly used in machine learning and data analysis [26]. Its purpose is to eliminate the scale difference between features, so as to ensure that the influence of each feature on the model can be balanced. By eliminating the scale difference between features, normalization can improve the performance and accuracy of the model and accelerate the convergence speed of the model. In addition, normalization is helpful to improve the stability and robustness of the model and avoid the negative impact of outliers on the model.

This paper uses the min–max normalization method, which applies a linear transformation that maps the original data to the range [0, 1]. The result of the calculation is the normalized data.

x' = \frac{x - \min(x)}{\max(x) - \min(x)}

(1)

where x is the original data and x' is the normalized data.
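A minimal sketch of Equation (1) applied to one feature column is given below. The sample values are hypothetical, and the handling of a constant column (mapped to 0) is an assumption added to avoid division by zero; the paper does not state how such columns are treated.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] using Equation (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

packet_lengths = [60, 1500, 780, 60]  # hypothetical feature values
normalized = min_max_normalize(packet_lengths)
```

In practice the minimum and maximum should be computed on the training split only and then reused for the test split, so that no information from the test data leaks into preprocessing.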

B. Synthetic Minority Oversampling Technique

SMOTE is a common method used to address sample imbalance by increasing the number of minority class samples through the synthesis of new data [27]. This technique works by randomly selecting minority class samples and generating new synthetic samples based on the differences between them and their neighbors. The number of new samples generated can be controlled by setting a multiplier.

To produce synthetic samples, SMOTE calculates the Euclidean distance between each minority class sample and all other samples in the minority class set. It then determines the k nearest neighbors of each minority class sample. A minority sample x is selected as the reference sample, and x_{i} is its i-th neighboring sample. For each neighbor x_{i}, a new synthetic sample x_{new} is computed:

x_{new} = x + λ \times (x_{i} - x)

(2)

λ = \mathrm{rand}(0, 1)

(3)

where λ is a random number between 0 and 1 that determines how far the new sample lies from the reference sample x along the line segment toward the neighbor x_{i}. By interpolating toward each neighbor in this way, new synthetic samples are generated. These synthetic samples are added to the training set, which increases the number of minority class samples and balances the sample distribution.
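The interpolation in Equations (2) and (3) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in this paper: it draws one random neighbor per synthetic sample (a common variant) instead of interpolating toward every neighbor, and the function names, the seed, and the four sample points are hypothetical.

```python
import math
import random

def k_nearest(sample, others, k):
    """Return the k samples closest to `sample` by Euclidean distance."""
    return sorted(others, key=lambda o: math.dist(sample, o))[:k]

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)                          # reference sample x
        neighbours = k_nearest(x, [m for m in minority if m is not x], k)
        x_i = rng.choice(neighbours)                      # one of its k nearest neighbours
        lam = rng.random()                                # lambda drawn from [0, 1), Equation (3)
        # Equation (2): move from x toward x_i by a fraction lambda.
        synthetic.append([a + lam * (b - a) for a, b in zip(x, x_i)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical samples
new_samples = smote(minority, n_new=6, k=2)
```

Every synthetic point lies on a segment between two real minority samples, so the method fills in the region the minority class already occupies rather than extrapolating beyond it.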

3. The Proposed Methodology

3.1. Algorithm Overview

We propose a model that combines MS1DCNN, ECA, and BiLSTM, and use DBO to optimize its hyperparameters: the learning rate, the dropout rate, and the number of hidden layer neurons in the network model.

3.2. Model Structure

In this paper, we build an MS1DCNN-ECA-BiLSTM model, as shown in Figure 5. The model starts with an input layer, followed by a 1DCNN layer that shortens the vector length and increases the number of channels. Our activation function is ReLU, a simple yet effective function commonly used in neural networks: if the input is greater than 0, it returns the input unchanged; if the input is 0 or less, it returns 0. ReLU produces sparse activations, which helps to mine relevant features and fit the training data better. Since a 1DCNN is prone to missing some features during feature extraction, we design a multi-scale convolutional layer that uses different convolution kernel sizes to extract features at different granularities of the dataset. This paper uses convolution kernel sizes of 3, 5, and 7. The convolution layer with a kernel size of 3 captures the local features in the input data. The convolution layer with a kernel size of 5 expands the receptive field. The convolution layer with a kernel size of 7 expands the receptive field further, so that the model can better understand the global structure and context of the input data. The three convolution layers are connected in parallel, and each convolves the output of the previous layer. The output tensors of all three convolutions are concatenated in the channel dimension to obtain the final output tensor. In this way, the multi-scale convolution layer extracts the features of the input data more fully. Then, we use ECA to obtain a weight for each channel, which makes the network pay more attention to the important channels, thus reducing the influence of redundant features and improving the utilization efficiency of the features.

Figure 5. Model structure diagram.
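The parallel-branch structure of the multi-scale convolutional layer can be sketched in plain Python as follows. Only the kernel sizes 3, 5, and 7, the ReLU activation, and the channel-wise concatenation come from the text above; the averaging kernels are hypothetical placeholders for learned weights, the "same" zero padding is an assumption, and each branch here has a single output channel for brevity.

```python
def conv1d_same(signal, kernel):
    """Single-channel 1D convolution with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def relu(values):
    """Return the input where it is positive and 0 elsewhere."""
    return [max(0.0, v) for v in values]

def multi_scale_block(signal, kernel_sizes=(3, 5, 7)):
    """Run parallel branches with different kernel sizes and stack them as channels."""
    branches = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # hypothetical averaging kernel; real kernels are learned
        branches.append(relu(conv1d_same(signal, kernel)))
    return branches  # shape: (number of branches, sequence length)

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical input sequence
channels = multi_scale_block(signal)
```

Because every branch preserves the sequence length, the branch outputs can be concatenated along the channel axis without any cropping or resampling.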

In this model, we used two BiLSTM layers to better capture the correlation between long-range, temporally related features. We added a max pooling layer and a batch normalization layer between the two BiLSTM layers to reduce the computational complexity of the model, speed up the training process, and alleviate the over-fitting problem. To further prevent over-fitting, we introduced a dropout layer during model training. The dropout rate was determined using the optimization algorithm. Finally, in the fully connected layer, we used softmax as the classifier to classify the incoming traffic.

3.3. Parameter Optimization

The hyperparameters of the model were set using DBO optimization. The schematic diagram in Figure 6 helps to visualize how DBO can be applied.

Figure 6. Parameter optimization process diagram.

(1): Preprocess the raw data. To achieve better training results and avoid gradient explosion during training, first fill in the null values in the original data and then encode non-numeric features using label encoding. To prevent singular samples from affecting the convergence of the model, apply min–max normalization to keep all data between 0 and 1.
(2): Set the objective. Select the function to be optimized, which here is our model, and then select the parameters to be optimized: the learning rate, the dropout rate, and the number of neurons in the hidden layers. The learning rate controls the magnitude of each parameter update, and its size determines how quickly the network learns. The dropout rate is used to counter the overfitting problem of neural networks, and an appropriate dropout rate can improve the generalization ability of the model. The hidden layer units are the memory cells in BiLSTM. Their number determines the capacity and expressiveness of the BiLSTM network, as well as the model’s ability to capture complex patterns and dependencies in the input sequence.
(3): Initialize the relevant parameters of the DBO optimization algorithm. These include the population size P, the maximum number of iterations T, the number of parameters to be optimized, the upper and lower limits of the parameters, and the optimized function. Here, P is the number of dung beetle individuals in the population, and T is the number of optimization iterations.
(4): Start training. First, initialize the population positions. A position is a set of values for the parameters, assigned randomly within the limits. Then, feed the parameters of each position into the intrusion detection model and take the accuracy of the model as the evaluation standard of that position. Update the dung beetle location, spawning location, foraging dung beetle location, and thief dung beetle location.
(5): After each round of iteration, find the current optimal position and compare it with the historical optimal position to determine whether to update the historical optimal position, and then enter the next round of iteration.
(6): After T iterations, input the parameters of the historical optimal position, namely the learning rate, the dropout rate, and the number of neurons in the hidden layers, into the intrusion detection model as the final parameters of the model.
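Steps (3) to (6) above follow the general shape of a population-based search, which can be sketched as below. This sketch is not DBO: the single update rule (drift toward the best-known position plus Gaussian noise) is a simplified stand-in for DBO’s rolling, breeding, foraging, and stealing rules, and toy_fitness is a hypothetical surrogate with an arbitrary optimum that replaces training the intrusion detection model. The bounds, population size, and iteration count are the ones reported in Section 4.4.

```python
import random

# Search ranges: dropout rate, learning rate, units of BiLSTM layer 1, units of BiLSTM layer 2.
BOUNDS = [(0.1, 0.8), (0.0001, 0.1), (30, 300), (30, 300)]

def clip(position):
    """Keep every parameter inside its lower and upper limit."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(position, BOUNDS)]

def random_position(rng):
    """Step (4): random initial values within the limits."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def toy_fitness(position):
    """Hypothetical surrogate for the misclassification rate (lower is better)."""
    target = [0.4, 0.01, 128, 64]  # arbitrary optimum, for illustration only
    return sum(((v - t) / (hi - lo)) ** 2 for v, t, (lo, hi) in zip(position, target, BOUNDS))

def decode(position):
    """Turn a position into hyperparameters; unit counts must be integers."""
    dropout, lr, u1, u2 = position
    return {"dropout": dropout, "learning_rate": lr, "units_1": round(u1), "units_2": round(u2)}

def optimize(fitness, population_size=20, iterations=20, seed=0):
    rng = random.Random(seed)
    population = [random_position(rng) for _ in range(population_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(iterations):
        # Simplified update: move toward the historical best and add noise.
        population = [
            clip([
                v + rng.random() * (b - v) + rng.gauss(0.0, 0.05) * (hi - lo)
                for v, b, (lo, hi) in zip(individual, best, BOUNDS)
            ])
            for individual in population
        ]
        # Step (5): compare the round's best with the historical best.
        round_best = min(population, key=fitness)
        if fitness(round_best) < fitness(best):
            best = round_best
        history.append(fitness(best))
    return best, history

best, history = optimize(toy_fitness)
final_parameters = decode(best)  # step (6): parameters handed to the final model
```

In the real setting, each fitness evaluation requires training and validating the network once, so the cost of the search grows with the product of the population size and the number of iterations.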

4. Experiments and Results

4.1. Experimental Environment

The experiments in this paper were implemented in Python 3.9 with TensorFlow as the deep learning framework. The operating system was Windows 11, the graphics card was an NVIDIA GeForce RTX 3060, and the processor was an Intel i5-12400F.

4.2. Evaluation Standard

Intrusion detection is treated here as a multi-classification task. Therefore, to evaluate the performance of our model, we selected common evaluation indicators for multi-classification problems, including accuracy, precision, and recall. In addition, to evaluate the model more comprehensively and objectively, we also report the F1 score, which combines precision and recall.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(4)

\mathrm{Recall} = \frac{TP}{TP + FN}

(5)

\mathrm{Precision} = \frac{TP}{TP + FP}

(6)

F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(7)

Here, TP is the number of positive samples that the model predicts as positive, TN is the number of negative samples that the model predicts as negative, FP is the number of negative samples that the model predicts as positive, and FN is the number of positive samples that the model predicts as negative.
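Equations (4) to (7) translate directly into code. The sketch below also computes the false negative rate using its standard definition, FN / (FN + TP), which equals 1 minus recall; this definition is not written out in the paper but is consistent with its reported multi-class figures (recall of 97.05% and false negative rate of 2.95%). The confusion-matrix counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Equations (4)-(7) plus the false negative rate."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    fnr = fn / (fn + tp)  # standard definition; equals 1 - recall
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "fnr": fnr,
    }

# Hypothetical counts for one class treated as "positive".
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

For a multi-class problem, these quantities are computed per class by treating that class as positive and all others as negative, and the per-class values are then averaged.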

4.3. Experimental Datasets and Preprocessing

The dataset used in this paper comes from the network traffic of the water storage control system of a factory in Hebei Province. In this real environment, we carried out various attacks on the host computer and used Wireshark to collect the network traffic. As shown in Table 2, the dataset contains 92,272 network traffic data packets: 57,693 normal packets and 34,579 attack packets. The attack traffic consists of spoofing attacks, two kinds of detection attacks (A and B), DDoS attacks, and socket attacks. We assigned the labels 0, 1, 2, 3, 4, and 5 to the six categories, including normal traffic.

Table 2. Dataset description.

We preprocessed the dataset by encoding the non-numeric features with label encoding and then normalizing the dataset to keep all data between 0 and 1.

Because the attack classes are unevenly distributed, we adopted SMOTE to oversample the minority classes. In the dataset, there are only 512 samples of detection attack B, which is extremely rare compared with the other attack data and the normal data. Detection attack A is similarly under-represented. We oversampled attacks A and B so that the ratio of each to normal traffic, the largest class, was about 1 to 5 and neither exceeded the other attack types; specifically, we raised each of these two classes to 10,000 samples. The oversampled attack traffic still makes up a much smaller proportion than the normal traffic, which conforms to the traffic characteristics of an ICS, and this amount of oversampling was already enough to improve the intrusion detection model considerably.

4.4. Experimental Results and Analysis

During the experiment, we first set the parameters of the DBO algorithm: the population size was 20 and the maximum number of iterations was 20. We used the misclassification rate as the fitness. The optimized parameters were the dropout rate, the learning rate, and the number of BiLSTM hidden layer units. Since we used two BiLSTM layers, the number of hidden layer units was optimized separately for each layer. To find the optimal parameters, we set the parameter ranges as wide as possible. In the order listed above, the lower limits of the ranges were 0.1, 0.0001, 30, and 30, and the upper limits were 0.8, 0.1, 300, and 300. The oversampled data were then fed into the model for iterative optimization.

As can be seen from Figure 7, the fitness function gradually converges as the number of iterations increases, and the fitness no longer changes after the 18th iteration. As shown in Table 3, the optimal parameters obtained after optimization were 0.5, 0.001, 272, and 256 for the dropout rate, the learning rate, and the hidden layer units of the first and second BiLSTM layers, respectively.

Figure 7. DBO experimental results chart.

Table 3. Parameter settings and results of optimization.

Next, the optimal parameters were applied to the main model, MS1DCNN-ECA-BiLSTM, for intrusion detection. We trained for 10 epochs with a batch size of 32. To verify the advantages of our method, we compared our model with several alternatives: 1DCNN alone, MS1DCNN alone, BiLSTM alone, MS1DCNN-ECA, and the MS1DCNN-ECA-BiLSTM model without parameter optimization. To ensure the fairness of the experimental results, we repeated the experiment 10 times for each method and compared the averaged results with our method. Table 4 shows the results for each method.

Table 4. The experimental results of each method.

Table 4 shows the performance scores of our model on the test set. The accuracy reached 97.04%, the precision was 97.17%, the recall was 97.05%, and the F1 score reached 97.03%.

Compared with the ordinary 1DCNN, the MS1DCNN we designed improves on every metric, because the multi-scale convolution compensates for the tendency of a one-dimensional convolutional neural network to miss some features during feature extraction. After adding the channel attention mechanism, the accuracy of the model improved further, reaching 92.05%, and the precision also improved. The classification accuracy of the single BiLSTM model was only 90.99%. When the input data were first processed by MS1DCNN-ECA for feature extraction and enhancement and then passed to the BiLSTM layers, the accuracy of the model reached 95.50%, the precision reached 95.70%, the recall was 95.50%, and the F1 score was 95.40%. After optimizing the hyperparameters of our model through DBO, the model’s performance improved further, which shows the importance of parameter optimization for model performance.

In addition, we compared our model with common machine learning classification models, including random forest, SVM, decision tree, and logistic regression. The results are shown in Table 5. Our model outperforms these machine learning methods on this dataset.

Table 5. Results of machine learning model.

Figure 8 shows the false negative rate of all methods. It can be seen that MS1DCNN-ECA-BiLSTM optimized with DBO has a very low false negative rate compared with the comparative model. This is because when processing input data, our proposed hybrid neural network model can learn the correlation between features, and MS1DCNN and BiLSTM can capture the spatial and temporal features of input data at the same time. The results also prove that our proposed model can identify the attack traffic mixed with normal network traffic.

Figure 8. FNR comparison chart.

We selected one set of results to compare with the results obtained without SMOTE, as shown in Table 6 and Table 7. In the experiment without SMOTE, all of the indicators of detection attack B are very low, and the accuracy rate is only 0.0455. This is due to the small number of detection attack B samples, which prevents the model from thoroughly learning the characteristics of this category. After using SMOTE, the accuracy and recall of detection attack B improved greatly. Similarly, the indicators of detection attack A also improved significantly. This shows that SMOTE greatly improves the model’s ability to detect categories with few samples. Adding SMOTE slightly reduces the accuracy and recall for normal traffic, but it improves the overall performance, especially for the minority attack classes, so we consider SMOTE effective.

Table 6. Results of each category with SMOTE.

Table 7. Results of each category without SMOTE.
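
The oversampling step can be illustrated with a minimal pure-Python sketch of the core SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name, the default `k=5`, and the toy points are assumptions, and in practice a maintained library such as imbalanced-learn would normally be used.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D minority samples, for illustration only.
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for sample in smote_oversample(points, n_new=3, k=2):
        print(sample)
```

Because every synthetic point lies on a line segment between two real minority samples, the new samples stay inside the region already occupied by that class rather than duplicating existing records.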

To further verify the detection ability of our model, we conducted a conventional binary classification experiment. Here, we set the label of normal traffic to 0, giving a total of 57,693 samples, and set the label of all attack traffic to 1, giving a total of 34,579 samples. The binary classification experiment did not use SMOTE to oversample minority classes. The ratio of training samples to testing samples was 8:2. Figure 9 shows the confusion matrix for the binary classification experiment.

Figure 9. Binary confusion matrix.
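
The relabeling and 8:2 split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the split is a plain shuffle (the paper does not say whether it was stratified), and rounding the test share up is an assumption.

```python
import math
import random


def to_binary(labels):
    """Map label 0 (normal traffic) to 0 and every attack label to 1."""
    return [0 if y == 0 else 1 for y in labels]


def split_80_20(samples, seed=0):
    """Shuffle and split into 80% training and 20% testing samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) / 5)  # assumption: round the test share up
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Class sizes taken from the text; the attack label value 4 is arbitrary here.
    labels = [0] * 57693 + [4] * 34579
    train, test = split_80_20(to_binary(labels))
    print(len(train), len(test))
```

With 57,693 normal and 34,579 attack samples (92,272 in total), rounding the 20% share up gives a test set of 18,455 samples and a training set of 73,817 samples.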

In the binary classification experiment, as shown in Table 8, the test set contains 18,455 samples. The accuracy of the model reached 99.30%, the precision was 99.30%, the recall was 99.30%, the false negative rate was 0.7%, and the F1 score was 99.30%. These results show that our model also performs well on binary classification, which supports its practicability. Our model scores higher than SVM, decision tree, logistic regression, and random forest on every metric in Table 8.

Table 8. Results of binary classification.

5. Conclusions

This paper has proposed a new intrusion detection model. The model uses MS1DCNN to extract features from network traffic and ECA to assign weights to the features, so that important features are emphasized. The weighted features are then classified using a two-layer BiLSTM. In addition, the DBO method was used to optimize the hyperparameters of the model and thereby improve its performance. In the multi-classification experiment, the model's accuracy was 97.04%, the precision was 97.17%, the F1 score was 97.03%, and the false negative rate was 2.95%. In the binary classification experiment, the accuracy and false negative rate were 99.30% and 0.7%, respectively. The experimental results show that our model improves accuracy, precision, recall, and F1 score over the compared methods. With SMOTE oversampling, the method can also deal well with unbalanced samples. Evaluation on several metrics and comparison with other methods indicate that this method can realize effective industrial control system intrusion detection.

This work has a major limitation: at present, the model only supports the detection of known attacks. In the future, we will consider adding the ability to detect unknown attacks and to update the model, so that it can cope with changing real-world industrial networks.

Author Contributions

Conceptualization, methodology, validation, and writing, K.J.; conceptualization, data curation, and formal analysis, D.S. and Y.Z.; supervision and funding acquisition, K.J., L.Z. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Funded by Shijiazhuang Key R&D Plan (No.211130164A).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhao, X.; Zhang, L.; Cao, Y.; Jin, K.; Hou, Y. Anomaly Detection Approach in Industrial Control Systems Based on Measurement Data. Information 2022, 13, 450.
Ngambeki, I.; Spafford, E.; Ansari, S.; Alhasan, I.; Basil-Camino, M.; Rapp, D. Creating a concept map for ICS security—A Delphi study. In Proceedings of the 2021 IEEE Frontiers in Education Conference (FIE), Lincoln, NE, USA, 13–16 October 2021; pp. 1–7.
Wang, J.; Cao, S.J.; Yu, C.W. Development trend and challenges of sustainable urban design in the digital age. Indoor Built Environ. 2021, 30, 3–6.
Ghobakhloo, M.; Fathi, M.; Iranmanesh, M. Industry 4.0 ten years on: A bibliometric and systematic review of concepts, sustainability value drivers, and success determinants. J. Clean. Prod. 2021, 302, 127052.
Abosata, N.; Al-Rubaye, S.; Inalhan, G.; Emmanouilidis, C. Internet of Things for System Integrity: A Comprehensive Survey on Security, Attacks and Countermeasures for Industrial Applications. Sensors 2021, 21, 3654.
Ahmadian, M.M.; Shajari, M.; Shafiee, M.A. Industrial control system security taxonomic framework with application to a comprehensive incidents survey. Int. J. Crit. Infrastruct. Prot. 2020, 29, 100356.
Khan, M.A. HCRNNIDS: Hybrid Convolutional Recurrent Neural Network-Based Network Intrusion Detection System. Processes 2021, 9, 834.
Lee, E.; Rupapara, V.; Rustam, F.; Ashraf, I. Malicious traffic detection in iot and local networks using stacked ensemble classifier. Comput. Mater. Contin. 2022, 71, 489–515.
Rustam, F.; Mushtaq, M.F.; Hamza, A.; Farooq, M.S.; Jurcut, A.D.; Ashraf, I. Denial of Service Attack Classification Using Machine Learning with Multi-Features. Electronics 2022, 11, 3817.
Thakkar, A.; Lohiya, R. Fusion of statistical importance for feature selection in Deep Neural Network-based Intrusion Detection System. Inf. Fusion 2023, 90, 353–363.
Kasongo, S.M. A deep learning technique for intrusion detection system using a Recurrent Neural Networks based framework. Comput. Commun. 2023, 199, 113–125.
Mushtaq, E.; Zameer, A.; Umer, M.; Abbasi, A.A. A two-stage intrusion detection system with auto-encoder and LSTMs. Appl. Soft Comput. 2022, 121, 108768.
Rustam, F.; Raza, A.; Ashraf, I.; Jurcut, A.D. Deep ensemble-based efficient framework for network attack detection. In Proceedings of the 2023 21st Mediterranean Communication and Computer Networking Conference (MedComNet), Island of Ponza, Italy, 13–15 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–10.
Ling, J.; Zhu, Z.; Luo, Y.; Wang, H. An intrusion detection method for industrial control systems based on bidirectional simple recurrent unit. Comput. Electr. Eng. 2021, 91, 107049.
Hnamte, V.; Hussain, J. DCNNBiLSTM: An efficient hybrid deep learning-based intrusion detection system. Telemat. Inform. Rep. 2023, 10, 100053.
Cao, B.; Li, C.; Song, Y.; Qin, Y.; Chen, C. Network Intrusion Detection Model Based on CNN and GRU. Appl. Sci. 2022, 12, 4184.
Wahid, A.; Breslin, J.G.; Intizar, M.A. Prediction of Machine Failure in Industry 4.0: A Hybrid CNN-LSTM Framework. Appl. Sci. 2022, 12, 4221.
Luo, N.; Li, Y.; Yang, B.; Liu, B.; Dai, Q. Prediction Model for Tea Polyphenol Content with Deep Features Extracted Using 1D and 2D Convolutional Neural Network. Agriculture 2022, 12, 1299.
Israel, I.M.; Israel, S.A.; Irvine, J.M. Factors Influencing CNN Performance. In Proceedings of the 2021 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 12–14 October 2021; IEEE: New York, NY, USA, 2021; Volume 10, pp. 1–4.
Li, S.; Yang, J.; Wu, F.; Rashed, G.I. Combined prediction of photovoltaic power based on sparrow search algorithm optimized convolution long and short-term memory hybrid neural network. Electronics 2022, 11, 1654.
Jogin, M.; Mohana, M.S.; Divya, G.D.; Meghana, R.K.; Apoorva, S. Feature Extraction using Convolution Neural Networks (CNN) and Deep Learning. In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 18–19 May 2018; IEEE: New York, NY, USA, 2018; Volume 5, pp. 2319–2323.
Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D convolutional neural networks and applications: A survey. Mech. Syst. Signal Process. 2021, 151, 107398.
Smagulova, K.; James, A.P. Overview of long short-term memory neural networks. In Deep Learning Classifiers with Memristive Networks: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2020; pp. 139–153.
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–22 June 2020; pp. 11534–11542.
Xue, J.; Shen, B. Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. J. Supercomput. 2023, 79, 7305–7336.
Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905.

Table 1. Comparison of related methods.

Reference	Method	Dataset	Score	Limitation
[8]	PCA, EBF	UNSW-NB15, IoTID20	98.4%	It mainly studies the importance of features, but does not pay attention to the correlation between features.
[9]	PCA, SVD	DoS dataset	100%	It only detects DDoS attacks, not multiple attacks.
[10]	DNN	NSL-KDD, UNSW-NB15, CIC-IDS 2017	99.84%, 89.03%, 99.80%	It designs many feature selection functions, but does not pay attention to the ability of the model to learn features.
[11]	XGBoost, LSTM, GRU, simple RNN	NSL-KDD, UNSW-NB15	86.93%, 78.40%	LSTM, GRU, and simple RNN perform differently on different tasks. It does not consider a combined method that performs well across multiple tasks.
[12]	AE, LSTM	NSL-KDD	89%	Without considering the two-way variant of LSTM, it cannot learn the time correlation of features from the front and back simultaneously.
[13]	BiSRU	gas pipeline dataset	96.23%	It does not perform well in some minority samples.
[14]	DCNNBiLSTM	CICIDS2018, Edge_IIoT	100%, 99.64%	This mixed model has a large number of parameters, and the author does not consider parameter optimization.

Table 2. Dataset description.

Name	Category	Amount	Label
normal traffic	normal	57,693	0
spoofing attack	abnormal	10,008	1
detection attack A	abnormal	4427	2
detection attack B	abnormal	512	3
socket attack	abnormal	19,632	4
DDoS	abnormal	10,000	5

Table 3. Parameter settings and results of optimization.

Parameter	Lower Bounds	Upper Bounds	Optimal Position
Dropout rate	0.1	0.8	0.5
Learning rate	0.0001	0.1	0.001
Hidden layer parameter 1	30	300	272
Hidden layer parameter 2	30	300	256
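
The bounds in Table 3 define the search space explored during hyperparameter optimization. The sketch below shows only generic bounds handling of the kind a population-based optimizer needs: drawing a candidate inside the bounds and projecting an out-of-range candidate back into them. It is not an implementation of DBO, and the dictionary keys, function names, and the integer rounding of the two hidden layer parameters are assumptions made for illustration.

```python
import random

# Search bounds from Table 3: name -> (lower, upper, is_integer).
# Treating the hidden layer parameters as integers is an assumption.
SEARCH_SPACE = {
    "dropout_rate": (0.1, 0.8, False),
    "learning_rate": (0.0001, 0.1, False),
    "hidden_layer_param_1": (30, 300, True),
    "hidden_layer_param_2": (30, 300, True),
}


def clamp_to_bounds(position):
    """Project a candidate position back into the search space."""
    fixed = {}
    for name, (low, high, is_integer) in SEARCH_SPACE.items():
        value = min(max(position[name], low), high)
        fixed[name] = int(round(value)) if is_integer else value
    return fixed


def random_position(rng):
    """Draw one candidate uniformly inside the bounds."""
    return clamp_to_bounds(
        {name: rng.uniform(low, high)
         for name, (low, high, _) in SEARCH_SPACE.items()}
    )


if __name__ == "__main__":
    print(random_position(random.Random(0)))
    # The optimal position reported in Table 3 lies inside the bounds,
    # so clamping leaves it unchanged.
    best = {"dropout_rate": 0.5, "learning_rate": 0.001,
            "hidden_layer_param_1": 272, "hidden_layer_param_2": 256}
    print(clamp_to_bounds(best) == best)
```

An optimizer would call a function like `clamp_to_bounds` after each position update, then train and evaluate the model at the resulting position to obtain a fitness value.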

Table 4. The experimental results of each method.

Method	Accuracy	Precision	Recall	F1 Score
1DCNN	0.8759	0.8845	0.8759	0.8713
MS1DCNN	0.9001	0.9058	0.9001	0.8859
BiLSTM	0.9099	0.9084	0.9100	0.9071
MS1DCNN-ECA	0.9205	0.9208	0.9205	0.9215
MS1DCNN-ECA-BiLSTM	0.9550	0.9570	0.9550	0.9540
DBO-MS1DCNN-ECA-BiLSTM	0.9704	0.9717	0.9705	0.9703

Table 5. Results of machine learning model.

Method	Accuracy	Precision	Recall	F1 Score
Decision Tree	0.9099	0.9084	0.9100	0.9071
SVM	0.9001	0.9058	0.9001	0.8859
Random Forest	0.8759	0.8845	0.8759	0.8713
Logistic Regression	0.9205	0.9208	0.9205	0.9215
DBO-MS1DCNN-ECA-BiLSTM	0.9704	0.9717	0.9705	0.9703

Table 6. Results of each category with SMOTE.

Category	Accuracy	Precision	Recall	F1 Score
normal traffic	0.9904	0.9976	0.9904	0.9940
spoofing attack	0.9990	0.9857	0.9990	0.9923
detection attack A	0.7980	0.9475	0.7981	0.8664
detection attack B	0.9517	0.8830	0.9517	0.9161
socket attack	0.9786	0.8802	0.9786	0.9268
DDoS	1.0000	0.9990	1.0000	0.9995

Table 7. Results of each category without SMOTE.

Category	Accuracy	Precision	Recall	F1 Score
normal traffic	0.9978	0.9861	0.9978	0.9919
spoofing attack	0.8954	0.9961	0.8954	0.9431
detection attack A	0.6363	0.6958	0.6363	0.6647
detection attack B	0.0455	0.4167	0.0455	0.0820
socket attack	0.9813	0.8700	0.9813	0.9223
DDoS	1.0000	0.9995	1.0000	0.9997

Table 8. Results of binary classification.

Method	Accuracy	Precision	Recall	F1 Score	FNR
Our Model	0.9930	0.9930	0.9930	0.9930	0.0070
SVM	0.9389	0.9270	0.9389	0.9270	0.0611
Decision Tree	0.9543	0.9444	0.9543	0.9439	0.0457
Logistic Regression	0.9542	0.9437	0.9542	0.9433	0.0458
Random Forest	0.9514	0.9498	0.9514	0.9397	0.0486

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
