Peer-Review Record

Adaptive Industrial Control System Attack Sample Expansion Algorithm Based on Generative Adversarial Network

Appl. Sci. 2022, 12(17), 8889; https://doi.org/10.3390/app12178889

by Yun Sha^*, Zhaoyu Chen, Xuejun Liu, Yong Yan, Chenchen Du, Jiayi Liu and Ranran Han

Reviewer 1:

Nivethitha Somu

Reviewer 2:

Ruiheng Zhang

Submission received: 28 July 2022 / Revised: 27 August 2022 / Accepted: 30 August 2022 / Published: 5 September 2022

(This article belongs to the Special Issue AI Applications in the Industrial Technologies)

Round 1

Reviewer 1 Report

I appreciate the authors for taking up this research problem and the novel solution. However, the authors should focus on the following points to enhance the quality of the paper.

1. Clarity of the figures needs to be improved

2. Add some more descriptions on the motivation, and why GANs are especially considered as a solution

3. Experimental section contains a lot more figures than descriptions. Add a separate section for discussing your observations

4. Have you experimented with GAN variants? If so, present your observations. Otherwise, perform experiments with a few GAN variants and show the results.

Author Response

Response to Reviewer 1 Comments

Point 1: Clarity of the figures needs to be improved.

Response 1: As shown in the PDF file, the font size of the digital data in the text has been adjusted.

Point 2: Add some more descriptions on the motivation, and why GANs are especially considered as a solution.

Response 2: In the related work section, a description of the motivation for selecting GAN has been added to the first paragraph of the time-series dataset enhancement algorithm subsection. The contents are as follows:

The abnormal dataset of industrial automation control is essentially a time-series dataset. Although data types are diverse, the data enhancement methods applied to them share common design ideas, such as flipping, stretching, oversampling, or adding noise. The enhancement effect of these methods is not satisfactory. This situation changed with the emergence of GAN in 2014 [7]. GAN was first used in the image field. Through the idea of a two-person zero-sum game, it can generate data that humans cannot distinguish from real data but that the model can still judge. The generated results are very similar to the original samples, and the generation efficiency is also improved. If this method is applied to generating abnormal data for industrial control systems, the generated negative samples are more similar to the positive samples, so the authenticity of the generated samples is improved. After that, many improved GAN methods began to appear.
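
The two-person zero-sum game mentioned in this passage is commonly written as the minimax objective of the original GAN formulation [7]. The notation below (generator G, discriminator D, data distribution p_data, noise prior p_z) follows that standard formulation and is not taken from the manuscript under review:

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator D maximizes V by separating real samples from generated ones, while the generator G minimizes it by producing samples that D cannot separate.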

Point 3: Experimental section contains a lot more figures than descriptions. Add a separate section for discussing your observations.

Response 3: As shown in the PDF file, in the experimental validation, the observations for each experiment have been added alongside the experimental result figures. Specific additions are as follows:

In 4.1.1 oscillation type data generation, an observation has been added in this article, which is as follows: As shown in Fig. 4, the fluctuation trend of the data generated with the Adagrad and Adam optimization methods is far from that of the original data, and the data generated with the RMSprop optimization method is far from the original data in value range. However, the data generated with the SGD and momentum optimization methods are more similar to the original data than the other three. Therefore, for the oscillation characteristics, this paper chooses momentum as the optimizer.

In 4.1.2 step type data generation, an observation has been added in this article, which is as follows: As shown in Fig. 5, except for the step jump in the value range, the overall fluctuation trend of the step type data in the original data is very smooth. Only the data generated with the Adam optimization method has a similar fluctuation trend; the results of the other optimization methods are far from it. Therefore, for the step characteristics, this paper improves the Adam optimization method and applies it to the generative adversarial network.

In 4.1.3 pulse type data generation, an observation has been added in this article, which is as follows: RMSprop failed to generate data because the gradient vanished during training, so this paper does not include it in the comparison results. As shown in Fig. 6, compared with the data generated by the other optimization methods, the fluctuation trend of the data generated with the Adam optimization method is the most similar to that of the original pulse type data. For the pulse characteristics, this paper therefore uses the Adam optimization method for the positive sample generator. The generated positive samples serve as the basic trend of the negative samples, into which the pulses are then embedded.

In 4.2.1 oscillation type data generation, an observation has been added in this article, which is as follows: For oscillation type data, the four comparison algorithms use SGD as the optimizer. As shown in Fig. 7, DCGAN, BiLSTM-CNN GAN, and TimeGAN perform well in image generation and time-series data generation, but for the oscillation type characteristics of the negative samples of the industrial control underlying business data, the data generated by the algorithm in this paper are more similar to the original data in change trend and jump position.

In 4.2.2 step type data generation, an observation has been added in this article, which is as follows: For step type data, GAN and DCGAN use the Adam optimization method as the optimizer, while BiLSTM-CNN GAN and TimeGAN use the improved Adam optimization method. As shown in Fig. 8, DCGAN, BiLSTM-CNN GAN, and TimeGAN perform well in image generation and time-series data generation, but for the step type characteristics of the negative samples of the industrial control underlying business data, only the algorithm in this paper learns the step change of the original data. Our results are more similar to the original data in the change trend and jump position of the generated data.

In 4.2.3 pulse type data generation, an observation has been added in this article, which is as follows: For pulse type data, the four comparison algorithms use SGD as the optimizer. As shown in Fig. 9, DCGAN, BiLSTM-CNN GAN, and TimeGAN perform well in image generation and time-series data generation, but for the pulse type characteristics of the negative samples of the industrial control underlying business data, the data generated by the algorithm in this paper are more similar to the original data in change trend and jump position.

Point 4: Have you experimented with GAN variants? If so, present your observations. Otherwise, perform experiments with a few GAN variants and show the results.

Response 4: In 4.2 experimental results of related algorithms, GAN variants have been used as the experimental control group. DCGAN, BiLSTM-CNN GAN, and TimeGAN are all GAN variants, and they are compared with the algorithm in this paper on the three types of data change. The specific contents are shown in the PDF file.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper provides a GAN-based method of generating abnormal samples to address the scarcity of attack samples in industrial control systems. To deal with three types of attack data, the author proposes an optimization strategy for each one by selecting a specific neural network optimizer.

Although it gets a fine result in the experiment, this paper still has several shortcomings.

1. At the end of section 2 (related work), it concludes that the basic GAN is suitable for the work because other improved GANs have inborn drawbacks. But throughout the preceding text, the author only introduces some improved GANs and their breakthroughs. Where does this conclusion come from?

2. In line 170 and some other parts, the author repeatedly mentions that jump points will be ignored in the process of gradient descent. However, with either SGD or the momentum method, the appearance of jump data only causes a turning point in the direction of the gradient update, which influences the model convergence speed. The meaning of “ignore” needs further explanation.

3. In section 3.4, after determining the start and stop of the impulse, the method inserts randomly generated pulse data into the original curve. Please explain how the impulse is randomly generated and how it relates to the GAN. Moreover, formula 7 is wrongly written.

4. The experimental comparison with other GAN models is not convincing, because the method in this paper uses different optimizers for gradient descent, yet the author does not mention the gradient descent strategy adopted by the other GAN models.

5. Some deep learning works are missing. [1] Multi-camera multi-player tracking with deep player identification in sports video [2] Deep-irtarget: An automatic target detector in infrared imagery using dual-domain feature extraction and allocation

Author Response

Response to Reviewer 2 Comments

Point 1: At the end of section 2 (related work), it concludes that the basic GAN is suitable for the work because other improved GANs have inborn drawbacks. But throughout the preceding text, the author only introduces some improved GANs and their breakthroughs. Where does this conclusion come from?

Response 1: As shown in the PDF file, in the last paragraph of the time series dataset enhancement algorithm of related work, I have added the reason why the basic GAN is suitable for this work. The content is as follows:

The abnormal dataset of industrial automation control has few jump points, and the data fluctuation trends before and after the jump points are similar. In the iterations of GAN-based data expansion, these characteristics will be ignored as data noise. To sum up, most existing GAN-derived networks are increasingly powerful at filtering data noise and better at gradient convergence. When the network updates its parameters during data iteration, they will treat the jump points of the three types of industrial control underlying business data as noise points and ignore them. By contrast, the gradient of the basic GAN converges slowly and its noise filtering ability is weak, so the jump characteristics of the data are preserved relatively well. In addition, most of the improved GANs for time-series data generation are only applicable to periodic data and perform poorly on disordered industrial automation control data. To keep the jump points from being ignored as noise during data iteration, this paper uses the basic generative adversarial network as the main network for data generation.

Point 2: In line 170 and some other parts, the author repeatedly mentions that jump points will be ignored in the process of gradient descent. However, with either SGD or the momentum method, the appearance of jump data only causes a turning point in the direction of the gradient update, which influences the model convergence speed. The meaning of “ignore” needs further explanation.

Response 2: As shown in the PDF file, in the first paragraph of Common Optimizer in the related work, I have added an explanation of why jump points will be ignored. The contents are as follows:

Stochastic gradient descent (SGD) [22] updates frequently and with high variance, so its objective function fluctuates violently. Although this volatility allows SGD to jump to new and potentially better local optima, it complicates the eventual convergence to a specific minimum. The momentum method improves SGD by adding a momentum term borrowed from physics. It improves the convergence speed while preserving the volatility of SGD. Therefore, the momentum optimizer is well suited to oil depot data that fluctuate within a certain range, but not to data with a significant jump at a certain time. Both SGD and momentum perform gradient descent and update parameters with a certain step size in the direction of gradient convergence. During iteration, they gradually reduce the influence of noise and finally make the gradient converge. However, a step jump point in the oil depot data will be mistaken for noise, so the generated result shows a gradually rising or falling curve, and the step change of the generated data is far from that of the original data. Therefore, the better the gradient descent effect, the harder it is to retain the step effect of the data.
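
For readers unfamiliar with the two update rules compared in this response, a minimal sketch follows. It is an illustration in plain Python, not the authors' training code: the function names, the learning rate `lr`, the momentum coefficient `beta`, and the toy objective f(theta) = theta**2 are all assumptions chosen for the example.

```python
def sgd_step(theta, grad, lr=0.1):
    """Plain gradient step: theta <- theta - lr * grad."""
    return theta - lr * grad


def momentum_step(theta, velocity, grad, lr=0.1, beta=0.9):
    """Classical momentum: accumulate a velocity, then move along it."""
    velocity = beta * velocity - lr * grad
    return theta + velocity, velocity


def minimize(steps=200):
    """Minimize the toy objective f(theta) = theta**2 with both rules.

    The gradient of the toy objective is 2 * theta. Hypothetical setup,
    used only to show the two update rules side by side.
    """
    theta_sgd = 5.0
    theta_mom, vel = 5.0, 0.0
    for _ in range(steps):
        theta_sgd = sgd_step(theta_sgd, 2.0 * theta_sgd)
        theta_mom, vel = momentum_step(theta_mom, vel, 2.0 * theta_mom)
    return theta_sgd, theta_mom
```

Both rules drive the parameter toward the minimum at zero. The momentum variant carries part of the previous update into the next one, which is the accumulated-velocity term the response refers to.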

Point 3: In section 3.4, after determining the start and stop of the impulse, the method inserts randomly generated pulse data into the original curve. Please explain how the impulse is randomly generated and how it relates to the GAN. Moreover, formula 7 is wrongly written.

Response 3: In Section 3.4, the generation of pulse data and its relationship with GAN have been explained. The contents are as follows:

The values of the pulse position t and the pulse length Δt are converted to percentages. The pulse position value is then read and new pulses are generated. The sample distribution is calculated by the probability formula, and the positions of the high and low binary values in the pulse are arranged according to the sample distribution to generate a new pulse. Finally, according to the location and length in the original data, the pulse is inserted into the basic trend to form a new attack sample. Fig. 3 shows the overall flow chart of the algorithm.
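
The pulse embedding step described above can be sketched roughly as follows. This is a hedged illustration only: the response does not give the probability formula, so the sketch assumes each pulse point is independently set high with probability `p_high`, and it assumes the pulse overwrites a window of the basic trend so the series length stays unchanged. The names `make_pulse`, `insert_pulse`, `pos_pct`, `len_pct`, and `p_high` are hypothetical.

```python
import random


def make_pulse(length, p_high, high=1.0, low=0.0, rng=None):
    """Draw a binary pulse: each point is `high` with probability p_high.

    The Bernoulli draw is an assumed stand-in for the paper's
    unspecified sample distribution.
    """
    rng = rng or random.Random()
    return [high if rng.random() < p_high else low for _ in range(length)]


def insert_pulse(trend, pos_pct, len_pct, p_high, rng=None):
    """Overwrite a window of `trend` with a newly generated binary pulse.

    pos_pct and len_pct are the pulse start and pulse length expressed
    as fractions (0..1) of the series length.
    """
    n = len(trend)
    start = min(n, round(pos_pct * n))
    length = max(1, round(len_pct * n))
    end = min(n, start + length)
    pulse = make_pulse(end - start, p_high, rng=rng)
    return trend[:start] + pulse + trend[end:]
```

A call such as `insert_pulse([0.5] * 100, 0.2, 0.1, 0.7)` would replace points 20 to 29 of a flat trend with a binary pulse and leave the rest untouched.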

As shown in the PDF file, Formula 7 has been modified.

Point 4: The experimental comparison with other GAN models is not convincing, because the method in this paper uses different optimizers for gradient descent, yet the author does not mention the gradient descent strategy adopted by the other GAN models.

Response 4: As shown in the PDF file, in 4.2 Experimental results of related algorithms, the optimizer has been explained for each experiment. Specific additions are as follows:

Point 5: Some deep learning works are missing. [1] Multi-camera multi-player tracking with deep player identification in sports video [2] Deep-irtarget: An automatic target detector in infrared imagery using dual-domain feature extraction and allocation.

Response 5: The missing deep learning work has been referenced in the introduction. The content is as follows:

At present, deep learning algorithms are widely used in various fields. For example, convolutional neural networks are commonly used in target tracking and target detection [24][25]. In the field of data expansion, the most efficient and common approach is the GAN (Generative Adversarial Network) [7], but most GAN derivative networks do not apply to attack samples of industrial control underlying business data. In 2019, Fei Zhu et al. [8] proposed the BiLSTM-CNN GAN model for time-series data generation and published it in a Nature journal. This method is mainly aimed at generating time-series data for ECG samples. ECG data are sequential, periodic, and low-dimensional. However, the step-type and pulse-type data in the underlying business data of industrial control are non-periodic, the dimension of the entire dataset is high, and there are many types of data changes. If BiLSTM-CNN GAN or other GANs for temporal data generation are used, step-type and pulse-type data will be treated as noise, and it is difficult to learn the attack characteristics.

As shown in the PDF file, the missing deep learning works have been added to the references.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The author of Reference [24] is wrong.

[24] Rza B, Lw A, Yy A, et al. Multi-camera multi-player tracking with deep player identification in sports video - ScienceDirect. Pattern Recognition, 102.

Zhang R, Wu L, Yang Y, et al. Multi-camera multi-player tracking with deep player identification in sports video[J]. Pattern Recognition, 2020, 102: 107260.

Author Response

Response to Reviewer 2 Comments

Point 1: The author of Reference [24] is wrong.

Response 1: As shown in the PDF file, reference [24] has been changed. The contents are as follows: [24] Zhang R, Wu L, Yang Y, et al. Multi-camera multi-player tracking with deep player identification in sports video[J]. Pattern Recognition, 2020, 102: 107260.

Author Response File: Author Response.pdf
