Python TensorFlow Big Data Analysis for the Security of Korean Nuclear Power Plants

Abstract: The Republic of Korea also suffered direct and indirect damage from the Fukushima nuclear accident in Japan and recognized the significance of security after the cyber-threat against the Republic of Korea Hydro and Nuclear Power Co., Ltd. With such matters in mind, this study sought to suggest a measure for improving security in nuclear power plants. Based on overseas cyber-attack cases and an attack scenario against the control facilities of a nuclear power plant, the study designed and proposed a nuclear power plant control network traffic analysis system that satisfies the security requirements and an in-depth defense strategy. To enhance the security of the nuclear power plant, the study collected data such as the internet service provided to the control facilities, network traffic of the intranet, and security equipment events, then compared and verified them with machine learning analysis. After measuring accuracy and time, the study proposes the most suitable analysis algorithm for the power plant in order to realize security that enables real-time detection of and response to a cyber-attack. The paper shows how to treat logs from multiple servers and various kinds of security information as learning data, including the matching of character data such as file names; non-continuous data were converted to continuous data by rescoring them according to risk, and two optimization algorithms were applied to address overfitting. We therefore expect a contribution in connecting the data-decision part with the optimization algorithm for learning security data.

Author Contributions: Conceptualization, S.L. and Y.K.; Data curation, J.-H.H.; Formal analysis, S.L. and J.-H.H.; Funding acquisition, J.-H.H.; Investigation, S.L., J.-H.H. and Y.K.; Methodology, S.L. and Y.K.; Project administration, J.-H.H.; Resources, S.L., J.-H.H. and Y.K.; Software, S.L., J.-H.H. and Y.K.; Supervision, S.L., J.-H.H. and Y.K.; Validation, S.L. and Y.K.; Visualization, S.L. and Y.K.; Writing—original draft, S.L., J.-H.H. and Y.K.; Writing—review and editing, J.-H.H.


Introduction
Recently, there was public deliberation in the Republic of Korea over permanently halting construction of Shin-Gori Nuclear Power Plant Units 5 and 6. The Korean government organized a deliberation committee with citizen participation and announced its final recommendation: resuming construction of the nuclear power plant. The background of the deliberation was to reflect the public's concern about the safety of nuclear power.
Given the severity of radiation leaks from nuclear power plants, the decision also touched on denuclearization sentiment, and additional opinions in the deliberation survey point to a growing need to supplement safety. The safety of a nuclear power plant is thus a serious national concern [1,2].
Among information system hackings over the last couple of years worldwide, a mobile service-based cyber-attack that poses a national risk is cyber-terrorism against an infrastructure facility such as a nuclear power plant.

Related Work
Multicore platforms established on large-scale computing clusters have recently led to promising developments in intelligent big data analysis, but they are still not enough for processing the overwhelming volume of complex and delicate data generated by corporate or public IT centers and institutions [5]. During 2016~2017, MIT Media Lab Professor Deb Roy and his team, working with six fact-checking organizations including PolitiFact and factcheck.org, examined 126,000 news articles that had been classified as true or fake news, concluding that the latter tended to spread faster and wider; the work was published in the journal Science [6]. For this research, an AI collected the activity data of three million readers who had read and shared such news to analyze the rate of dissemination and the number of shares on the network. As a result, fake news was shared about 70% more broadly than true news, and at much higher speed.
The professor's research team reported that the analysis result from the physical statistics gathered for the comments attached to the news also supported the conclusion that sensational news can be spread rapidly regardless of whether it is true or not, adding, "Fake news with new and exciting features can be easily transmitted on the Internet and SNS" [6]. These big data analysis results can be visualized by using each fragment of the collected unstructured data as a piece of a jigsaw puzzle that would eventually reveal the whole picture once every one of them has been matched [5,6].
For big data analysis, a regression method using variance is generally employed. In log analysis, however, it is difficult to determine the related variables, possibly because the data do not follow a Gaussian distribution [7][8][9]. Moreover, the similarity or dissimilarity of character strings alone cannot be grounds for determining whether a log records an intrusion. In particular, for logs related to intrusion events, once the intruders' intent is identified, other methods are usually selected [10][11][12].
Nonetheless, this study assumed that analysis using the payload can detect whether an important asset route is accessed repeatedly or a similar type of intrusion is being made. In particular, periodic logs such as spoofing appear analyzable by learning similar types. As for the analysis data, learning was carried out on data relevant to intrusion events, and the study suggests an algorithm that is stable with respect to detection accuracy and to the overshooting and small-learning-rate problems that often occur during learning [13][14][15][16][17]. Python is a high-level programming language released in 1991 by the programmer Guido van Rossum. As a platform-independent, interpreted, object-oriented, dynamically typed interactive language [18][19][20][21][22], it can be used on various platforms and has a rich library (module) ecosystem. It is widely used as open source in universities, educational and research institutions, and industry; in particular, the rich libraries related to big data analysis are a great help in big data analysis, machine learning, and graphics and academic research. The version used in this study was Python 3.6.1 [23].
Meanwhile, Chakraborty's study [24] considers the case of defending enterprises that have been successfully hacked by imposing additional a posteriori costs on the attacker. Mercorio's study [25] proposes a framework, discovery information using community detection (DICO), for identifying overlapping communities of authors in Big Scholarly Data by modeling authors' interactions through a novel graph-based data model that jointly combines document metadata with semantic information.
TensorFlow, an open source software library for machine learning used in Google products, was created by the Google Brain Team for research and product development and released under the Apache 2.0 open source license on 9 November 2015. Its programming languages are Python and C++, and it runs not only on mobile environments such as Android and iOS but also on 64-bit Linux and macOS desktops and on multiple CPUs and GPUs in server systems.
TensorFlow operations adopt the data flow graph method: a directed graph that expresses mathematical calculations and data flow using nodes and edges. A node carries out operations such as mathematical calculation, data input/output, and data reading/storing; an edge represents the input/output relationship of data between nodes. The versions used in this study were Python TensorFlow 1.1.0 and the Python TensorBoard graphic tool [26].
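As a rough illustration of the node-and-edge evaluation just described, the following is a toy sketch in plain Python (not TensorFlow's actual API); the node class and values are purely illustrative:

```python
# Toy dataflow graph: each node holds an operation and its input edges.
# Evaluation walks the graph recursively, mirroring how values propagate
# along edges between operation nodes in a dataflow graph.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # callable, or None for a constant leaf
        self.inputs = inputs  # incoming edges (other nodes)
        self.value = value    # payload for leaf nodes

    def eval(self):
        if self.op is None:
            return self.value
        return self.op(*(n.eval() for n in self.inputs))

# Build the graph for y = W*x + b with scalar leaves.
W = Node(None, value=2.0)
x = Node(None, value=3.0)
b = Node(None, value=1.0)
mul = Node(lambda a, c: a * c, (W, x))
y = Node(lambda a, c: a + c, (mul, b))

print(y.eval())  # 2.0 * 3.0 + 1.0 = 7.0
```

Evaluating the output node pulls values through the edges, which is the same pull-based picture TensorBoard draws for a session graph.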
Meanwhile, AlphaGo [27] fed multiple input feature planes for each board point, as in the following (Table 1). This method is worth considering when the analysis data have many features.

Python TensorFlow Big Data Analysis for the Security of Korean Nuclear Power Plants
The big data analyzed in this study were collected from 00:02, 28 July 2017 to 23:30, 1 September 2017. The dataset was composed of the Republic of Korea Hydro Nuclear APT equipment log, virus vaccine equipment log, IPS log, output security device log, and DRM log, each consisting of "Time", "Equipment Name", "Operation Status", and "Payload". The big data analysis was carried out using Python. The data comprised 5638 lines including 274 leaked data; 2000 lines including 96 leaked data were used as learning data, and the remaining 3638 lines were used as verification targets.
The function formula is y = Wx + b, where y is a 2000 × 2 matrix using 2 labels and 2000 learning examples, and x is a 2000 × 696 matrix holding the 2000 learning vectors of 696 dimensions. The W value was then estimated, and using this W together with the 3638 test lines and labels, the study checked whether the 178 information-leaked data among the 3638 could be accurately found. The study compared algorithms that minimize the cross entropy. Figure 1 shows characters converted into decimal codes; the numeric values themselves carry no meaning, as what matters in character string analysis is the order and overlap of characters. Although confirming the usage interval of some characters is difficult, a tendency toward redundancy of certain characters is visible: the figure shows many direct connections in the data order as well as many data redundancies.
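The cross-entropy objective being minimized can be sketched for the 2-label case described above; the logit values below are illustrative, not taken from the plant data:

```python
import math

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(pred, label):
    # label is a one-hot list, e.g. [1, 0] for "not leaked", [0, 1] for "leaked"
    return -sum(l * math.log(p) for l, p in zip(label, pred))

logits = [1.2, -0.4]            # y = Wx + b for one log line, 2 classes
probs = softmax(logits)         # probabilities summing to 1
loss = cross_entropy(probs, [1, 0])
print(probs, round(loss, 4))
```

Training then searches for the W and b that drive this loss toward its minimum over all 2000 learning lines.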

Define Data Features
Before learning from the data, we needed to define its features; Table 2 relates to this. AlphaGo used a 19 × 19 × 48 input, but here the server takes the place of the pixel, and log values classified as information leakage, printout security, ExploitMalware, Local Worm Detected, SniperIPS, and Virusspware are inserted. With 6 servers the input therefore becomes 1 × 6, and the features corresponding to each log are added as shown in Table 2; fields such as IP and FLAG were masked.
Figure 1. Example of converting the security log to character counts: the vertical axis is the number of words, the horizontal axis is the word dictionary; the word dictionary is created from the words appearing in the current data.
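The word-dictionary conversion used for Figure 1 can be sketched as follows; the sample log lines are hypothetical stand-ins for the payload field, and whitespace tokenization is an assumption:

```python
from collections import Counter

# Hypothetical payload lines standing in for real security-log entries.
logs = [
    "DENY tcp 10.0.0.1 leak.exe",
    "ALLOW tcp 10.0.0.2 report.doc",
    "DENY udp 10.0.0.1 leak.exe",
]

# Build the word dictionary from words appearing in the current data,
# then map each line to a vector of per-word counts.
vocab = sorted({w for line in logs for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}

def to_counts(line):
    c = Counter(line.split())
    return [c.get(w, 0) for w in vocab]

vectors = [to_counts(line) for line in logs]
print(len(vocab), vectors[0])
```

Each line becomes a fixed-length count vector over the dictionary, which is the numeric form the learning step consumes.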

Table 2. Example of log dataset: in the ExploitMalware log, some characters are Korean, meaning "main homepage", "middle", "detection", and "cloaked error page".
Therefore, as shown in Table 3, a singular value was set for each server based on its characteristic value.
Table 3. Applied features data.

Information leakage: 23 (the value of each item without a date; further detail is included in the patent and has been excluded here).

Unlike other machine learning frameworks, TensorFlow is embedded with a graph format that shows how the data in the computed graph change as they flow over the graph. Figure 2 depicts the multiplication of the values of the two variables X and Y.

Data Learning
The basic function expression of the experiment above is y = ax + b. Variable holds the a value, Variable 1 the b value, the yellow Placeholder the input x value, and the final Softmax is the estimated y value provided by TensorFlow. An existing neural network would use the Sigmoid function; TensorFlow's Softmax estimates the y value as a probability based on Sigmoid.
Figure 3 shows the structure for measuring accuracy by comparing the validation data y_ with the y value calculated by learning, where ArgMax_1 is the y_ value, i.e., the prior result value. The final two values were compared, and accuracy was measured as an average. Here, the input was structured to feed many values sequentially using the matrix structure as it stands.
Figure 4 shows the structure for obtaining the minimum value of the entropy using Softmax, which calculates the output value as a probability instead of applying Sigmoid; Gradient Descent (G-Descent) denotes the gradient descent method. The general gradient method can find the minimum value through the derivative of the estimated curve. The method used here is gradient descent with the travel distance limited by the learning rate, since an algorithm that prevents backtracking is mainly used: it neither makes a big jump nor overshoots a local trough. The estimate is determined by the minimum mean sum over such learning. Figure 5 presents the entire flow at once and shows the accuracy based on storing the estimate together with the input y_ value; thus, it was designed with the two structures above.
Figures 5 and 6 have the same structure; the only difference is the minimization algorithm. While Adam is known to be fast, the G-Descent algorithm is known to be highly accurate, so the study measured the results by inputting the actual big data. Figure 7 shows the structure of the convolutional neural network (CNN). Unlike the existing single network, three hidden layers were added. From the top left, the first hidden layer arranges 32 filters of 5 × 5 using 5 × 5 × 1 × 32; the second divides the first node with 5 × 5 × 32 × 64; the third is a fully connected layer with 7 × 7 × 64 × 1024.
The first hidden layer in Figure 8 was generated by decomposing the 28 × 28 picture into a total of 32 small 5 × 5 figures, as shown in the second figure. This corresponds to the generation of a node. Convolution-style multiplication was applied to enhance the probability accuracy by additionally cutting some parts, i.e., dividing [1, 2, 3, 4] into [1, 2] and [3, 4]; with a 5 × 5 window moved one column at a time across the 28 pixels, 24 positions result.
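The window arithmetic above (a 5 × 5 filter sliding one column at a time over 28 pixels, giving 24 positions) can be checked with a short sketch:

```python
def n_positions(input_size, window, stride=1):
    # Number of valid window positions along one axis (no padding):
    # the window's left edge can start at 0 .. input_size - window.
    return (input_size - window) // stride + 1

# A 5x5 filter sliding over a 28x28 input, stride 1, no padding:
per_axis = n_positions(28, 5)
print(per_axis, per_axis * per_axis)  # 24 positions per axis, 24*24 total
```

This is the standard output-size formula for a convolution without padding; with padding or a larger stride the count changes accordingly.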
In this study, dropout was established for multiple node calculation and deletion of unnecessary nodes, but all connections were used by setting 1.0 mainly to avoid overfitting.
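The keep-probability setting just described can be illustrated with a minimal inverted-dropout sketch (the activation values are illustrative; keep_prob = 1.0 reproduces the study's setting of using all connections):

```python
import random

def dropout(values, keep_prob, rng=random):
    # Inverted dropout: randomly zero activations and rescale the
    # survivors by 1/keep_prob so the expected sum is unchanged.
    # keep_prob = 1.0 keeps every connection (nothing is dropped).
    if keep_prob >= 1.0:
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

acts = [0.5, -1.2, 3.3, 0.0]
print(dropout(acts, 1.0))   # identical to the input: all connections kept
```

With keep_prob below 1.0 the same function randomly drops nodes during training, which is the usual way dropout counters overfitting.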

Analysis of Experiment Results and Performance Evaluation
Evaluation in general machine learning depends directly on how the learning rate is adjusted during learning, which governs overshooting, the small-learning-rate problem, accuracy, and time. With the G-Descent algorithm in particular, the problem arises that the user must judge the learning situation.
Overshooting refers to a phenomenon wherein the update meant to minimize W goes beyond the lowest point, bounces to the opposite side, and diverges to infinity. Figure 9 shows that overshooting occurred within fewer than 10 learning steps when the learning rate was set to 0.015, and it still occurred down to 0.00015. In this experiment, the learning rate was set to 0.000015 for stable progress, but learning took longer.
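The overshooting behavior can be reproduced on a toy loss; the curvature constant below is illustrative (chosen so the paper's learning rates show the same qualitative split), not derived from the plant data:

```python
def descend(lr, steps, w=1.0):
    # Gradient descent on the toy loss f(w) = 100 * w**2 (gradient 200*w).
    # Each step multiplies w by (1 - 200*lr): |1 - 200*lr| > 1 diverges.
    for _ in range(steps):
        w -= lr * 200 * w
    return w

print(abs(descend(0.015, 10)))      # overshoots: |w| grows every step
print(abs(descend(0.000015, 10)))   # stable, but barely moves toward 0
```

With lr = 0.015 the multiplier is -2, so the iterate flips sign and doubles each step; with lr = 0.000015 the multiplier is 0.997, stable but slow, matching the trade-off described above.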

Figure 10 shows the problem when the learning rate was set to 0.0000015: despite 1000 learning steps, it failed to reach the lowest point because of the short travel distance per step and ended at 99.17%. Figure 11 shows no big difference in overall accuracy, but roughly a 50% time difference for the same 1000 learning steps. The TensorFlow build used here ran on CPU, which took more time. In the case of CNN, both accuracy and time worsened: because it computes all output values of the multilayer network, the time increase was huge, while accuracy depended on how the connections of all layers were reflected. As the layer connections increased, accuracy increased, but speed decreased.
There were 76 events and 26 events classified into Fireeye_EX Hangul, Multiple Exploit Malware Types, Possible Local Worm Detected, and sniperIPS_low_Log, respectively. Out of the 110,000 cases classified as Possible Local Worm Detected, 15,000 events were tested. It took some time, but the accuracy was high: 13 of the 15,000 cases were judged to be different events, calculated as (1 − 0.999133) × 15,000 ≈ 13. Learning the other invasion factors as noise can make the actual data analysis more accurate. In Figure 14, accuracy and time were measured by applying the Adam algorithm, used for the minimum-value calculation supported by TensorFlow, and the commonly used G-Descent when learning a single class, as well as the accuracy and time when four additional noise classes were added to single-class detection.
Meanwhile, Table 4 presents the results of the experiment with three machine learning algorithms. In the first experiment, the Adam algorithm was the fastest at 15.86 s, with the same accuracy; the CNN algorithm showed far lower performance in both time and accuracy. Because accuracy was tied in the first experiment, a second experiment added noise during learning, where the accuracy of the G-Descent algorithm was excellent. In other words, G-Descent excelled in accuracy and Adam in time. Learning with noise insertion can lead to more accurate results in actual data analysis. The measured time difference was 0.68 s in Adam's favor; accuracy did not differ in single learning, and G-Descent led by 0.0003 after noise learning. Moreover, G-Descent is less convenient, as it needs additional settings such as the learning rate.
As a result of the comparison between two machine learning algorithms, Adam was more advantageous than G-Descent as the algorithm of the big data analysis of a nuclear power plant.
Then the question arises of whether an algorithm with quick analysis speed is suitable for the log analysis of a power plant. The plant cannot give up either accuracy or analysis speed because, as mentioned in the introduction, damage from cyber-attacks on critical national infrastructure can mean a national disaster. Table 5 shows the comparison of test results for the three machine learning algorithms. Based on the previous experiments, two models were created and experiments were conducted using four methods, combining the two algorithms to obtain both accuracy and analysis speed. First, accuracy was raised with prerequisite learning by the G-Descent algorithm and then speed with the Adam algorithm (G-Descent ⇒ Adam). Second, speed was raised with the Adam algorithm and then accuracy with the G-Descent algorithm (Adam ⇒ G-Descent). Each method was run for 500 learning steps per stage. Figure 15 shows the result of the scenario 1 experiment: single learning (500 steps) of the Adam algorithm after learning with the G-Descent algorithm (500 steps). After a total of 1000 learning steps, accuracy was 0.999175 and the measured time was 25.32931 s. Figure 19 is a visualization graph provided by TensorBoard, the visualization tool of TensorFlow: as TensorFlow trains the neural network, it shows the optimization process and graphs the indicators.
The vertical axis means the number of machine learning steps, and the horizontal axis the values of the weights; this corresponds to a in y = ax + b. If the noise value is not included in the experiment, the slopes divide into two higher distributions. A and B in Figure 17 mean that 1000 learning steps were done in G-Descent ⇒ Adam order and that 1000 executions were each carried out. The top two lines and the bottom two lines indicate that the weight distribution depended on which algorithm was learnt first. Figure 19B in the top row shows a weight distribution that appears untrained despite the 500:500 learning curve, while the bottom line D shows an evenly distributed slope. As a result, learning Adam first rather than G-Descent was found to be advantageous for class classification.
Figure 20 is a graph comparing the accuracy of single learning and noise-insertion learning. The vertical axis indicates accuracy and the horizontal axis the learning unit; the vertical scale marks the accuracy of a single cell per 100 learning steps. The blurred line denotes the accuracy when verifying with the learning data, and the darker line the accuracy on the validation data after learning. The diagonal (dotted) line appeared because of noise from a phenomenon wherein accuracy suddenly dropped again during double learning; as this is an experimental result, it is shown as is. Single learning (Figure 20C,D) shows a winding curve, whereas noise learning (Figure 20A,B) shows a curve without winding, i.e., more stability.
The learning data in the current experiment covered 2000-3000 cases, so the graph lacks detail; experimenting with at least 100,000 cases may produce a more detailed graph. Table 6 shows the final results of the four scenario methods. For single learning, the method of enhancing accuracy with the G-Descent algorithm after enhancing speed with the Adam algorithm was faster at 10.86856 s, while accuracy was higher when G-Descent was learnt first. The noise learning results show that G-Descent was faster at the same accuracy.
Meanwhile, Figure 21 compares the learning time of single learning and of noise-insertion learning. Inserting noise can increase accuracy, but learning takes longer. Although not covered in the main experiment, an additional test at a learning ratio of 900:100 gave a speed of 15.0898 s for Adam ⇒ G-Descent and 15.4410 s for G-Descent ⇒ Adam, with accuracies differing only around the third decimal place. The test result suggests that Adam ⇒ G-Descent is suitable as the recommended machine learning algorithm for the big data log analysis that enhances the security of the nuclear power plant.
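The double-learning schedule above can be sketched in pure Python on the same linear model y = ax + b whose slope the weight plots track: an Adam phase for fast convergence followed by a plain gradient-descent phase for refinement. The learning rates, step counts, and synthetic data are illustrative assumptions, not the paper's experimental settings.

```python
import random

def grad(w, b, data):
    """Mean-squared-error gradients for the linear model y = w*x + b."""
    gw = gb = 0.0
    for x, y in data:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    n = len(data)
    return gw / n, gb / n

def fit_two_phase(data, adam_steps=1000, gd_steps=1000):
    """Phase 1: Adam for speed; phase 2: plain gradient descent for
    accuracy, mirroring the Adam => G-Descent order."""
    w = b = 0.0
    # --- Adam phase (standard bias-corrected moment estimates) ---
    lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8
    mw = vw = mb = vb = 0.0
    for t in range(1, adam_steps + 1):
        gw, gb = grad(w, b, data)
        mw = b1 * mw + (1 - b1) * gw; vw = b2 * vw + (1 - b2) * gw * gw
        mb = b1 * mb + (1 - b1) * gb; vb = b2 * vb + (1 - b2) * gb * gb
        mw_h = mw / (1 - b1 ** t); vw_h = vw / (1 - b2 ** t)
        mb_h = mb / (1 - b1 ** t); vb_h = vb / (1 - b2 ** t)
        w -= lr * mw_h / (vw_h ** 0.5 + eps)
        b -= lr * mb_h / (vb_h ** 0.5 + eps)
    # --- plain gradient-descent phase ---
    for _ in range(gd_steps):
        gw, gb = grad(w, b, data)
        w -= 0.01 * gw
        b -= 0.01 * gb
    return w, b

random.seed(0)
# synthetic data scattered around y = 2x + 1
data = [(k / 10, 2 * (k / 10) + 1 + random.gauss(0, 0.01)) for k in range(30)]
w, b = fit_two_phase(data)
print(round(w, 2), round(b, 2))  # should be close to 2 and 1
```

Reversing the two phases (G-Descent first, then Adam) is the other schedule compared in the experiment; in a framework such as TensorFlow the same idea would amount to compiling and fitting the model once with an Adam optimizer and then again with a gradient-descent optimizer.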

Conclusions
This study designed and proposed a nuclear power plant control network traffic analysis system that satisfies the security requirements and the in-depth defense strategy, based on recent overseas cyber-attack cases and attack scenarios against nuclear power plant control facilities.
To enhance the security of the nuclear power plant, the study collected big data such as the internet feed provided to the control facilities, network traffic of the intranet, and security equipment events, and compared and verified them with machine learning analysis. After measuring accuracy and time, the study proposed the most suitable analysis algorithm for the power plant through comparison and analysis. To find a suitable algorithm, the study compared and tested the Adam, G-Descent, and CNN algorithms. In the first test, the Adam algorithm was predominant in terms of analysis speed, while G-Descent had the highest accuracy. CNN was excluded because it took more than twice as long as the other algorithms. A suitable algorithm for control network analysis requires not only accuracy but also short analysis time: real-time analysis of attacks against power plants must be quick, but accuracy cannot be compromised.
As a result of the experiment, the Adam ⇒ G-Descent method recorded a faster analysis time, 10.86858 s, than the G-Descent ⇒ Adam method, with no significant difference in accuracy. As the experiment was run using workstation memory, the speed difference is expected to be greater if a graphics card is used.
In the experiment, the single-algorithm G-Descent time was 14.46075 s, with Adam recording 25.3293 s. When the problem of choosing either time or accuracy with a single algorithm was tested and measured across the four double-learning configurations, Adam ⇒ G-Descent recorded the same time, 14.46075 s, identical to the single G-Descent time, but with increased accuracy.
In addition to the modified optimization technique, this study presents a method for applying security-related data to machine learning, converting non-continuous data into continuous data using a specific index. Through this, we proposed a way of using two optimization techniques to avoid the problems of learnability and overfitting, especially the problem of learning from character data. In a future study, we will conduct research on Bagging (Bootstrap) in relation to the lack of data related to intrusion and hacking.
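The index-based conversion of character data into continuous values is described only at a high level here; a minimal sketch, assuming a hand-assigned risk score per file-name extension (the extension list and scores below are hypothetical, not the paper's index):

```python
# Hypothetical risk index: map a file name's extension to a continuous
# risk score so that character data becomes a numeric feature usable
# for learning. Scores and extensions are illustrative assumptions.
RISK_BY_EXT = {".exe": 0.9, ".dll": 0.8, ".bat": 0.7, ".log": 0.2, ".txt": 0.1}

def file_name_to_feature(name, default=0.5):
    """Return a continuous risk value for a file name."""
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    return RISK_BY_EXT.get(ext, default)

features = [file_name_to_feature(n) for n in ["update.exe", "audit.log", "readme"]]
print(features)  # [0.9, 0.2, 0.5]
```

Once file names are mapped this way, they can be fed to the optimizers alongside the genuinely continuous traffic and event features instead of being dropped as non-numeric data.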