Improvement of Cyber-Attack Detection Accuracy from Urban Water Systems Using Extreme Learning Machine

Abstract: This study proposes a novel model for detecting cyber-attacks on water distribution systems using remote sensing data (i.e., pipe flow sensors, nodal pressure sensors, tank water level sensors, and programmable logic controllers) and machine learning approaches. The most commonly used and well-known machine learning algorithms (i.e., k-nearest neighbor, support vector machine, artificial neural network, and extreme learning machine) were compared to determine the one with the best detection performance. After the best algorithm was identified, several improved versions of it were compared and analyzed according to their characteristics. Their quantitative performance and their ability to correctly classify the state of the urban water system under cyber-attack were measured using various performance indices. Among the algorithms tested, the extreme learning machine (ELM) exhibited the best performance. This study therefore not only identifies the best-performing algorithm among those compared but also considers improved versions of that algorithm. Furthermore, the comparison was performed using various representative performance indices to quantitatively measure the prediction accuracy and select the most appropriate model. This study thus provides a new perspective on the characteristics of various versions of machine learning algorithms and their application to different problems, and it may serve as a case study for future work on cyber-attack detection.


Introduction
Most infrastructure systems (e.g., water distribution systems (WDSs), power grid systems, and telecommunication systems) monitor and control their remote sensing data through supervisory control and data acquisition (SCADA) systems and programmable logic controllers (PLCs) so that they can be operated and managed efficiently. These supervisory and logical systems can be described as cyber-physical systems (CPSs), in which physical devices are connected through a cyber network. Because a CPS can monitor and control the physical processes of equipment in real time, it is practical and convenient to use such systems in the operation of infrastructure systems, wherein even a minor malfunction can cause serious damage.
In the past few years, CPSs have been used in several infrastructure systems, from power grid systems to telecommunications systems. They have also been used in water utilities, such as in WDSs, where machine learning has been applied to accurately detect cyber-attacks. However, because the detection performance differs depending on the machine learning algorithm used, a study is needed that determines the most suitable machine learning approach for cyber-attack detection through a quantitative comparison of the performances of various algorithms.
In summary, CPSs were in the past used mainly in infrastructure systems such as power grid and telecommunications systems, but they have recently also been adopted in the WDS field. Because a CPS can be easily accessed through its cyber network, many researchers have developed approaches for the operation and management of CPSs and for the detection and prevention of cyber-attacks.
Therefore, this study proposes a novel detection model coupled with machine learning (ML) approaches for the effective detection of cyber-attacks on WDSs. To find the best ML algorithm, this study applied and compared several types of machine learning approaches (i.e., extreme learning machine (ELM), k-nearest neighbor (KNN) algorithm, support vector machine (SVM), and artificial neural networks (ANNs)). This study not only searches for the best-performing algorithm through a simple comparison but also considers improved versions of that basic algorithm to obtain the best performance. It could therefore become a foundational study for the future development of cyber-attack detection techniques in the WDS field.
Furthermore, this study used a dataset from the Battle of the Attack Detection Algorithms (BATADAL [31]) for the various cyber-attack scenarios and a comparison was performed using various representative performance indices which measure the prediction accuracy quantitatively and select the most appropriate algorithm.

Security Goals and Cyber-Physical Attacks
The goal of WDSs is to supply sufficient clean water to meet the needs of the population. For this reason, highly complex water systems should be monitored and controlled in a timely manner to achieve high system efficiency and reliable water supply. Recently, water infrastructure systems have been making use of CPSs for automatic and accurate system operation based on SCADA systems, sensors, actuators, and communication systems.
As water infrastructure systems become more connected to telecommunication systems, they also become more vulnerable to cyber-attacks. Therefore, efforts should be made to improve the detection and identification of cyber-attacks on SCADA systems and physical water systems. Thus, this study proposes an effective detection model based on various machine learning approaches. The BATADAL dataset (http://batadal.net/data.html) was used, and the specific details of the attack scenarios are described in the following sections.

Attack Model
The datasets used in this study are the only open-source datasets that consider cyber-attack conditions in WDSs. The cyber-attack data, introduced at the World Environmental and Water Resources Congress held in Sacramento, California [28], are discussed in detail in the following paragraphs. The data, which were generated to represent the potential effects of cyber-attacks on C-town's WDS (see Figure 1), were used for the battle of the WDSs [28]. The C-town network consists of 429 pipes, 388 nodes, 7 tanks, 11 pumps, 4 valves, and a reservoir. The sensors of the components (e.g., pumps, valves, and tanks) are located near the components and are connected to 9 PLCs; the attacks are conducted through the malicious control of actuators, modification of the PLC control settings, or delivery of erroneous information due to malfunctions in the communication systems. The training input data covered the SCADA values of 43 system components (i.e., the water level of 7 tanks, the status and water flow of 11 pumps, the status and water flow of a valve, and the pressure of 12 junctions), and the training output data reflected the attack or non-attack conditions.
Appl. Sci. 2020, 10, 8179

Attack Scenarios and Specifications
For the BATADAL dataset, 6 attack scenarios were considered. The attack scenarios were limited to hydraulic problems, such as disruption of pump operations, tank overflow, or tank depletion. The attack scenarios are distributed across the whole simulation period: a total of 383 time steps were set as "under-attack" conditions, whereas the other 12,555 time steps were denoted as "non-attack" conditions. The test dataset had a length of 2089 time steps, with 407 time steps labeled as "under-attack" conditions.
To apply the attack scenarios starting from the normal condition, the initial water levels of the tanks were set to 0.25 m (T1 and T2), 0.75 m (T5), 1 m (T3, T4, and T7), and 2 m (T6); this individual simulation was used to show the characteristic effects of cyber-physical attacks [28]. The specifications of each attack scenario are shown in Table 1. Scenarios 1, 2, 3, and 5 are set to occur repeatedly during the simulation period because malfunctions of tanks or pumps can occur often, as they are connected to several sensors or actuators.
• Scenario 1: This scenario focuses on a direct attack on the components of WDSs, such as pumps, valves, and tanks. For example, a tank overflow may be caused by a direct attack on a pumping station through the activation of unscheduled pumps.
• Scenario 2: This scenario is caused by a disturbance in the reading or transportation of data between the sensors and PLC. For instance, manipulated water level readings can lead to the depletion or low levels of water in the tanks. This problem may be due to the physical manipulation of the water level sensor or the miscommunication of information between the sensor and PLC.
• Scenario 3: This scenario relates to the connection among PLCs. If Tank 1 is connected to PLC 1, and the information of Tank 1 (e.g., water level) is intercepted by the attack and is transmitted instead to PLC 2, the other tank connected to PLC 2 will either overflow or be depleted.
• Scenario 4: This scenario is an attack to conceal the PLC connection problem (Scenario 3) and disrupt the information between SCADA and PLC. An example would be a situation wherein there is a problem in the identification of attacks due to the malfunction of communication links between the PLC and SCADA.
• Scenario 5: This scenario focuses on the SCADA data transportation problem. This attack scenario entails the alteration of the packages being sent by SCADA to change the operations of a PLC it supervises. In particular, the communication link between SCADA and PLC is attacked, resulting in pump activation and tank overflow.
• Scenario 6: This scenario contains random multiple combinations of attacks on PLCs.

Application and Comparison of Various Classification Methods
The proposed novel detection model is a tool for the effective detection of cyber-attacks on WDSs. To develop the detection approaches, this study applied a set of well-known and commonly used machine learning approaches. The applied cyber-attack scenarios are expected to lead to anomalous hydraulic results in various WDS components (e.g., the nodal pressure, tank level, and pipe flow) and to a correspondingly significant system error.
Therefore, each machine learning approach was trained using historical SCADA data representing normal operating conditions. Then, these approaches were simulated with new SCADA data, and the reconstruction error produced at each time step was monitored. The monitoring data of the WDSs is classified as under attack if the average reconstruction error across all system variables is larger than a user-defined threshold (this study used 0.95 [30]) and is classified as safe otherwise.
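The thresholding rule described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB implementation used in the study. The function name `flag_attacks`, the use of the absolute error, the array shapes, and the toy values are all assumptions made for the example; the errors are further assumed to be scaled so that a threshold of 0.95 is meaningful.

```python
import numpy as np

def flag_attacks(observed, reconstructed, threshold=0.95):
    """Return a boolean array that is True where a time step is flagged as under attack.

    observed, reconstructed: arrays of shape (n_time_steps, n_variables).
    The absolute error is an assumption; the text does not specify the error type.
    """
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    # Reconstruction error per variable, averaged across all system variables
    mean_error = np.abs(observed - reconstructed).mean(axis=1)
    # Flag the time step when the average error exceeds the user-defined threshold
    return mean_error > threshold

# Toy data: 3 time steps, 2 variables (hypothetical values)
obs = np.array([[1.0, 2.0], [1.0, 2.0], [5.0, 6.0]])
rec = np.array([[1.1, 2.1], [0.9, 1.8], [3.0, 4.5]])
flags = flag_attacks(obs, rec)
```

In this toy case only the last time step has an average error above the threshold, so only that step would be labeled as under attack.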
The prediction performance of each algorithm was evaluated using performance indices with various characteristics, and the most suitable algorithm for the detection model was identified by comparing the performances of four standard algorithms (i.e., KNN algorithm, SVM, ANNs, and ELM). Then, the classification results of the model using improved versions of the best algorithm were compared and analyzed in detail.

K-Nearest Neighbor Algorithm
The K-Nearest Neighbor (KNN) strategy is one of the simplest and most fundamental of the classification approaches. This method is useful for classification where there is little to no prior information on the distribution of data [32,33]. Table 2 presents the pseudo code of KNN [34].
Table 2. Pseudocode of the k-nearest neighbor algorithm.
Definition: X: training data, Y: class labels of X, x: unknown sample
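Using the definitions of X, Y, and x given for Table 2, a generic KNN classifier can be sketched in Python as shown below. This is the textbook form of the algorithm (Euclidean distance and majority vote), which is an assumption here since the table body is not reproduced; the function name `knn_predict`, the value of `k`, and the sample data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X, Y, x, k=3):
    """Classify the unknown sample x from training data X with class labels Y."""
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X - x, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of the k nearest neighbors
    votes = Counter(Y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data: label 0 = normal, label 1 = under attack
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
Y_train = [0, 0, 0, 1, 1]
label_normal = knn_predict(X_train, Y_train, [0.1, 0.1], k=3)
label_attack = knn_predict(X_train, Y_train, [4.9, 5.2], k=3)
```

Because KNN stores the training data and makes no assumption about its distribution, the only choices in this sketch are the distance measure and `k`.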

Support Vector Machine
Support Vector Machine (SVM) is a linear or non-linear classifier, which is a mathematical function used to distinguish between two different types of objects [35,36]. The SVM can be used for both regression and classification tasks. The basic concept of SVM is that the maximum margin hyperplanes separate the training data to the greatest extent. Figure 2 illustrates a schematic view of SVM.

Artificial Neural Networks
Standard Artificial Neural Networks (ANNs) are powerful computational models for establishing the relation between variables involving unknown data, particularly for complex non-linear relationships [37,38]. Generally, an ANN is composed of an input layer that receives the data, hidden layer(s) that compute the weights of the input, and an output layer, where the results of the ANN are produced. Each layer consists of basic elements called neurons, and each neuron is a non-linear algebraic function [39]; since the neurons affect the model performance, determining the number of neurons is important [40,41].

Standard Extreme Learning Machine
A neural network is a set of connected input/output units where each connection has a weight associated with it. The network learns by adjusting the weights and biases iteratively to minimize the approximate mean square error. In supervised learning, a gradient descent-based learning algorithm is a typical back-propagation (BP) learning algorithm, which updates the network weights and biases in the direction (i.e., gradient direction) in which the error function decreases most rapidly [42]. However, the learning speed of gradient-based algorithms is generally slow, and the selection of a learning rate is also tedious. These algorithms will be unstable if too large a value is chosen and will converge too slowly if the value is too small. Hence, the Extreme Learning Machine (ELM) algorithm was proposed to overcome these issues.
The ELM [43,44] was originally inspired by biological learning and was proposed as a modification of the single-layer feedforward network (SLFN), wherein weights and biases are chosen randomly to overcome the challenging issues faced by BP learning algorithms. Unlike other so-called randomness (semi-randomness)-based learning methods/networks [45], all the hidden nodes in the ELM are not only independent of the training data but are also independent of each other. Miche et al. [18] established the universal approximation capability of ELM and its capability for biological learning. Based on the proven theory, the input weights and hidden layer biases of SLFNs can be randomly assigned if the activation functions in the hidden layer are infinitely differentiable, and the standard ELM process is given as follows [46]:
Given a training set $\aleph = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n,\ t_i \in \mathbb{R}^m,\ i = 1, \ldots, N\}$, an activation function $g(x)$, and the number of hidden nodes $\tilde{N}$:
Step 1: Randomly assign the input weights $a_i$ and biases $b_i$, $i = 1, \ldots, \tilde{N}$.
Step 2: Calculate the hidden layer output matrix $H$.
Step 3: Calculate the output weights $\beta = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $T = [t_1, \ldots, t_N]^T$.
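The standard ELM training procedure can be sketched in Python in three steps: random assignment of the hidden-node parameters, computation of the hidden layer output matrix H, and a least-squares solve for the output weights through the Moore-Penrose pseudoinverse. This is a minimal illustration and not the study's MATLAB code. The sigmoid activation, the uniform sampling range, the function names `elm_train` and `elm_predict`, and the synthetic data are assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Train a single-hidden-layer ELM; returns (W, b, beta)."""
    n_features = X.shape[1]
    # Step 1: randomly assign input weights and hidden biases (never updated afterwards)
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix H (sigmoid activation assumed)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Step 3: output weights from the Moore-Penrose pseudoinverse of H
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Synthetic binary problem (hypothetical data, not BATADAL)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
T = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)
W, b, beta = elm_train(X, T, n_hidden=40, rng=rng)
pred = (elm_predict(X, W, b, beta) > 0.5).astype(float)
train_accuracy = float((pred == T).mean())
```

No iterative weight update appears anywhere in the sketch; the only fitted quantity is `beta`, which is why training avoids the learning-rate issues of BP.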

Online Sequential ELM
To process the standard ELM approach, the parameters of ELM and the entirety of the training data are required for the training step. Therefore, the standard ELM assumes that all the training data are ready prior to the training process. However, in cases of real-time system operation or maintenance tasks, the training data are constantly increased and accumulated by the passage of time or the occurrence of new events.
For this reason, Liang et al. [47] developed the online sequential ELM (OS-ELM), which can train the sequentially accumulated data one-by-one. This learning approach is an appropriate technique for practical applications in which the number of training data or the property of timeliness is not fixed (i.e., training data have a validity period [48]). For instance, in the short-term prediction of stock prices, the training data that are older and less effective are given lower weighting than the recent data. Table 3 presents the pseudo code of OS-ELM [47].
Table 3. Pseudo code of OS-ELM [47].
Output: A trained ELM model
(Standard ELM phase)
- Let $k = 0$ and calculate the hidden layer output matrix $H_0$ using the initial training data.
- Estimate the initial output weight $\beta_0$ using $P_0 = (H_0^T H_0)^{-1}$.
(OS-ELM phase)
- When the $(k+1)$-th chunk of new data $\{X_{k+1}, T_{k+1}\}$ arrives, update the hidden layer output matrix as $H_{k+1} = [H_k^T, \Delta H_{k+1}^T]^T$, where $\Delta H_{k+1}$ is the hidden layer output matrix corresponding to the newly arrived data.
- Update the output weights as $\beta_{k+1} = \beta_k + P_{k+1} H_{k+1} \ldots$
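The last update in Table 3 is given only in part, so the sketch below uses the usual recursive least-squares form of the sequential update, which is an assumption about the omitted terms rather than a transcription of the table. Here `dH` and `dT` denote the hidden layer outputs and targets of the newly arrived chunk only; the function names, the sigmoid activation, and the synthetic data are hypothetical.

```python
import numpy as np

def hidden(X, W, b):
    # Hidden layer output for fixed random weights (sigmoid activation assumed)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def os_elm_init(H0, T0):
    # Initial (standard ELM) phase on the first batch of data
    P = np.linalg.inv(H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta

def os_elm_update(P, beta, dH, dT):
    # Recursive least-squares update using only the new chunk
    K = np.linalg.inv(np.eye(dH.shape[0]) + dH @ P @ dH.T)
    P_new = P - P @ dH.T @ K @ dH @ P
    beta_new = beta + P_new @ dH.T @ (dT - dH @ beta)
    return P_new, beta_new

# Synthetic data (hypothetical): 120 samples arriving as one batch of 40 and four chunks of 20
rng = np.random.default_rng(1)
n_hidden = 5
W = rng.uniform(-1.0, 1.0, size=(3, n_hidden))
b = rng.uniform(-1.0, 1.0, size=n_hidden)
X = rng.normal(size=(120, 3))
T = rng.normal(size=(120, 1))
H = hidden(X, W, b)

P, beta = os_elm_init(H[:40], T[:40])
for start in range(40, 120, 20):
    P, beta = os_elm_update(P, beta, H[start:start + 20], T[start:start + 20])

# Reference: batch least-squares solution on all data at once
beta_batch = np.linalg.lstsq(H, T, rcond=None)[0]
```

Under this recursion the sequentially updated weights agree with the batch solution up to rounding, while each update touches only the new chunk, so earlier data do not need to be stored.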

Bidirectional ELM
In the case of the standard ELM, which has a fixed network architecture, the parameter affecting its performance is the number of hidden nodes. However, determining the optimal number of hidden nodes depends on the sensitivity analysis based on a trial and error approach. To solve this problem, incremental ELM (I-ELM) was proposed by Huang et al. [46]. The I-ELM is an improved version of the standard ELM obtained by adding hidden nodes one-by-one until the expected training accuracy is achieved.
Although I-ELM has been used to determine the appropriate number of hidden nodes, the problem of computation time still remains because I-ELM calculates n output weights one by one when n hidden nodes are used. For these reasons, Yang et al. [49] developed the bidirectional ELM (B-ELM), improving upon the standard ELM algorithm by finding an appropriate number of hidden nodes and enhancing the computing speed and accuracy. The basic concept of B-ELM is the optimization of two main parameters ($a_i$, $b_i$) related to the hidden nodes: $a_i$ is the weight vector connecting the input layer to the $i$th hidden node, and $b_i$ is the bias of the $i$th hidden node. The optimization of these two parameters results in a rapid decrease in the residual error. The B-ELM is discussed in detail in the pseudo code given in Table 4.
Table 4. Pseudo code of B-ELM [49].

Weighted ELM
The ELM is a competitive machine learning technique, which is simple in theory and fast in implementation. The network types are "generalized" single hidden layer feedforward networks, which are quite diversified in terms of the variety of feature mapping functions or kernels. To handle data with imbalanced class distribution, a weighted ELM (W-ELM) that can balance the data was proposed [50]. The proposed W-ELM method maintains the advantages of the standard ELM, such as (i) simplicity and convenience, (ii) a wide variety of feature mapping functions or kernels, and (iii) capacity for multiclass classification tasks. Moreover, after the weighting scheme is applied, (1) the W-ELM can deal with data with imbalanced class distribution while maintaining good performance on well-balanced data, and (2) by assigning different weights to each example according to users' needs, the W-ELM can be generalized to cost-sensitive learning.
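The weighting idea can be illustrated with a short Python sketch. The text does not give the W-ELM formulas, so the two choices below are assumptions: each sample is weighted by the inverse of its class size, and the output weights are obtained from a regularized weighted least-squares solve, $\beta = (I/C + H^T W H)^{-1} H^T W T$. The function names, the regularization constant `C`, and the synthetic data are hypothetical.

```python
import numpy as np

def class_balance_weights(y):
    """Per-sample weights equal to 1 / (number of samples in that sample's class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    count_of = dict(zip(classes.tolist(), counts.tolist()))
    return np.array([1.0 / count_of[label] for label in y.tolist()])

def weighted_elm_fit(H, T, sample_weights, C=1.0):
    """Output weights from a regularized, sample-weighted least-squares problem."""
    n_hidden = H.shape[1]
    HW = H.T * sample_weights          # same as H.T @ diag(sample_weights)
    return np.linalg.solve(np.eye(n_hidden) / C + HW @ H, HW @ T)

# Imbalanced toy labels (hypothetical): 45 normal samples, 5 under-attack samples
rng = np.random.default_rng(2)
y = np.array([0] * 45 + [1] * 5)
H = rng.normal(size=(50, 8))           # stand-in for hidden layer outputs
T = np.where(y == 1, 1.0, -1.0).reshape(-1, 1)
w = class_balance_weights(y)
beta = weighted_elm_fit(H, T, w, C=10.0)
```

With this weighting, the minority class contributes as much total weight to the fit as the majority class, which is the balancing effect attributed to W-ELM.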

Applications, Results, and Discussion
The objective of this study is the development of a novel detection model for the effective detection of cyber-attacks on WDSs by applying various machine learning approaches and determining the machine learning approach with the best detection performance. To evaluate the performance of each approach in the detection of cyber-attacks on WDSs, the commonly used approaches (i.e., KNN, SVM, ANNs, and ELM) are applied to the BATADAL dataset, simulating various cyber-attack scenarios.
Moreover, to compare the quantitative performance of each algorithm, various performance indices are used. Such performance indices are introduced in the next subsection. The learning approaches are first compared in terms of performance, and the algorithm with the best detection performance is determined. Then, the improved versions of the said learning algorithm are tested and analyzed. All computations were performed using MS Excel 2019, and such machine learning techniques as KNN, ANNs, SVM, ELM and Data Manager were developed using MATLAB 2019 (MathWorks, Inc., Natick, MA, USA).

Performance Indices
The performance indices quantitatively measure the prediction accuracy and select the most appropriate model after adjusting the model parameters or model formulation to reduce errors in prediction. In comparing the prediction performance for the various parameters, the model with the least errors was considered as the most accurate model. Such errors were determined by comparing the observed and predicted data.
However, because various performance measures have different prediction errors (e.g., overall error, special point error, and percentage error), the appropriate performance indices must be selected depending on the characteristics of the applied dataset (binary data or real number data). The dataset used in this study is expressed as a binary code (i.e., normal and abnormal condition). Detection performance should be evaluated based on the ability to detect all attacks without raising false alarms.
Therefore, this study has adopted the classification performance measure approach [51] to correctly classify the state of the WDSs under cyber-attack. The classification performance measure approach makes use of a metric referred to as the sensitivity or true positive rate (TPR), which is defined as the ratio of the number of time steps correctly classified as "under-attack" to the total number of time steps during which the system is under-attack.
The metric is composed of four variables, namely, the number of true positives (TP), number of false positives (FP), number of true negatives (TN), and number of false negatives (FN). These values are then used to calculate the precision, also known as the positive predictive value (PPV); the specificity, also known as the true negative rate (TNR); and the recall, also known as the TPR. The equations for these variables are given as follows:

$$\mathrm{PPV} = \frac{TP}{TP + FP}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

$$\mathrm{TNR} = \frac{TN}{TN + FP}$$

To facilitate the comparison among the reported algorithms, this study combined PPV and TPR into the F1 score, which gives the same importance to both metrics. The F1 score is defined as the harmonic average of the two metrics, and is expressed in the following equation:

$$F_1 = 2 \cdot \frac{\mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}$$

Moreover, the time-to-detection (TTD) value is the difference between the time when an attack starts and when it is first flagged as an under-attack event. It is used to calculate the score of time-to-detection ($S_{TTD}$) for effective comparison of all detection models under different attack scenarios:

$$S_{TTD} = 1 - \frac{1}{n_a} \sum_{i=1}^{n_a} \frac{TTD_i}{\Delta t_i}$$

where $\Delta t$ is the total duration of the attack, $n_a$ is the number of attacks contained in a dataset, $TTD_i$ is the TTD relative to the $i$th attack, and $\Delta t_i$ is the corresponding duration. The overall performance score $S$ is used to rank the algorithms and is given as follows:

$$S = \gamma \cdot S_{TTD} + (1 - \gamma) \cdot S_{CLF}$$

where $S_{CLF}$ is the score of classification performance and is calculated by $S_{CLF} = (\mathrm{TPR} + \mathrm{TNR})/2$, and $\gamma$ is the weight factor for the relative importance of time-to-detection and classification performance (set to 0.5 in this study). All metrics described in this section vary between 0 and 1, with 0 representing the poorest performance and 1 representing perfect accuracy.
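The performance indices of this section can be computed from binary label sequences as sketched below in Python. The function names and the toy sequences are hypothetical, and two conventions are assumptions not stated in the text: an attack is a run of consecutive under-attack time steps, and an attack that is never flagged is assigned a TTD equal to its full duration.

```python
import numpy as np

def classification_scores(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return {"PPV": ppv, "TPR": tpr, "TNR": tnr, "F1": f1, "S_CLF": (tpr + tnr) / 2}

def time_to_detection_score(y_true, y_pred):
    """S_TTD = 1 - mean(TTD_i / dt_i) over all attacks in the sequence."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    ratios, t, n = [], 0, len(y_true)
    while t < n:
        if y_true[t]:
            end = t
            while end < n and y_true[end]:
                end += 1                      # end of this attack (exclusive)
            duration = end - t
            hits = np.flatnonzero(y_pred[t:end])
            # Assumption: an undetected attack gets TTD equal to its duration
            ttd = int(hits[0]) if hits.size else duration
            ratios.append(ttd / duration)
            t = end
        else:
            t += 1
    return 1.0 - float(np.mean(ratios)) if ratios else 1.0

def overall_score(s_ttd, s_clf, gamma=0.5):
    return gamma * s_ttd + (1.0 - gamma) * s_clf

# Toy sequences (hypothetical): 1 = under attack, 0 = normal
y_true = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1, 1, 0, 0, 0, 0]
scores = classification_scores(y_true, y_pred)
s_ttd = time_to_detection_score(y_true, y_pred)
s = overall_score(s_ttd, scores["S_CLF"])
```

In the toy sequences the first attack is flagged one step late and the second is missed entirely, which lowers the time-to-detection score much more than the classification score.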

Comparison of Prediction Results for the Various Standard Classification Approaches
To determine the algorithm that exhibits the best detection performance, the prediction results of the various original classification approaches were compared for WDS cyber-attack situations. Before the performance comparison, a sensitivity analysis of the parameters of each algorithm (e.g., training function, adaptation learning function, performance function, number of layers, number of neurons, and transfer function) was performed by comparing the model performance as the parameters varied (see Table 5).
Table 5. Selected parameters from the parameter sensitivity analysis of the applied machine learning algorithms.
To identify the best-performing algorithm fairly, this study performed a sensitivity analysis of the parameters of each algorithm and used the resulting appropriate parameters in the performance comparison. In the sensitivity analysis, each parameter was varied within its range. The ANNs considered six parameters (training functions: among 19; adaptation learning functions: among 5; transfer functions: among 3; number of layers: 1~10 (gap 1); number of neurons: 1~50 (gap 5); training probability: 40~80% (gap 10%)).
The SVM considered two parameters under each parameter's range (mathematical functions: among 4; polynomial order: 1~10% (gap 0.5)), while the ELM considered two parameters under each parameter's range (activation functions: among 5; number of hidden neurons: 10~150 (gap 1)). Table 5 shows the results of the sensitivity analysis of the parameters of each algorithm.
Table 6 presents the performance comparison results for the original versions of the learning algorithms. A visual detection comparison of the normal status and the under-attack conditions is illustrated in the bar charts given in Figure 3. The applied cyber-attack data comprise training/testing input and output data. The output data are configured as binary codes (i.e., normal condition and under-attack condition) based on the various hydraulic results of the system components (i.e., water level of tanks, status and water flow of pumps, status and water flow of valves, and pressure of junctions). As shown in Table 6, an accuracy value based on the four variables (i.e., TP, FP, TN, and FN) can be used to evaluate the algorithms' total detection ability regardless of whether the condition is normal or under attack.
The detection accuracy values of all the applied algorithms were over 80%, which is quite high (see Table 6). However, the KNN exhibited a significantly lower TPR than the other algorithms, detecting cyber-attacks only 10 times (TP). For the ANNs, most of the performance indices, except $S_{TTD}$, were similar to or lower than those of the SVM. Moreover, among the total events (normal conditions: 1682; under-attack conditions: 407), the ANNs exhibited higher values of FP and FN than the SVM, amounting to 81 and 218, respectively. The ANNs thus exhibited a high false detection rate. However, the value of $S_{TTD}$ for the ANNs was 0.789, which was higher than that of the SVM.
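As a worked check of the figures quoted above for the ANNs, the remaining confusion-matrix entries and the derived indices can be recovered from the stated totals (1682 normal and 407 under-attack time steps) and the stated FP and FN counts. The Python snippet below is only an arithmetic illustration based on those four numbers; it does not reproduce Table 6, and the variable names are hypothetical.

```python
# Counts quoted in the text for the ANNs on the test dataset
n_normal, n_attack = 1682, 407
fp, fn = 81, 218

# Remaining confusion-matrix entries follow from the totals
tp = n_attack - fn          # under-attack steps correctly flagged
tn = n_normal - fp          # normal steps correctly left unflagged

accuracy = (tp + tn) / (n_normal + n_attack)
tpr = tp / (tp + fn)        # recall
tnr = tn / (tn + fp)        # specificity
ppv = tp / (tp + fp)        # precision
f1 = 2 * ppv * tpr / (ppv + tpr)
s_clf = (tpr + tnr) / 2

print(f"TP={tp}, TN={tn}, accuracy={accuracy:.3f}, TPR={tpr:.3f}, TNR={tnr:.3f}")
```

The accuracy stays above 80% because normal time steps dominate the test dataset, even though fewer than half of the under-attack time steps are detected; this is why TPR and F1 are more informative than accuracy for this problem.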
The $S_{TTD}$ reflects the detection ability at the beginning of the "under-attack" conditions. Although the ANNs performed better than the SVM in detecting the initial cyber-attack event, their overall detection ability was worse. This shows that the ANNs lack the ability to detect cyber-attacks consistently after the initial cyber-attack. Figure 3b,c illustrate the difference in the detection ability of the ANNs and SVM through a visual detection comparison. In Figure 3, the red bars represent the prediction data for under-attack conditions and the black bars indicate the observed data. In the detection history of the ANNs (Figure 3b), most of the red bars are located at the initial event of each attack scenario, except in two cases (i.e., the 2nd and 5th attack scenarios). In contrast, the SVM detected four initial events among the seven scenarios, and subsequently, most of the under-attack events were predicted consistently.
Based on the comparison among the machine learning algorithms, the performance of ELM is outstanding in all aspects, particularly in detection accuracy and initial attack event detection. As depicted in Figure 3d, the ELM detected all the cyber-attack scenarios and even detected most of the initial attack events in the seven scenarios. However, some of the attack events were not detected and there were some false detections around the attack events. This shows that although the ELM accurately detected most of the attacks, it did not do so consistently.

Comparison of Prediction Results of Improved Versions of ELM
As shown in Section 4.2, the ELM was identified as the best machine learning approach. This section compares the improved versions of the ELM algorithm, namely, the online sequential ELM, bidirectional ELM, and weighted ELM, with respect to the prediction quality and performance measures. The results are analyzed according to the characteristics of each algorithm. Because the performance of ELM depends on the values of its parameters (i.e., activation function and number of hidden neurons), a validation process for these parameters is necessary to evaluate the performance [52].
In response to this, a sensitivity analysis of the parameters was conducted for the improved versions of the ELM algorithm. Based on this analysis, the activation function and number of hidden neurons applied for each algorithm are as follows: OS-ELM: triangular basis (Tribas) function and 10; B-ELM: sine function and 10; W-ELM: sigmoid function and 75. The cyber-attack prediction results are presented in Table 7 and Figure 4.
Table 7 highlights the three improved versions of ELM and the standard ELM, wherein all variants, including the standard ELM, achieved a ranking score S higher than (or close to) 0.9. The OS-ELM shows the best overall performance (S = 0.910) and has the top classification score $S_{CLF}$. Moreover, the detection history depicted in Figure 4a shows that all attacks are immediately detected, with the exception of the first attack scenario, which is detected a few hours after its starting time. However, in comparison with the other improved algorithms, false alarms are more likely to occur before an attack occurs. This is because the OS-ELM was developed for real-time system operation or maintenance tasks, wherein the training data are accumulated over time. The time difference therefore appears to cause the false predictions, as most of the false detections are concentrated before an attack occurs.
The performance of B-ELM is close to that of the standard ELM and W-ELM with respect to the S metric and the TTD. However, it is more likely to generate false alarms. Unlike other variants of ELM, B-ELM increases user convenience in determining the appropriate number of hidden nodes while providing accurate prediction results without the need for a sensitivity analysis of the number of hidden neurons.
The W-ELM has a ranking score S similar to those of OS-ELM and B-ELM. The S_CLF value of W-ELM is lower than that of the others, reflecting a lower TPR and a higher number of FNs. Together, these lead to a score of S equal to 0.906. Moreover, the W-ELM algorithm is shown to have the least timing error, as presented in Table 7. This implies that although almost all the starting attack points can be detected using the W-ELM, it is also prone to FPs. This may be due to the weighting of data in the training process, because the weight assigned to abnormal cases influences the cyber-attack detection. For this reason, most of the false predictions for the W-ELM occur close to the attack conditions.
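To make the role of the training weights concrete, the following is a minimal sketch of a weighted ELM in which each sample is weighted by the inverse of its class frequency, so that the rare attack class contributes as much to the fit as the normal class. The weighting scheme, the regularization constant C, the toy data, and all function names are assumptions for illustration; the study does not state which weighting it used. Only the sigmoid activation and the 75 hidden neurons are taken from the text.

```python
import numpy as np

def class_weights(y):
    """Per-sample weight 1 / (size of the sample's class); an assumed scheme."""
    counts = np.bincount(y, minlength=2)
    return 1.0 / counts[y]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_weighted_elm(X, y, n_hidden=75, C=1e3, seed=0):
    """Weighted regularized ELM.

    Solves (I/C + H^T W H) beta = H^T W t, where W is the diagonal matrix
    of sample weights and t holds targets in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W_in + b)
    w = class_weights(y)
    t = np.where(y == 1, 1.0, -1.0)
    Hw = H * w[:, None]                    # equals W @ H without forming W
    A = np.eye(n_hidden) / C + H.T @ Hw
    beta = np.linalg.solve(A, Hw.T @ t)
    return W_in, b, beta

def predict_weighted_elm(model, X):
    """Label 1 (attack) where the network output is positive."""
    W_in, b, beta = model
    return (sigmoid(X @ W_in + b) @ beta > 0.0).astype(int)

# Hypothetical imbalanced toy data: attacks (label 1) are the minority class.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 1.5).astype(int)

weights = class_weights(y)
model = train_weighted_elm(X, y)
pred = predict_weighted_elm(model, X)
print("attack samples:", int(y.sum()), "flagged as attack:", int(pred.sum()))
```

With this scheme each class has a total weight of 1, so raising the weight of abnormal cases pulls the decision boundary toward the normal class. That is one plausible reading of why false predictions would cluster near attack conditions, although the sketch does not reproduce the results in Table 7.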

Conclusions and Future Studies
This study proposes a novel model, based on machine learning approaches, for the detection of cyber-attacks on WDSs. To develop this model, two types of analyses were performed. First, the most commonly used and well-known machine learning algorithms (i.e., KNN, SVM, ANN, and ELM) were compared to identify the one with the best detection performance. Second, improved versions of the best-performing algorithm were considered, and their characteristics were analyzed with respect to the cyber-attack detection problem. Moreover, various performance indices were used to compare the quantitative performance of each algorithm and its ability to correctly classify the state of the WDSs under cyber-attack situations.
According to the performance analysis of the improved versions of ELM in this study, the three ELM algorithms perform better than the other machine learning approaches (e.g., KNN, ANN, and SVM) in terms of the score S, which reflects overall performance. However, considering the problem addressed in this study and the characteristics of each ELM algorithm, several limitations can be identified. The initial version of ELM was developed to increase processing speed and accuracy and to solve large real-world problems. Accordingly, in the simulation results of this study, the ELMs outperformed the other algorithms; however, their ratio of false positive detections was relatively high.
The pattern of false positive detections differs depending on the algorithm's characteristics: the false predictions occur before the attack (OS-ELM), or most of them occur near the attack conditions (W-ELM). One practical solution to this problem may be to use heuristic prior knowledge of the urban water and hydraulic systems to make the classifier more intelligent, which would help it predict the correct condition accurately. In addition, utilizing fuzzy logic (fuzzy theory) along with the ELM may make decision making easier.
Similarly, the various ELMs that have been proposed have several limitations in terms of algorithm improvement (e.g., reducing false detection and improving true detection probability) and application (e.g., image training problems, abnormal condition detection problems, time series problems, and regression problems). Despite the above limitations, this study analyzed the characteristics of various types of ELM algorithms and the problems in applying them; it therefore serves as a useful case study for extending the application of ELMs to various other fields.
Moreover, the prediction results and analyses presented here, which are based on a thorough evaluation of each algorithm, are expected to benefit the development of new ELM approaches. Future studies may refer to the conclusions of this study to develop highly accurate algorithms. In doing so, one must consider not only benchmark problems but also various other types of problems, such as (i) training data configuration: binary, real number, integer, and mixed-integer; (ii) data collection type: sequential data versus prepared whole data; and (iii) data characteristics: continuous data (i.e., time series) versus discrete data. In addition, other improved or hybrid versions of ELM, particularly convolutional neural networks used for feature learning with ELM as the classifier, may enhance the accuracy of the obtained results.