Electrical Power Prediction through a Combination of Multilayer Perceptron with Water Cycle Ant Lion and Satin Bowerbird Searching Optimizers

Predicting the electrical power (PE) output is a significant step toward the sustainable development of combined cycle power plants. Due to the effect of several parameters on the simulation of PE, utilizing a robust method is of high importance. Hence, in this study, a potent metaheuristic strategy, namely, the water cycle algorithm (WCA), is employed to solve this issue. First, a nonlinear neural network framework is formed to link the PE with influential parameters. Then, the network is optimized by the WCA algorithm. A publicly available dataset is used to feed the hybrid model. Since the WCA is a population-based technique, its sensitivity to the population size is assessed by a trial-and-error effort to attain the most suitable configuration. The results in the training phase showed that the proposed WCA can find an optimal solution for capturing the relationship between the PE and influential factors with less than 1% error. Likewise, examining the test results revealed that this model can forecast the PE with high accuracy. Moreover, a comparison with two powerful benchmark techniques, namely, ant lion optimization and a satin bowerbird optimizer, pointed to the WCA as a more accurate technique for the sustainable design of the intended system. Lastly, two potential predictive formulas, based on the most efficient WCAs, are extracted and presented.

In machine learning, ANNs have been widely used for analyzing diverse energy-related parameters in power plants [113–115]. Akdemir [116], for example, suggested the use of ANNs for predicting the hourly power of combined gas and steam turbine power plants. With a coefficient of determination (R²) of nearly 0.97, the outputs of the ANN were found to be in close agreement with real data. The successful use of two machine learning models, namely, recurrent ANN and a neuro-fuzzy system, was reported by Bandić et al. [117], who applied three popular machine learning approaches, namely, random forest, random tree, and an adaptive neuro-fuzzy inference system (ANFIS), to the same problem. Their findings indicated that the random forest outperforms the other models. They also applied a feature selection measure. It was shown that the original and changed data led to root mean square errors (RMSEs) of 3.0271 and 3.0527 MW, respectively. Mohammed et al. [118] used an ANFIS to find the thermal efficiency and optimal power output of combined cycle gas turbines, which were 61% and 1540 MW, respectively.
Metaheuristic techniques have effectively assisted engineers and scholars in optimizing diverse problems [23,119–128], especially energy-related parameters such as solar energy [129], building thermal load [130], wind turbine interconnections [131], and green computing awareness [132]. Seyedmahmoudian et al. [133] used a differential evolution and particle swarm optimization (DEPSO) method to analyze the output power for a building-integrated photovoltaic system. These algorithms have also gained a lot of attention for optimally supervising conventional predictors like ANNs. Hu et al. [134] proposed a sophisticated hybrid composed of an ANN with a genetic algorithm (GA) and the PSO for predicting short-term electric load. With a relative error of 0.77%, this model performed better than the GA-ANN and PSO-ANN. Another application of the GA was studied by Lorencin et al. [135]. They tuned an ANN to estimate the PE output of a CCPP. Since the proposed model achieved a noticeably smaller error than a typical ANN, it was concluded that the GA is an effective optimizer for this system. Ghosh et al. [136] used a metaheuristic algorithm called beetle antennae search (BAS) to exploit a cascade feed-forward neural network applied to simulate the PE output of a CCPP. Due to the suitable performance of the developed model, they introduced it as an effective method for PE analysis. Chatterjee et al. [137] combined the ANN with cuckoo search (CS) and the PSO for electrical energy modeling at a combined cycle gas turbine. Their findings showed the superiority of the CS-trained ANN (with an average RMSE of approximately 2.6%) over the conventional ANN and PSO-trained version.
Due to the crucial role of power generation forecasting in the sustainability of systems like gas turbines [138], selecting an appropriate predictive model is of great importance. The above literature reflects the high potential of metaheuristic algorithms for supervising the ANN. However, a significant gap in the knowledge emerges because the literature of PE analysis relies mostly on the first generation of these techniques (e.g., PSO and GA). Hence, this study is concerned with the application of a novel metaheuristic technique, namely, the water cycle algorithm (WCA), for the accurate prediction of the PE of a base load operated CCPP. Moreover, the performance of this algorithm is comparatively validated by ant lion optimization (ALO) and satin bowerbird optimizer (SBO) as benchmarks. These techniques are applied to this problem through a neural network framework. Some previous studies have shown the competency of the WCA [139], ALO [140], and SBO [141] in optimizing intelligent models like ANNs and ANFIS. The main contribution of these algorithms to the PE estimation lies in finding the optimal relationship between this parameter and influential factors.

Data Provision
When it comes to intelligent learning, the models acquire knowledge by mining the data. In ANN-based models, this knowledge draws on a group of tunable weights, as well as biases. The data should represent records of one (or a number of) input parameter(s) and their corresponding target(s).
In this work, the data are downloaded from a publicly available repository at: http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant, based on studies by Tüfekci [138] and Kaya et al. [142]. Six years of records (2006-2011) of a CCPP working at full load (nominal generating capacity of 480 MW, made up of 2 × 160 MW ABB 13E2 gas turbines, 2 × dual pressure heat recovery steam generators, and 1 × 160 MW ABB steam turbine) form this dataset [138]. It gives the full load electrical power output as the target parameter, along with four input parameters, namely, ambient temperature (AT), exhaust steam pressure (vacuum, V), atmospheric pressure (AP), and relative humidity (RH). Figure 1 shows the relationship between the PE and input parameters. According to the drawn trendlines, a meaningful correlation can be seen in the charts of PE-AT and PE-V (R² of 0.8989 and 0.7565, respectively), while the values of AP and RH do not indicate an explicit correlation. The PE is inversely related to both AT and V.
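The trendline R² values quoted above come from a straight-line fit of the PE against one input at a time. A minimal sketch of that calculation is given below, assuming an ordinary least-squares line; the function name `linear_trend_r2` and the six sample points are hypothetical and are not taken from the dataset.

```python
import numpy as np


def linear_trend_r2(x, y):
    """Fit y ~ a*x + b by least squares and return (a, b, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, deg=1)          # slope and intercept of the trendline
    residual = y - (a * x + b)
    ss_res = float(np.sum(residual ** 2))   # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return float(a), float(b), 1.0 - ss_res / ss_tot


# Hypothetical toy points (NOT dataset records): PE falling as AT rises
at = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])        # ambient temperature
pe = np.array([485.0, 476.0, 468.0, 455.0, 446.0, 436.0])  # power output, MW
slope, intercept, r2 = linear_trend_r2(at, pe)
```

A negative slope together with an R² close to 1 corresponds to the kind of inverse relationship described for PE-AT and PE-V.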

Methodology
The overall methodology used in this study is shown in Figure 2. The description of the used algorithms is presented below.

The WCA
Simulating the water cycle process was the main idea of the WCA, which was designed by Eskandar et al. [143]. In studies like [144], scholars have used this algorithm for sustainable energy issues. When the algorithm starts, a population of size Npop is generated from raindrops. Among the individuals, the best one is designated as the sea, whose solution is shown by Xsea. Additionally, individuals with promising solutions (Xr) are considered as rivers. The number of rivers is determined based on the parameter Nsr that gives the number of rivers plus the unique sea. The residual individuals form the stream group (Xs). The number of streams is the difference between Npop and Nsr.
The population can be expressed as follows:
Concerning the function value of Xr and Xsea in the beginning, a number of Xs are designated to each Xr and Xsea based on the following relationship:
in which f stands for the function value and n = Xsea, Xr(1), ..., Xr(Nsr−1). Despite the typical procedure in nature (stream → river → sea), some streams may flow straight to the sea. The new values of Xr and Xs are obtained from the below equations:
where rand is a random number (in [0, 1]), cons gives a positive constant value (in [1, 2]), and t signifies the iteration number. Xr and Xs are evaluated and compared. If the quality of Xs is better than that of Xr, they exchange their positions. A similar process happens between Xr and Xsea [145,146]. By performing the evaporation part of the water cycle, the algorithm is again implemented to improve the solution iteratively.
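For orientation, the relations referred to in this subsection are commonly written as below in the original WCA formulation of Eskandar et al. [143]. This is a sketch that assumes the standard formulation applies here unchanged; the names $N_s$ (number of streams) and $NS_n$ (number of streams assigned to the sea or to a given river) are introduced only for illustration, and the exact notation used by the authors may differ.

```latex
\begin{align*}
\text{Population} &=
  \begin{bmatrix}
    X_{sea} & X_r^{1} & \cdots & X_r^{N_{sr}-1} & X_s^{1} & \cdots & X_s^{N_s}
  \end{bmatrix}^{\mathsf{T}},
  \qquad N_s = N_{pop} - N_{sr} \\[4pt]
NS_n &= \operatorname{round}\!\left(
  \left| \frac{f(n)}{\sum_{n} f(n)} \right| \times N_s \right),
  \qquad n = X_{sea},\, X_r^{1},\, \ldots,\, X_r^{N_{sr}-1} \\[4pt]
X_s^{t+1} &= X_s^{t} + rand \times cons \times \left( X_r^{t} - X_s^{t} \right) \\
X_r^{t+1} &= X_r^{t} + rand \times cons \times \left( X_{sea}^{t} - X_r^{t} \right)
\end{align*}
```

For a stream that flows straight to the sea, the first update is applied with $X_{sea}^{t}$ in place of $X_r^{t}$.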

The Benchmarks
The first benchmark algorithm is the ALO. Mirjalili [147] designed this algorithm as a robust nature-inspired strategy. Additionally, it has attracted the attention of experts for tasks like load shifting in analyzing sustainable renewable resources [148]. The pivotal idea of this algorithm is simulating the idealized hunting actions of the antlion. They build a cone-shaped fosse and wait for prey (often ants) to fall into the trap. The prey makes some movements to escape from antlions. The fitness of the solution is evaluated by a roulette wheel selection function. In this sense, the more powerful the hunter is, the better the prey is [149]. The details of the ALO and its application for optimizing intelligent models like ANNs can be found in earlier literature [150].
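The roulette wheel mentioned above is a generic fitness-proportional selection rule. The sketch below shows only that rule, not the full ALO procedure; it assumes a minimization problem in which a lower cost earns a larger slice of the wheel through inverse-cost weighting, and the cost values are hypothetical.

```python
import numpy as np


def roulette_probabilities(costs, eps=1e-12):
    """Turn costs (lower is better) into selection probabilities."""
    costs = np.asarray(costs, dtype=float)
    weights = 1.0 / (costs + eps)   # assumption: inverse-cost weighting
    return weights / weights.sum()


def roulette_select(costs, rng):
    """Draw one index with probability proportional to its weight."""
    p = roulette_probabilities(costs)
    return int(rng.choice(len(p), p=p))


rng = np.random.default_rng(0)
costs = [4.15, 4.27, 4.25, 6.00]    # hypothetical objective values
probs = roulette_probabilities(costs)
picked = roulette_select(costs, rng)
```

Under this weighting the fittest candidate is the most likely to be picked, but weaker candidates keep a nonzero chance, which preserves diversity in the search.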
The SBO is considered as the second benchmark for the WCA. Inspired by the lifestyle of satin bowerbirds, Moosavi and Bardsiri [141] developed the SBO. Scholars like Zhang et al. [151] and Chintam and Daniel [152] have confirmed the successful performance of this algorithm in dealing with structural and energy-related optimization issues. In this strategy, there is a bower-making competition between male birds to attract a mate. The population is randomly created and the fitness of each bower is calculated. By making an elitism decision, the most promising individual is considered as the best solution. After determining the changes in the positions, a mutation operation is applied, followed by a step to combine the solutions of the old and new (updated) population [153]. A mathematical description of the SBO can be found in studies like [154].

Accuracy Assessment Measures
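Two essential error criteria, namely, the RMSE and mean absolute error (MAE), are defined to return different forms of the prediction error. Another error indicator called mean absolute percentage error (MAPE) is also defined to report the relative (percentage) error. Given $P_{E_i}^{expected}$ and $P_{E_i}^{predicted}$ as the expected and predicted electrical power outputs, Equations (7) to (9) denote the calculation of these indicators.

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_{E_i}^{expected} - P_{E_i}^{predicted}\right)^{2}} \tag{7}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|P_{E_i}^{expected} - P_{E_i}^{predicted}\right| \tag{8}$$

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{P_{E_i}^{expected} - P_{E_i}^{predicted}}{P_{E_i}^{expected}}\right| \times 100 \tag{9}$$

where the number of samples (i.e., 7654 and 1914 in the training and testing groups, respectively) is signified by N. Moreover, a correlation indicator called the Pearson correlation coefficient (R) is used. According to Equation (10), it reports the consistency between the expected and predicted PE.

$$R = \frac{\sum_{i=1}^{N}\left(P_{E_i}^{predicted} - \overline{P_{E}}^{predicted}\right)\left(P_{E_i}^{expected} - \overline{P_{E}}^{expected}\right)}{\sqrt{\sum_{i=1}^{N}\left(P_{E_i}^{predicted} - \overline{P_{E}}^{predicted}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(P_{E_i}^{expected} - \overline{P_{E}}^{expected}\right)^{2}}} \tag{10}$$

Note that the ideal value for this indicator is 1.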
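The four indicators described in this section can be computed as in the following sketch, which assumes their standard definitions. The function names and the four sample values are hypothetical and serve only to make the arithmetic checkable by hand.

```python
import numpy as np


def rmse(expected, predicted):
    """Root mean square error."""
    e = np.asarray(expected, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def mae(expected, predicted):
    """Mean absolute error."""
    e = np.asarray(expected, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e)))


def mape(expected, predicted):
    """Mean absolute percentage error, in percent."""
    expected = np.asarray(expected, dtype=float)
    e = expected - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e / expected)) * 100.0)


def pearson_r(expected, predicted):
    """Pearson correlation coefficient between the two series."""
    x = np.asarray(expected, dtype=float)
    y = np.asarray(predicted, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


# Hypothetical values in MW (NOT taken from the dataset)
pe_expected = [450.0, 460.0, 470.0, 480.0]
pe_predicted = [452.0, 458.0, 473.0, 479.0]
scores = {
    "RMSE": rmse(pe_expected, pe_predicted),
    "MAE": mae(pe_expected, pe_predicted),
    "MAPE": mape(pe_expected, pe_predicted),
    "R": pearson_r(pe_expected, pe_predicted),
}
```

The RMSE and MAE carry the unit of the target (MW), whereas the MAPE and R are dimensionless, which is why the MAPE is the indicator quoted as a percentage.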

Hybridizing and Training
It was earlier stated that this study pursues a novel forecasting method for the problem of PE modeling. To this end, the water cycle algorithm explores the relationship between this parameter and four inputs through an MLP neural network. This skeleton is used to establish nonlinear equations between the mentioned parameters. A three-layer MLP is considered wherein the number of neurons lying in the first, second, and third layers (also known as the input, hidden, and output layers) equals four (the number of inputs), nine (obtained by trial and error), and one (the number of outputs), respectively. Figure 3 shows this structure.
Sustainability 2020, 12, x; doi: FOR PEER REVIEW; www.mdpi.com/journal/sustainability

There are two kinds of tunable computational parameters in an MLP: (a) weights (W) that are designated to each input factor and (b) bias terms (b). Equation (11) shows the calculation of a neuron with a given input (I).

$$Response = \mathrm{Tansig}\left(W \times I + b\right) \tag{11}$$

where Tansig signifies an activation function which is defined as follows:

$$\mathrm{Tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 \tag{12}$$

Each neuron of the ANN applies an activation function to a linear combination of inputs and network parameters (i.e., W and b) to give its specific response. There are a number of functions (e.g., Logsig, Purelin, etc.) that can be used for this purpose. However, many studies have stated the superiority of Tansig for hidden neurons [155–157].
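As a small illustration of the neuron calculation described above, the sketch below evaluates Tansig and a single neuron response, and counts the tunable parameters of the 4-9-1 structure. It assumes the usual definition Tansig(x) = 2/(1 + exp(−2x)) − 1, which is mathematically identical to the hyperbolic tangent; the function names and the sample numbers are hypothetical.

```python
import numpy as np


def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * np.asarray(x, dtype=float))) - 1.0


def neuron_response(inputs, weights, bias):
    """Response of one neuron: Tansig(W . I + b)."""
    return float(tansig(np.dot(weights, inputs) + bias))


# Parameter count of the 4-9-1 MLP described in the text
n_inputs, n_hidden, n_outputs = 4, 9, 1
n_parameters = (n_hidden * n_inputs + n_hidden        # hidden weights and biases
                + n_outputs * n_hidden + n_outputs)   # output weights and bias
# 36 + 9 + 9 + 1 = 55

# Hypothetical example: four scaled inputs and arbitrary weights
sample = np.array([0.2, -0.5, 0.1, 0.7])
w = np.array([0.4, -1.0, 0.3, 0.05])
response = neuron_response(sample, w, bias=0.1)
```

The count of 55 is the dimension of the search space that each metaheuristic explores when tuning this network.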
The WCA finds the optimal values of the parameters in Equation (11) in an iterative procedure. In each iteration, the suitability of the response is reported by an objective function (OF). This study uses the RMSE of the training data for this purpose, so the lower the OF, the better the optimization. Figure 4a shows the optimization curves of the WCA for the given problem. The reduction of the OF in this figure shows that the RMSE is reduced consecutively. The size of the population is known to greatly impact the quality of optimization. Therefore, the convergence curves are plotted for seven different WCA-NN networks distinguished by different population sizes (PS of 10, 50, 100, 200, 300, 400, and 500). As is seen, the curve of PS = 400 finally lies below the others. Therefore, this network is the representative of the WCA-NN for further evaluations. Note that a total of 1000 iterations were considered for all tested PSs.
The same strategy (i.e., the same PSs and number of iterations) was executed for the benchmark models. It was shown that the ALO-NN and SBO-NN with PSs of 400 and 300, respectively, are superior. Figure 4b depicts and compares the convergence behavior of the selected networks. According to this figure, all three algorithms have a similar performance in dealing with error minimization. The OF is chiefly reduced over the initial iterations. Figure 4b also shows that the OF of the WCA-NN is below both benchmarks. In this sense, the RMSEs of 4.1468, 4.2656, and 4.2484 are calculated for the WCA-NN, ALO-NN, and SBO-NN, respectively. Additionally, the corresponding MAEs (3.2112, 3.3389, and 3.3075) support this claim.
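The training loop described above can be sketched as follows. This is a deliberately simplified illustration and not the authors' implementation: it assumes the stream and river moves of the standard WCA, assigns streams to the sea and rivers in round-robin order instead of by function value, omits the evaporation and raining step, and runs on synthetic, already-scaled data. The names `predict`, `objective`, and `wca_train`, and all default settings, are hypothetical.

```python
import numpy as np

N_IN, N_HID = 4, 9
DIM = N_HID * N_IN + N_HID + N_HID + 1   # 36 + 9 + 9 + 1 = 55 tunable values


def predict(theta, X):
    """4-9-1 MLP: Tansig (tanh) hidden neurons, linear output neuron."""
    W1 = theta[:36].reshape(N_HID, N_IN)   # hidden weights
    b1 = theta[36:45]                      # hidden biases
    W2 = theta[45:54]                      # output weights
    b2 = theta[54]                         # output bias
    return np.tanh(X @ W1.T + b1) @ W2 + b2


def objective(theta, X, y):
    """OF = RMSE on the training data (lower is better)."""
    return float(np.sqrt(np.mean((y - predict(theta, X)) ** 2)))


def wca_train(X, y, pop_size=20, n_sr=4, iterations=50, cons=2.0, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, DIM))   # initial raindrops
    cost = np.array([objective(p, X, y) for p in pop])
    history = []
    for _ in range(iterations):
        order = np.argsort(cost)
        pop, cost = pop[order], cost[order]   # row 0: sea, rows 1..n_sr-1: rivers
        for s in range(n_sr, pop_size):       # streams flow to a river or the sea
            g = (s - n_sr) % n_sr             # simplification: round-robin guide
            pop[s] = pop[s] + rng.random() * cons * (pop[g] - pop[s])
            cost[s] = objective(pop[s], X, y)
            if cost[s] < cost[g]:             # better stream swaps with its guide
                pop[[s, g]] = pop[[g, s]]
                cost[[s, g]] = cost[[g, s]]
        for r in range(1, n_sr):              # rivers flow to the sea
            pop[r] = pop[r] + rng.random() * cons * (pop[0] - pop[r])
            cost[r] = objective(pop[r], X, y)
            if cost[r] < cost[0]:             # better river becomes the sea
                pop[[r, 0]] = pop[[0, r]]
                cost[[r, 0]] = cost[[0, r]]
        history.append(float(cost.min()))     # one point of the convergence curve
    best = int(np.argmin(cost))
    return pop[best].copy(), history


# Synthetic stand-in for the training set (NOT the CCPP data)
data_rng = np.random.default_rng(1)
X_train = data_rng.normal(size=(200, 4))
y_train = 2.0 * X_train[:, 0] - X_train[:, 1] + 0.5
theta_best, curve = wca_train(X_train, y_train)
```

Because the sea is only ever replaced by a better individual, the recorded curve cannot increase from one iteration to the next, which mirrors the shape of the convergence curves discussed for Figure 4. Larger values of `pop_size` trade computation time for a broader search, which is the trade-off examined through the different PSs.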
Subtracting the predicted PE from the expected PE returns an error value for each sample. … (Table 1), these values indicate a very good prediction for all models. Moreover, the calculated MAPEs report less than 1% relative errors (0.7076%, 0.7359%, and 0.7289%). The R values of 0.96985, 0.96807, and 0.96834 also indicate an excellent correlation between the outputs of the used models and the observed PE. This favorable performance means that the WCA, ALO, and SBO have properly captured the dependence of the PE on AT, V, AP, and RH and, accordingly, they have optimally tuned the parameters of the MLP system.

Testing Performance
The testing ability of a forecasting model illustrates the generalizability of the captured knowledge for unfamiliar conditions. The weights and bias terms tuned by the WCA, ALO, and SBO created three separate methods that predicted the PE for testing samples. The quality of the results is assessed in this section. Figure 6 presents two charts for each model. First, the correlation between the expected and predicted PE is graphically shown. Along with it, the frequency of errors (expected PE − predicted PE) is shown in the form of histogram charts. At a glance, the results of all three models demonstrate promising generalizability, due to the aggregation of points around the ideal line (i.e., x = y) in Figure 6a,c,e. Additionally, as a general trend in Figure 6b,d,f, small errors (zero and close-to-zero ranges) have a higher frequency compared to large values. Remarkably, testing errors range within [−16.6585, 44.7929 … According to the obtained R values (0.97164, 0.97040, and 0.97061), all three hybrids are able to predict the PE of a CCPP with highly reliable accuracy. In all regression charts, there is an outlying value, PE = 435.58 (obtained for AT = 7.14 °C, V = 41.22 cm Hg, AP = 1016.6 mbar, and RH = 97.09%) that is predicted to be 480.3728513, 481.3282482, and 481.4228308.

WCA vs. ALO and SBO
The quality of the results showed that the WCA, ALO, and SBO metaheuristic algorithms benefit from potent search strategies for exploring and mapping the PE pattern. However, comparative evaluation using the RMSE, MAE, MAPE, and R pointed out noticeable distinctions in the performance of these algorithms. Figure 7 depicts and compares the accuracies in the form of radar charts. The shape of the produced triangles indicates the superiority of the WCA-based model over the benchmark algorithms in both training and testing phases. In terms of all four indicators, this model could predict the PE with the best quality. It means that the ANN supervised by the WCA is constructed of more promising parameters. Following the proposed algorithm, the SBO won the competition with ALO. It is noteworthy that the accuracy of these two algorithms in the testing phase was closer compared to the training results.

From the time-efficiency point of view, computations of the ALO were shorter than the two other methods. The elapsed times for tuning the ANN parameters were nearly 14,261.1, 12,928.1, and 14,871.3 s by the WCA, ALO, and SBO, respectively. It should be also noted that the WCA and ALO used PS = 400, while this value was 300 for the SBO.
According to the above results, the WCA provides both an accurate and efficient solution to the problem of PE approximation, and thus, sustainable development of the CCPPs. It is true that the ALO could optimize the neural network in a shorter time, but smaller PSs of the WCA (i.e., 300, 200, ...) were far faster. Moreover, referring back to Figure 4, the PS of 300 produced a solution almost as good as that of 400. Interestingly, the prediction of PS = 300 was slightly better than that of PS = 400 (testing RMSEs of 4.0760 vs. 4.0852). The computation time of this configuration was around 3186.9 s, which is considerably shorter than those of the two other algorithms. Thus, for time-sensitive projects, less complex configurations of the WCA are efficiently applicable.

Predictive Formulas
Based on the comparisons in the previous section, the solutions found by WCAs with the PSs of 300 and 400 are presented here in the form of two separate (different) formulas for forecasting the electrical power. Equations (13) and (14) give the PE through a linear relationship.
In these equations, Yi and Zi (i = 1, 2, ..., 9) symbolize the outputs of the hidden neurons. These parameters are calculated using a generic equation as follows:

$$Y_i \ (\text{or } Z_i) = \mathrm{Tansig}\left(W_{i1} \times AT + W_{i2} \times V + W_{i3} \times AP + W_{i4} \times RH + b_i\right) \tag{15}$$

and with the help of Table 2. According to the above formulas, calculating the PE consists of two steps: First, recalling the MLP structure (Figure 3) and also Equation (11) from Section 3.2, Equation (15) is applied to produce the response of the nine hidden neurons (e.g., Y1, Y2, ..., Y9 for the formula corresponding to PS = 300). For instance, W32 represents the weight of the 3rd neuron applied to the 2nd input (i.e., V). Thus, it equals 1.152 in Table 2 and is used for calculating Y3. Next, these parameters are used by the output neuron (in Equation (13)) to yield the PE. The same goes for the formula corresponding to PS = 400 (Z1, Z2, ..., Z9 and Equation (14)).
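The two-step calculation described above can be sketched as follows. Only the value W32 = 1.152 and the input order (AT, V, AP, RH) come from the text; every other weight, the bias terms, and the output-layer coefficients are hypothetical placeholders, because the coefficients of Equations (13) and (14) and the entries of Table 2 are not reproduced here. The sketch also feeds raw input values, which is an assumption, since any input scaling is not stated in this section.

```python
import numpy as np


def hidden_outputs(inputs, W, b):
    """Step 1: one Tansig response per hidden neuron (Y1..Y9)."""
    return np.tanh(W @ inputs + b)


def predict_pe(inputs, W, b, v, c):
    """Step 2: linear combination of the hidden outputs by the output neuron."""
    return float(v @ hidden_outputs(inputs, W, b) + c)


W = np.zeros((9, 4))      # placeholder hidden weights (hypothetical zeros)
W[2, 1] = 1.152           # W32: 3rd neuron, 2nd input (V), as quoted in the text
b = np.zeros(9)           # placeholder hidden biases (hypothetical)
v = np.full(9, 0.5)       # placeholder output weights (hypothetical)
c = 450.0                 # placeholder output bias (hypothetical)

x = np.array([7.14, 41.22, 1016.6, 97.09])   # AT, V, AP, RH of one record
Y = hidden_outputs(x, W, b)
pe_hat = predict_pe(x, W, b, v, c)
```

Row index 2 and column index 1 in `W` correspond to the 3rd neuron and the 2nd input because Python arrays are zero-based. With the real coefficients in place of the placeholders, the same two functions would evaluate either formula.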

Conclusions
This paper investigated the efficiency of three capable metaheuristic approaches for the accurate analysis of electrical power output. The water cycle algorithm was used to supervise the learning process of an ANN. This algorithm was compared with two other techniques, namely, ant lion optimization and a satin bowerbird optimizer. The results showed the superiority of the WCA in all cases and in terms of all accuracy indicators. For example, the RMSEs were 4.1468 vs. 4.2656 and 4.2484 in the training phase and 4.0852 vs. 4.1719 and 4.1614 in the prediction phase. However, all three hybrids could understand and reproduce the PE pattern with less than 1% error. All in all, a significant sustainability issue was efficiently managed and solved by metaheuristic science. Thus, the presented hybrid models can be practically employed to forecast the electrical power output of combined cycle power plants by having the records of AT, V, AP, and RH. They can also be appropriate substitutes for time-consuming and costly methods. However, further efforts are recommended for future projects to compare the applicability of different metaheuristic techniques and also to present innovative measures that may improve the efficiency of the existing models in terms of both time and accuracy.