An Innovative Metaheuristic Strategy for Solar Energy Management through a Neural Networks Framework

Abstract: Proper management of solar energy, an effective renewable source, is of high importance for sustainable energy harvesting. This paper offers a novel method for predicting solar irradiance (SIr) from environmental conditions. To this end, an efficient metaheuristic technique, namely electromagnetic field optimization (EFO), is employed for optimizing a neural network. This algorithm quickly mines a publicly available dataset to tune the network parameters nonlinearly. To suggest an optimal configuration, five influential parameters of the EFO are optimized by extensive trial and error. The results showed that the proposed model can learn the SIr pattern and predict it for unseen conditions with high accuracy. Furthermore, it provided about 10% and 16% higher accuracy than two benchmark optimizers, namely shuffled complex evolution and the shuffled frog leaping algorithm. Hence, the EFO-supervised neural network can be a promising tool for the early prediction of SIr in practice. The findings of this research may shed light on the use of advanced intelligent models for efficient energy development.

The artificial neural network (ANN) is founded on a set of algorithms that aim to recognize the underlying correspondence among a group of input-output data [81][82][83][84][85]. In another sense, the ANN represents a sophisticated nonlinear approach that has been proposed as a popular tool for different modeling tasks [86]. Among the various types of ANNs, the multi-layer perceptron (MLP) [87,88] is an important one composed of (at least) three layers. Each layer contains one or more neurons that handle the computation tasks [89][90][91][92][93][94]. In medical applications of machine learning, for instance, scholars like Xia, et al. [95], Hu, et al. [96], Wang, et al. [97], and Chen, et al. [98] have achieved satisfying solutions.
Having a reliable forecast of solar irradiance (SIr) is of great importance because of its effect on the design of photovoltaic systems and on measuring solar energy production [99,100]. Figure 1 shows solar radiation on a photovoltaic module installed on the Earth. Up to now, scholars have suggested various methods (e.g., empirical [101] and remote sensing [102] approaches) for analyzing the solar energy (SE) parameter. However, recent advances in soft computing have led to the utilization of diverse machine learning tools for this purpose. These models have gained a lot of attention for renewable energy analysis tasks like feature selection [103]. The artificial neural network (ANN), for example, is a flexible type of machine learning model that has been broadly used for prediction tasks. Barrera, et al. [104] proposed an ANN model developed with open data sources for analyzing SE and the effect of environmental factors on this parameter. The model was found to be more accurate than previous methods (with a mean square error (MSE) of 0.040 vs. 0.055). Yaïci, et al. [105] demonstrated the effectiveness of ANN for simulating SE systems. They also investigated the effect of the problem dimension (i.e., the number of inputs) on the accuracy, and after testing the model using real-world data (Ottawa, Canada), they reported that the accuracy falls gradually as the dimension is reduced. Yadav, et al. [106] conducted a comparison among different ANN models, namely radial basis function neural network (RBFNN), fitting tool (nftool), and generalized regression neural network (GRNN), for analyzing the potential of SE resources in India. They reported the superiority of the nftool, as it could accurately predict the desired parameter for many locations.
Meenal and Selvakumar [107] studied and demonstrated the accuracy of a popular machine learning system called support vector machine (SVM) for solar radiation modeling. This method, when implemented with an optimal dataset, outperformed the ANN and empirical approaches for this purpose. Mohammadi, et al. [108] performed a feature analysis using another well-known processor, namely the adaptive neuro-fuzzy inference system (ANFIS) for global solar radiation modeling. Quej, et al. [109] compared the potential of ANN, SVM, and ANFIS for simulating daily solar radiation. Concerning the respective average correlations of 0.652, 0.689, and 0.645 obtained for the best models, the SVM emerged as the most reliable predictor.
Abedinia, et al. [140] designed a forecast engine based on a metaheuristic optimizer called shark smell optimization combined with ANN for approximating solar power. Due to the better performance of this model in comparison with conventional predictors like conventional ANN, RBFNN, GRNN, and their wavelet versions (normalized root mean square errors (RMSEs) around 11 vs. those above 14), they introduced it as a capable engine. Galván, et al. [141] benefitted from a multi-objective particle swarm optimization (PSO) technique for optimizing the intervals of the SE modeling. They built a nonlinear method using ANN, and their findings revealed the high applicability of the PSO optimizer for the mentioned objective. Zhao, et al. [142] employed two metaheuristic techniques, namely shuffled complex evolution (SCE) and Teaching-Learning-Based Optimization (TLBO), to predict the compressive strength of concrete. Likewise, Halabi, et al. [143] could effectively use this algorithm coupled with an ANFIS system for monthly solar radiation approximation. Vaisakh and Jayabarathi [144] suggested a hybrid of two methods, namely the deer hunting optimization algorithm and grey wolf optimization, for tuning the structure of various ANNs applied to SIr forecast. Their results showed a promising improvement attained by the proposed optimizer. Louzazni, et al. [145] showed the competency of the firefly algorithm for analyzing the parameters of the photovoltaic system under different conditions. Compared to previously used metaheuristic techniques, the firefly algorithm achieved reliable and valid results in tuning the photovoltaic parameters. The efficiency of the PSO and genetic algorithm (GA) for a similar objective was demonstrated by Bechouat, et al. [146]. Wind-driven optimization was successfully used by Abdalla, et al. [147] to deal with the optimal power tracking of photovoltaic systems. This algorithm performed more efficiently than several optimization techniques, such as PSO, the bat algorithm, and cuckoo search.
According to the explained literature, metaheuristic algorithms can yield promising solutions to complex issues like SIr prediction. However, a gap in knowledge has emerged as earlier studies have mostly used well-established strategies like PSO [148], GA [149], and the imperialist competitive algorithm [150]. Furthermore, these techniques take a noticeable time to reach stable optimization. This study, therefore, focuses on a novel metaheuristic strategy, namely electromagnetic field optimization (EFO), for the optimal prediction of the SIr. A significant advantage of this algorithm is its fast convergence relative to other existing techniques. The EFO supervises a nonlinear problem through an ANN framework. Moreover, two other quick algorithms, shuffled complex evolution (SCE) and the shuffled frog leaping algorithm (SFLA), are considered benchmark methods to comparatively validate the efficiency of the EFO.

Data Provision
For predicting the SIr, a publicly available dataset (provided by NASA and available at https://www.kaggle.com/dronio/SolarEnergy, accessed on 26/10/2020) is used in this work. These data have previously been used to validate the performance of different models [151,152]. The SIr is the target parameter, predicted from the inputs of temperature (T), barometric pressure (BP), humidity (H), wind direction (WD), and wind speed (WS).
The dataset contains 32,686 rows of meteorological records obtained from the Hawaii space exploration analog and simulation (HI-SEAS) weather station. The records were taken at approximately 5 min intervals between 23:55:26 on 29 September 2016 and 00:00:02 on 1 December 2016. Figure 2 shows the variation of the SIr over one day (29 September 2016 taken as an instance). As expected, peak values are observed at midday. Moreover, Figure 3 depicts the relationship between the SIr and each input factor in the form of scatter charts for the whole dataset. Considering the $R^2$ values calculated in Figure 3 (0.5402, 0.0142, 0.0512, 0.053, and 0.0054 for the T, BP, H, WD, and WS, respectively), the most meaningful relationship among these five inputs is obtained for the T. In general, the values of SIr tend to increase with this factor. A detailed statistical description of the dataset is presented in Table 1.
In artificial intelligence implementations, it is well established that models use the majority of the instances for learning the existing input-target pattern. They then apply this pattern to the remaining instances to evaluate the prediction ability. For this study, the dataset (i.e., 32,686 instances) was randomly divided into two groups with 26,149 and 6537 instances (80% and 20% of the whole) to generate the training and testing datasets, respectively.
Since the data are randomly selected, both the training and testing groups contain samples from all over the dataset. However, the scattering and broadness of the data (Figure 3) indicate that the predictive models deal with a wide variety of data (e.g., an SIr value with similar temperature and barometric pressure) that make the problem intricate. Thus, it can be another factor for evaluating the generalizability of the used models.
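The random 80/20 division described above can be illustrated with a minimal sketch. The function name `split_indices`, the seed, and the use of rounding to obtain the training size are assumptions made for illustration; the text does not state how the split was implemented.

```python
import random


def split_indices(n_rows, train_fraction=0.8, seed=0):
    """Shuffle row indices and split them into training and testing groups."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # random selection across the whole dataset
    n_train = round(n_rows * train_fraction)  # rounding is an assumption
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(32686)
print(len(train_idx), len(test_idx))  # 26149 6537
```

With 32,686 rows, rounding 80% gives the 26,149 and 6537 instances quoted in the text.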

The EFO
Abedinpourshotorban, et al. [153] developed a physics-based optimization strategy and named it electromagnetic field optimization. Many scholars have benefited from this method for a wide range of problems [154,155]. It is a population-based technique in which each individual is represented by an electromagnetic particle (EMP). The EMPs are distinguished by different polarities. The attraction-repulsion rule is used to improve the solution by changing the position of the EMPs.
The steps of the EFO can be explained as follows:
Step 1: A set of EMPs is randomly generated and the fitness of each one is calculated. The particles are then sorted based on these fitness values. Each particle is made of N_var electromagnets (equal to the number of problem variables).
Step 2: This is dedicated to dividing the EMP population into three field groups with negative, positive, and neutral polarities. The positive field group comprises the best-fitted individuals tunable by a so-called "P_field" parameter, the negative field group comprises the worst-fitted individuals tunable by a so-called "N_field" parameter, and the rest lie in the third group.
Step 3: Each repetition of the algorithm generates a new EMP. If this EMP is better fitted than the weakest one, it becomes part of the population and takes the position of the weakest EMP. Figure 4 shows the generation process and the determination of the polarity of the new member. In this process, for j = 1 → N_var, an electromagnet belonging to the neutral field group is chosen. Next, a random value is generated and compared to a parameter called Ps_rate, which indicates the probability of choosing electromagnets of the created EMP from the positive field. Equation (1) is used when random value < Ps_rate; otherwise, Equation (2) expresses the generation process.
$$EMP_j^{New} = EMP_j^{PF} \quad (1)$$
$$EMP_j^{New} = EMP_j^{K} + GR \cdot r \cdot \left(EMP_j^{PF} - EMP_j^{K}\right) - r \cdot \left(EMP_j^{NF} - EMP_j^{K}\right) \quad (2)$$
where PF and NF symbolize the positive and negative fields, K denotes the electromagnet chosen from the neutral field, GR gives the golden ratio, and r is the random value inside [0, 1].
Step 4: A randomization operator is responsible for diversifying the new EMPs. Another random value is generated and compared to a parameter called R_rate, which indicates the probability of replacing one electromagnet of the created EMP with a random electromagnet. If random value < R_rate, a new electromagnet replaces one electromagnet of the created EMP [155].
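Steps 1 to 4 can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: the update rule is the form commonly given for the EFO, one random force `r` is drawn per new EMP, out-of-range values are redrawn uniformly, and the function names, the seed, and the sphere test function are hypothetical. The default parameter values echo numbers mentioned in the tuning section.

```python
import bisect
import random

GOLDEN_RATIO = (1 + 5 ** 0.5) / 2  # "GR" in the text


def efo_minimize(cost, n_var, bounds, n_pop=26, max_iter=2000,
                 r_rate=0.01, ps_rate=0.03, p_field=0.02, n_field=0.4, seed=0):
    """Minimize cost(list_of_floats) with a simplified EFO loop (needs n_pop >= 3)."""
    rng = random.Random(seed)
    low, high = bounds

    # Step 1: random EMPs, sorted from best (lowest cost) to worst.
    pop = [[rng.uniform(low, high) for _ in range(n_var)] for _ in range(n_pop)]
    pop.sort(key=cost)
    fits = [cost(emp) for emp in pop]

    # Step 2: index ranges of the positive [0, n_pos), neutral [n_pos, neg_start),
    # and negative [neg_start, n_pop) field groups.
    n_pos = max(1, round(p_field * n_pop))
    neg_start = min(n_pop - 1, max(n_pos + 1, round((1 - n_field) * n_pop)))

    for _ in range(max_iter):
        r = rng.random()  # assumption: one random force per new EMP
        new_emp = []
        for j in range(n_var):
            pos = pop[rng.randrange(0, n_pos)]
            neu = pop[rng.randrange(n_pos, neg_start)]
            neg = pop[rng.randrange(neg_start, n_pop)]
            if rng.random() < ps_rate:
                value = pos[j]  # copy the electromagnet from the positive field
            else:
                value = (neu[j] + GOLDEN_RATIO * r * (pos[j] - neu[j])
                         - r * (neg[j] - neu[j]))
            if not low <= value <= high:
                value = rng.uniform(low, high)  # assumption: redraw if out of range
            new_emp.append(value)

        # Step 4: occasionally replace one electromagnet with a random one.
        if rng.random() < r_rate:
            new_emp[rng.randrange(n_var)] = rng.uniform(low, high)

        # Step 3: the new EMP replaces the weakest one only if it is better fitted.
        new_fit = cost(new_emp)
        if new_fit < fits[-1]:
            k = bisect.bisect_left(fits, new_fit)
            pop.insert(k, new_emp)
            fits.insert(k, new_fit)
            pop.pop()
            fits.pop()

    return pop[0], fits[0]


def sphere(x):
    """Hypothetical test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_cost = efo_minimize(sphere, n_var=5, bounds=(-5.0, 5.0))
print(best_cost)
```

Keeping the population sorted means the weakest EMP is always the last entry, so the replacement test in Step 3 is a single comparison.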

The Benchmarks
The SCE and SFLA are efficient metaheuristic techniques that are used as comparative methods in this work. While both algorithms are based on the shuffle action, the SCE is the older optimizer. Duan, et al. [156] and Eusuff and Lansey [157] presented the SCE and SFLA in 1993 and 2003, respectively. Although this study is one of the first uses of the EFO for supervising an ANN, scholars like Zheng, et al. [42] and Ma, et al. [158] have reported successful performance of the SCE and SFLA for this purpose.
The SCE implements a combination of the Nelder-Mead simplex technique, genetic algorithm, complex shuffling, and controlled random search for doing the optimization. After creating the population, the individuals are grouped in containers called complexes. The algorithm uses competitive complex evolution for evolving these complexes. It then merges the evolved units to create a larger community. This step results in more interactive agents for better sharing of the obtained knowledge [159]. The pivotal idea of the SFLA is the relationship between frogs settled in containers called memeplexes. It is known as a quick and efficient search scheme that combines PSO with the memetic algorithm. The fitness of the frogs is the measure used for grouping them into the memeplexes. The SFLA updates the position of the frogs in these units and also imports new frogs in place of the worst individuals [160]. The benchmark algorithms are mathematically detailed in earlier studies like [161,162] (for the SCE) and [163,164] (for the SFLA).
Similar to the EFO, two separate ANNs are supervised by the benchmark algorithms to explore and predict the SIr. The performance of these three methods is compared in the following sections to return an optimal metaheuristic-based methodology for this purpose.

Accuracy Assessment Measures
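The accuracy of SIr prediction is reported by well-known indices as follows. Given $Error_i = SIr_i^{expected} - SIr_i^{predicted}$, the error of prediction for a total of N instances is calculated by the RMSE and mean absolute error (MAE) indices. According to Equations (3) and (4), the RMSE gives the root of the averaged squared errors, while the MAE gives the average of the absolute error values.
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} Error_i^{2}} \quad (3)$$
$$MAE = \frac{1}{N}\sum_{i=1}^{N} \left|Error_i\right| \quad (4)$$
A correlation index called Pearson correlation coefficient (R) is also defined to show the consistency between the recorded SIrs and the products of each network. Equation (5) formulates the R:
$$R = \frac{\sum_{i=1}^{N}\left(SIr_i^{predicted} - \overline{SIr}^{predicted}\right)\left(SIr_i^{expected} - \overline{SIr}^{expected}\right)}{\sqrt{\sum_{i=1}^{N}\left(SIr_i^{predicted} - \overline{SIr}^{predicted}\right)^{2}\,\sum_{i=1}^{N}\left(SIr_i^{expected} - \overline{SIr}^{expected}\right)^{2}}} \quad (5)$$
where $\overline{SIr}$ symbolizes the average of the SIr values.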
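The three indices can be computed with a few lines of code. The sketch below uses the standard textbook definitions of the RMSE, MAE, and Pearson correlation coefficient; the function names and the small sample lists are illustrative assumptions and are not data from this study.

```python
import math


def rmse(expected, predicted):
    """Root of the mean squared error."""
    n = len(expected)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(expected, predicted)) / n)


def mae(expected, predicted):
    """Mean of the absolute errors."""
    n = len(expected)
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / n


def pearson_r(expected, predicted):
    """Pearson correlation coefficient between expected and predicted values."""
    n = len(expected)
    mean_e = sum(expected) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(expected, predicted))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_e * var_p)


# Illustrative values only.
expected = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(expected, predicted))  # 1.0
print(mae(expected, predicted))   # 0.5
```

Because the RMSE squares each error before averaging, the single error of 2 in this sample raises it to 1.0, twice the MAE of 0.5.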

Optimization and Training
A 5 × 45 × 1 MLP neural network (indicating 5 nodes in the input layer, 45 nodes in the middle layer, and 1 node in the output layer) is used to connect the SIr to its input factors. Due to a large number of data instances, this network is a complex system that is supposed to be supervised by the EFO algorithm. The main role of the EFO is to adjust the MLP internal parameters so that the SIr pattern is optimally established.
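To make the role of the optimizer concrete, the sketch below stores all weights and biases of a 5 × 45 × 1 MLP in one flat list, which is the kind of vector a metaheuristic can adjust. The tanh hidden activation, the linear output node, the parameter ordering, and the function names are assumptions for illustration; the text does not specify them.

```python
import math

N_IN, N_HIDDEN, N_OUT = 5, 45, 1
# 225 input-to-hidden weights + 45 hidden biases + 45 output weights + 1 output bias
N_PARAMS = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN * N_OUT + N_OUT


def mlp_predict(params, x):
    """Forward pass of the 5-45-1 MLP for one input record x (5 values)."""
    w1_end = N_IN * N_HIDDEN      # end of the input-to-hidden weights
    b1_end = w1_end + N_HIDDEN    # end of the hidden biases
    w2_end = b1_end + N_HIDDEN    # end of the hidden-to-output weights
    hidden = []
    for h in range(N_HIDDEN):
        z = params[w1_end + h]    # bias of hidden neuron h
        for i in range(N_IN):
            z += params[h * N_IN + i] * x[i]
        hidden.append(math.tanh(z))  # assumption: tanh activation
    out = params[w2_end]          # output bias
    for h in range(N_HIDDEN):
        out += params[b1_end + h] * hidden[h]
    return out                    # assumption: linear output node


def training_rmse(params, inputs, targets):
    """Cost that an optimizer would minimize over the flat parameter vector."""
    sq = sum((t - mlp_predict(params, x)) ** 2 for x, t in zip(inputs, targets))
    return math.sqrt(sq / len(targets))


print(N_PARAMS)  # 316
```

Under this layout, the optimizer searches a 316-dimensional space, and each candidate solution is scored by calling `training_rmse` on the training group.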
After creating the EFO-MLP hybrid, it is trained by mining the training group. Since metaheuristic algorithms are population-based iterative techniques, suitable values should be chosen for two parameters, i.e., population size (NPop) and the number of iterations (NIt). Although many optimization algorithms reach a stable situation by around 1000 iterations, the EFO needs more effort. Based on experience and on evaluating the behavior of the model, the EFO-MLP was implemented with a total of 50,000 iterations. The appropriate values for NPop, as well as four other parameters, were determined one by one by testing different values. The convergence curves of the tested EFO-MLPs are shown in Figure 5. First, the models with different NPops (25, 26, 27, 28, 30, 35, and 40) were tested (when R_rate = 0.01, Ps_rate = 0.01, P_field = 0.02, and N_field = 0.4). Figure 5a shows that NPop = 26 gives the lowest error. Thus, the subsequent models were tested with this NPop. Five R_rates of 0.01, 0.015, 0.02, 0.03, and 0.04 were similarly assessed. According to Figure 5b, R_rate = 0.01 is the most suitable one. Next, investigating the effect of Ps_rate in Figure 5c revealed that the lowest error is obtained for Ps_rate = 0.03. As is exhibited in Figure 5d
A similar strategy was executed for the benchmark models (i.e., SCE-MLP and SFLA-MLP). Table 2 lists the values assigned to the used algorithms. As is seen, the SCE and SFLA were implemented with 1000 iterations.
(Table 1), indicate an acceptable level of error. Moreover, the correlation values of 0.82275, 0.78208, and 0.75431 demonstrate a high agreement between the training products and expected SIrs.
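The one-by-one tuning strategy described in this section can be sketched as a simple sweep. The function `tune_one_at_a_time` and the `toy_error` stand-in are hypothetical: in practice the evaluation would train an EFO-MLP and return its training error, whereas the toy function here is an artificial example and does not reproduce any result of this study. Only the candidate lists for NPop and R_rate are taken from the text.

```python
def tune_one_at_a_time(evaluate, defaults, candidates):
    """Sweep each parameter in turn and keep its best value before moving on."""
    config = dict(defaults)
    for name, values in candidates.items():
        best_value, best_error = config[name], evaluate(config)
        for value in values:
            trial = dict(config)
            trial[name] = value
            error = evaluate(trial)
            if error < best_error:
                best_value, best_error = value, error
        config[name] = best_value  # later sweeps use the value selected here
    return config


def toy_error(cfg):
    """Artificial error surface used only to exercise the sweep."""
    return abs(cfg["NPop"] - 26) + 100 * abs(cfg["R_rate"] - 0.01)


defaults = {"NPop": 25, "R_rate": 0.02}
candidates = {
    "NPop": [25, 26, 27, 28, 30, 35, 40],
    "R_rate": [0.01, 0.015, 0.02, 0.03, 0.04],
}
selected = tune_one_at_a_time(toy_error, defaults, candidates)
print(selected)
```

A sweep of this kind evaluates far fewer configurations than a full grid, at the cost of ignoring interactions between parameters.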

Testing Results
As explained in Section 2, the second part of the dataset plays the role of unseen environmental conditions. The models use this data to evaluate their testing ability. In this regard, the SIr is forecasted for the testing instances and these values are compared with the expected values. Since the model does not perform any analysis on these instances, it has to use the previously captured knowledge. Accordingly, the goodness of the results reflects the prediction capability of the intended model.
Considering the formula (Section 3.1), Figure 6 details the magnitude and statistics of error values calculated for the testing instances. In this phase, the RMSEs of 177.9764, 195.0984, and 205.6091 indicated a reliable prediction by all three models. Moreover, the goodness of the testing results can be supported by the MAEs of 115.2678, 136.2261, and 154.1603, as well as the R values of 0.82132, 0.78046, and 0.75212.
Moreover, from a graphical point of view, the histogram charts in Figure 6 show that small errors outnumber large ones. This can be seen from the sharp peak of the diagram at and around zero. Regarding the overall trend of these charts, the frequency falls as the magnitude of the error increases.

EFO vs. SCE and SFLA
It was stated that this research pursues a novel time-efficient methodology for analyzing the SIr. The EFO was presented as the pivotal method, while the SCE and SFLA acted as benchmark algorithms. Earlier sections showed the competency of all three supervised models. Hence, this section validates the performance of the EFO versus the SCE and SFLA.
For both training and testing groups, the error indicators showed a lower error of prediction, and, at the same time, the R index manifested a higher correlation for the EFO-trained model. Table 3 gives the accuracy improvements when the SCE and SFLA are replaced with the EFO. As is seen, in the case of EFO vs. SCE, the RMSE and MAE fall by nearly 10% and 18%, respectively, in both phases. Additionally, a 4% enhancement resulted for the R index. As for EFO vs. SFLA, the changes are more tangible. The RMSE and MAE of both phases decrease by around 16% and 33%, respectively. The R index indicated a 7% better correlation, too.
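As a worked check, the testing-phase percentages can be recomputed from the RMSE and MAE values quoted in the Testing Results section. Expressing the change relative to the EFO value is an assumption; it is chosen here because it reproduces the percentages reported for the testing dataset (9.62, 18.18, 15.53, and 33.74). The function name is hypothetical.

```python
def relative_change_percent(benchmark_value, efo_value):
    """Percentage by which the benchmark error exceeds the EFO error."""
    return 100.0 * (benchmark_value - efo_value) / efo_value


# Testing-phase values quoted in the text.
testing = {
    "EFO": {"RMSE": 177.9764, "MAE": 115.2678},
    "SCE": {"RMSE": 195.0984, "MAE": 136.2261},
    "SFLA": {"RMSE": 205.6091, "MAE": 154.1603},
}

for name in ("SCE", "SFLA"):
    for metric in ("RMSE", "MAE"):
        change = relative_change_percent(testing[name][metric], testing["EFO"][metric])
        print(f"EFO vs. {name}, {metric}: {change:.2f}%")
# EFO vs. SCE, RMSE: 9.62%
# EFO vs. SCE, MAE: 18.18%
# EFO vs. SFLA, RMSE: 15.53%
# EFO vs. SFLA, MAE: 33.74%
```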

Conclusions
This research was dedicated to finding a fast yet reliable solution for predicting solar irradiance. Since this parameter is affected by different factors, the problem is nonlinear and complex. Therefore, a potent metaheuristic strategy called electromagnetic field optimization was considered for dealing with it. A neural network organized the general equations while the EFO tuned its parameters optimally. Moreover, this algorithm was compared with two shuffle-based metaheuristic techniques: the shuffled frog leaping algorithm and shuffled complex evolution. While an adequate level of accuracy was observed for all three hybrids, the EFO-MLP was significantly superior. For example, its error was around 10% and 16% below that of the SCE-MLP and SFLA-MLP, respectively. Referring to the R value of 0.82132 for testing data, the proposed model can reliably predict the SIr for given environmental conditions. Compared with the SCE and SFLA hybrids, the EFO showed better performance. The improvements in the accuracy indices (i.e., RMSE, MAE, and R) achieved by the EFO over the benchmark techniques were 9.64, 17.57, and 0.04 (vs. SCE) and 15.56, 32.59, and 0.07 (vs. SFLA) for the training dataset, and 9.62, 18.18, and 0.04 (vs. SCE) and 15.53, 33.74, and 0.07 (vs. SFLA) for the testing dataset. Hence, the EFO algorithm provided a more accurate predictive network. Apart from the high implementation speed, another advantage of the EFO-MLP model lies in its optimized parameters (i.e., NPop, R_rate, Ps_rate, P_field, and N_field). Therefore, the findings of this study can be used for sustainable energy management. However, there may still be ideas for future works (e.g., using feature selection and filtered data) for a more efficient methodology. Applying the developed method to other real-world sites can better reveal its advantages and drawbacks. Additionally, comparing the EFO with other capable optimizers or employing hybrid, ensemble, and deep machine learning methods would be of high interest.

Conflicts of Interest:
The authors declare no conflicts of interest.