Neural Network Approach for Global Solar Irradiance Prediction at Extremely Short-Time-Intervals Using Particle Swarm Optimization Algorithm

Hourly global solar irradiance (GSR) data are required for sizing, planning, and modeling of solar photovoltaic farms. However, operating and controlling such farms under varying environmental conditions, such as fast-passing clouds, requires GSR data at very short time intervals. Classical backpropagation neural networks (BPNN) do not perform satisfactorily when predicting parameters within short intervals. This paper proposes a hybrid backpropagation neural network based on particle swarm optimization (PSO). The particle swarm algorithm is used within the backpropagation neural network to optimize the number of hidden layers, the number of neurons, and the learning rate. The proposed model can serve as a reliable predictor of changes in solar irradiance over short time intervals in tropical regions such as Malaysia, as well as in other regions. Actual global solar irradiance data at 5-s and 1-min intervals, recorded by weather stations, are used to train and test the proposed algorithm. Moreover, to ensure the adaptability and robustness of the proposed technique, two cases are evaluated using 1-day and 3-days profiles, each at time intervals of 1-min and 5-s. A set of statistical error indices is used to evaluate the performance of the proposed algorithm. For the 3-days profile, the BPNN-PSO achieves an RMSE of 1.7078, an MAE of 0.7537, an MSE of 0.0292, and a MAPE of 31.4348% at the 5-s interval, and an RMSE of 0.6566, an MAE of 0.2754, an MSE of 0.0043, and a MAPE of 1.4732% at the 1-min interval. The results show that the proposed model outperforms the standalone backpropagation neural network in predicting global solar irradiance values for extremely short time intervals. In addition, the proposed model exhibits a high level of predictability compared to other existing models.


Introduction
Sustainable energy sources such as photovoltaic (PV) generation are becoming increasingly important due to the depletion of natural resources, increasing energy demand, the high cost of new fossil-fuel generation and transmission infrastructure, and worsening greenhouse gas effects. However, PV output power is not dispatchable in terms of supply and demand because of the inherent intermittency of solar irradiance. Be it large-scale or nanogrid PV generation, energy storage devices such as batteries and ultracapacitors are required to manage energy and transient power demand, increase the estimation accuracy of GSR despite the wide-ranging climate variability. Improvement in GSR prediction has also been achieved by hybridizing extreme-learning-machine and neural-grouping genetic algorithms into a single model, where optimal feature selection is achieved [29].
This study aims to overcome the drawbacks of standalone AI models by examining the adaptability, efficiency, and accuracy of the hybrid backpropagation neural network and particle swarm optimization (BPNN-PSO) method for predicting solar irradiance at very short time intervals and under fast-changing climate conditions, as there have been no extensive studies on GSR prediction under those conditions in a tropical country. This paper also designs and develops a high-efficiency prediction model using actual meteorological data from Kajang, Malaysia. The meteorological data include pressure, humidity, temperature, wind speed, wind direction, and diffuse and direct irradiances.
The PSO is used inside the BPNN to improve prediction performance by reducing the error relative to the actual data and to improve the convergence rate by minimizing the objective function. The PSO optimizes the number of hidden layers, the number of neurons, and the learning rate of the BPNN architecture. Finding the combination of these three parameters that minimizes the objective function is the main goal. The accuracy of the developed hybrid BPNN-PSO model is evaluated against other existing models using reliable statistical indicators, such as root mean square error (RMSE), mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
This paper is organized as follows. First, a literature review is presented. Section 2 presents the BPNN algorithm. Section 3 discusses data preparation for training and testing, the working concepts of the PSO algorithm, and the performance evaluation of the model. Section 4 analyzes the simulation results. Finally, the achievements of the proposed method are highlighted in the conclusion.

Theory of Backpropagation Neural Network Structure
The ANN models nonlinear systems and attempts to reproduce the function of the human brain in a simulation environment through distributed processing. The ANN consists of neurons, which are simple connected elements that are well suited to capturing complex nonlinear relationships between system inputs and outputs [30,31]. Figure 1 shows the structure of the BPNN model employed. The model consists of seven meteorological inputs obtained from Kajang, Malaysia, namely pressure (P), temperature (T), humidity (H), wind speed (WS), wind direction (WD), direct irradiance (IDR), and diffuse irradiance (IDF); a few hidden layers; and GSR as the single output parameter. To predict GSR, this paper employs the Levenberg-Marquardt backpropagation algorithm to train the multi-layer perceptron of the ANN model in Matlab. The Levenberg-Marquardt algorithm was selected for its minimal localization error as well as its efficiency and speed [32]. Training the ANN with the Levenberg-Marquardt backpropagation algorithm involves three phases: (a) the feedforward phase, (b) the computation and backpropagation of the associated error, and (c) the adjustment of the weights. The input of the BPNN is the set of seven meteorological variables (P, T, H, WS, WD, IDR, and IDF) and its output is the GSR, where GSR is the global solar irradiance of the meteorological datasets. During the feed-forward phase of ANN training, data samples consisting of these seven input values and one output or target value (GSR) are presented to the ANN. The working process of the BPNN is summarized in the following steps:
Step 1: The weights and biases are randomly initialized.
Step 2: The input layer relays the input signals to the hidden nodes. The variables in the hidden nodes are then calculated following [33], where Z is the input of the hidden nodes, j is the number of hidden nodes, which is determined by the PSO, and w is the weight factor.
Step 3: The output of the hidden layer is calculated using the sigmoid activation function. For input pattern p, the i-th input-layer node holds x_{p,i}. The net input to the j-th node in the hidden layer is formed from these inputs, where w_{j,i} is the weight from the input layer to the hidden layer and θ_{j,i} is the bias from the input layer to the hidden layer. The output of the j-th hidden node, S_j, is obtained by applying the sigmoid function to this net input. The net input to the k-th node in the output layer is formed in the same way from the hidden-layer outputs, where w_{k,j} and θ_{k,j} are the weight and bias from the hidden layer to the output layer, respectively, and the output of the k-th output node follows from this net input.
Step 4: After the feed-forward phase, the next phases compute and backpropagate the associated error and then adjust the weights. During training, the predicted GSR is compared with its target value in the sample data to determine the associated error (e). Based on this error, a factor δ_k is computed to distribute the error at the output layer back to all hidden nodes, where T_k is the true output (GSR). The error in the hidden layer, δ_j, is given by (10).
Step 5: In this phase, the BPNN updates the weights and biases. The hidden-to-output weights are updated using Δw_{k,j} = α δ_k S_j (11), where α is the learning rate, which can be assigned values between 0 and 1. The biases are updated analogously. The adjustment of the weights from the input to the hidden layer is based on the factor δ_j in (10) and the activation of the input features.
Using (4) to (18), the BPNN process is repeated for all data samples to complete one epoch. Training continues until the error goal or the predefined number of epochs is reached. After training, the ANN can be used to generate the reference GSR from new input data.
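The feed-forward, error backpropagation, and update steps above can be illustrated with a minimal Python sketch for a single hidden layer and a single GSR output. It is not the Matlab Levenberg-Marquardt training used in this study: it assumes plain gradient-descent updates of the form in (11), a sigmoid activation on both layers, and one bias per node. The names init_network, forward, and train_step are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, seed=0):
    """Step 1: random weights and biases (the range [-0.5, 0.5] is an assumption)."""
    rng = random.Random(seed)
    return {
        "w_h": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_h": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "w_o": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_o": rng.uniform(-0.5, 0.5),
    }


def forward(net, x):
    """Steps 2-3: hidden outputs S_j and the network output."""
    s = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w_h"], net["b_h"])]
    out = sigmoid(sum(w * sj for w, sj in zip(net["w_o"], s)) + net["b_o"])
    return s, out


def train_step(net, x, target, alpha):
    """Steps 4-5 for one sample; returns the absolute error before the update."""
    s, out = forward(net, x)
    # Output-layer factor: error times the sigmoid derivative (assumed form).
    delta_k = (target - out) * out * (1.0 - out)
    # Hidden-layer factors, computed with the weights before they are updated.
    delta_j = [sj * (1.0 - sj) * delta_k * w for sj, w in zip(s, net["w_o"])]
    for j, sj in enumerate(s):
        net["w_o"][j] += alpha * delta_k * sj  # same form as (11)
    net["b_o"] += alpha * delta_k
    for j, dj in enumerate(delta_j):
        for i, xi in enumerate(x):
            net["w_h"][j][i] += alpha * dj * xi
        net["b_h"][j] += alpha * dj
    return abs(target - out)
```

Calling train_step repeatedly over all normalized samples corresponds to one epoch. The seven entries of x would be the normalized P, T, H, WS, WD, IDR, and IDF values, and target the normalized GSR.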

Methodology
Proposed Hybrid BPNN-PSO for Predicting GSR
The BPNN architecture is used to train the GSR model because of its capability to minimize output error by optimizing the input weight values of the output layer. However, three significant parameters of the BPNN architecture, namely the number of hidden layers (HL), the number of neurons (Ne), and the learning rate (LR), are normally set by trial and error, which is time consuming. To further enhance GSR prediction performance and avoid the time-consuming trial-and-error process, the PSO algorithm is used to find the optimal combination of HL, Ne, and LR. In addition, PSO is used to give reliable GSR prediction with less variance, less error, and the best fit for the prediction function. Figure 2 shows the schematic diagram for GSR prediction using the PSO-based BPNN. The process is divided into three phases, as follows:
Phase I: This phase begins with the collection of seven input variables at 5-s and 1-min time intervals. Two profiles are included in this study to improve prediction performance by capturing the nonlinear association of patterns between different meteorological parameters, such as temperature, pressure, humidity, wind speed, wind direction, direct irradiance, and diffuse irradiance. The sample data are then normalized.
Phase II and III: In these phases, the 10-fold cross-validation method is used for data pre-processing and division into training and testing observations. The BPNN model is then used for GSR prediction, with the optimal number of hidden-layer neurons and the learning rate of the BPNN model computed by the PSO algorithm to enhance the accuracy of the GSR prediction.
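The normalization in Phase I and the 10-fold division in Phases II and III can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this study: the scaling is assumed to be min-max scaling to the range 0 to 1, the samples are shuffled before being dealt into folds (the original fold assignment is not specified), and the names min_max_normalize and k_fold_splits are hypothetical.

```python
import random


def min_max_normalize(values, v_min=None, v_max=None):
    """Scale values to [0, 1]; pass the training min/max to reuse them on test data."""
    v_min = min(values) if v_min is None else v_min
    v_max = max(values) if v_max is None else v_max
    span = v_max - v_min
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - v_min) / span for v in values]


def k_fold_splits(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits
```

Each of the seven input columns and the GSR column would be normalized separately, and the model would then be trained and scored once per fold.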

Particle Swarm Optimization Algorithm
PSO is a population-based heuristic optimization technique, inspired by bird flocking and fish schooling, for solving nonlinear problems with both discrete and continuous variables. The PSO algorithm is robust and easy to implement, with global exploration capability in various applications [34]. In PSO, potential solutions to the problem under consideration are randomly generated through a population of individual particles, also known as a "swarm". Each particle moves with an arbitrary velocity across a multidimensional search space, and the swarm keeps track of two locations. The first is the best position in the current iteration, also known as the local best, and the second is the best point found in all previous iterations, also known as the global best. The velocity and position factors are updated following [35], where c_1 is the social rate, c_2 is the cognitive rate, r_1 and r_2 are random numbers in the interval (0,1), V is the velocity factor of agent i at iteration d, t is the present iteration, w is the inertia factor, and X is the position factor.
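The velocity and position updates described above can be sketched in Python using the widely used textbook form of PSO. Several choices here are assumptions rather than values from this study: the inertia and acceleration values, the clamping of positions to the search bounds, and the use of a per-particle best position in the update. The function name pso_minimize and the parameter names c_pbest and c_gbest are hypothetical, and how they map onto c_1 and c_2 depends on the formulation in [35].

```python
import random


def pso_minimize(objective, bounds, n_particles=4, n_iter=100,
                 w=0.7, c_pbest=1.5, c_gbest=1.5, seed=0):
    """Minimize objective(position) inside bounds = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia term plus pulls toward the two best positions.
                vel[i][d] = (w * vel[i][d]
                             + c_pbest * r1 * (pbest[i][d] - pos[i][d])
                             + c_gbest * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position: move by the new velocity, clamped to the bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The defaults of 4 particles and 100 iterations mirror the population size and maximum iteration count reported in the results section. In the hybrid model, objective would train a BPNN with the hyperparameters encoded in the position and return its MAE.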

Performance Evaluation
To investigate the performance and validate the accuracy of each model, four statistical error indices are used. In addition, previous benchmark studies at different sites are compared with the proposed hybrid model. Various statistical indices have been used by other researchers [36]; however, MSE, MAE, and MAPE are the most common [37]. Therefore, in this study, the performance of the proposed technique is evaluated through RMSE, MSE, MAE, and MAPE, where error = GSR_{A,i} − GSR_{P,i}, GSR_{A,i} is the actual GSR value, GSR_{P,i} is the predicted GSR value, and n is the number of samples.
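As an illustration of these four indices, the sketch below computes them from paired actual and predicted GSR values using their usual textbook definitions. The exact scaling applied to the values reported in this study is not specified here, so the sketch is not expected to reproduce the reported numbers. Skipping samples whose actual value is zero in the MAPE is an assumption, and the name error_indices is hypothetical.

```python
import math


def error_indices(actual, predicted):
    """Return RMSE, MSE, MAE and MAPE (%) for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined where the actual value is zero (e.g. night-time GSR),
    # so those samples are skipped here; this handling is an assumption.
    ratios = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "MAPE": mape}
```

For example, error_indices([100, 200, 400], [110, 190, 400]) has errors of −10, 10, and 0, giving an MAE of 20/3 and a MAPE of 5%.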

Data Preparation and Model Execution
The prediction of the GSR for the one-day and three-day profiles begins with the collection of seven input variables at 5-s and 1-min time intervals. Two profiles are included in this study to improve prediction performance by capturing the nonlinear association of patterns between different meteorological parameters, such as temperature, pressure, humidity, wind speed, wind direction, direct irradiance, and diffuse irradiance. In the first stage, the data were collected at different time intervals from the TNBR-Solar Resource Monitoring Station, located in Kajang, Malaysia. The data were collected from 1 March 2013 to 15 February 2014 using a high-sampling data logger at sampling rates of 5 and 30 s for 1, 5, 30, and 60 min intervals. This measurement was taken as part of the Seeding Fund Project TNBR/SF140/2010 entitled "Development of Solar Research Facility for studies of Grid Connection of Utility Scale Solar Power Plant". The solar resource measurement was performed as follows: (a) GSR was measured by a CMP11 pyranometer, (b) diffuse irradiance was measured by a CMP11 pyranometer shaded by an attached shading ball, and (c) direct irradiance was measured by a CHP1 pyrheliometer. The irradiance and temperature were measured using a solar pyranometer sensor and a temperature sensor, respectively. All measurements were taken instantaneously at every 5-s and 1-min time interval and recorded using the DT80 data logger.
The available meteorological features allow high-efficiency prediction of the solar irradiance at short time intervals (5-s and 1-min) without any additional features. Moreover, to improve the accuracy of the output GSR parameter, all other available data features that may directly affect the output are included in the predictive models. All saved input datasets are normalized before being used to train the BPNN model, which increases the robustness and efficiency of the system and enhances the convergence rate of the predicted GSR. In this study, the data are normalized to the range from 0 to 1 as Data_norm = (Data − Data_min) / (Data_max − Data_min), where Data_max and Data_min are the maximum and minimum values of the BPNN training dataset. The testing dataset is normalized with the same range limits. The dataset is divided into 70% for training and 30% for testing, and 10-fold cross-validation is applied to all input arrays. Figure 3 shows the flowchart of the proposed hybrid BPNN-PSO implementation for predicting the PV GSR in detail. The implementation procedure of the proposed hybrid technique for determining the optimum number of hidden neurons (Ne), hidden layers (HL), and learning rate (LR) proceeds as follows. Table 1 presents the parameters used in the initialization of the algorithm. The initial local and global best positions, P_best and G_best, are then randomly generated. To train the ANN and evaluate the fitness value, the learning rate and the numbers of hidden neurons and layers must be selected. The objective function MAE is evaluated based on (21), and P_best and G_best are updated. The velocity and position are computed and updated using (19) and (20). If the position is rejected, other combinations have to be identified. Training of the ANN and evaluation of the MAE are repeated until the maximum population size is reached or P_best is lower than G_best.
If the local best value is less than the global best value, the former becomes the new global best value. The process is iterated until the stopping criterion is reached, yielding the optimal hidden layers and hidden neurons.
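Because PSO moves particles through a continuous space while HL and Ne are integers, each position has to be mapped to a valid BPNN configuration before its MAE can be evaluated. The sketch below shows one way to do this. The position layout, the rounding rule, and the upper limits max_layers and max_neurons are all assumptions (the initialization parameters of Table 1 are not reproduced here), and the names decode_particle, fitness, and train_and_score are hypothetical.

```python
def decode_particle(position, max_layers=2, max_neurons=20):
    """Map a continuous position [hl, ne_1, ..., ne_max_layers, lr] to
    (number of hidden layers, neurons per layer, learning rate)."""
    n_layers = min(max(int(round(position[0])), 1), max_layers)
    # Only the first n_layers neuron slots are used; the rest are ignored.
    neurons = [min(max(int(round(v)), 1), max_neurons)
               for v in position[1:1 + n_layers]]
    # The learning rate is kept inside (0, 1].
    learning_rate = min(max(position[-1], 1e-4), 1.0)
    return n_layers, neurons, learning_rate


def fitness(position, train_and_score):
    """Objective for the PSO: decode the particle, train, and return the MAE.

    train_and_score(neurons, learning_rate) is a user-supplied callable that
    trains a BPNN with the given layer sizes and returns its MAE.
    """
    _, neurons, learning_rate = decode_particle(position)
    return train_and_score(neurons, learning_rate)
```

With this layout, a position such as [1.6, 14.2, 8.7, 0.7373] decodes to two hidden layers with 14 and 9 neurons and a learning rate of 0.7373.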

Objective Function Performance of PV Solar Irradiance
Figures 4 and 5 depict the convergence results for solar irradiance prediction using the BPNN assisted by the PSO algorithm, evaluated at time intervals of 5-s and 1-min for both the 1-day and 3-days profiles. Performance is represented by the objective function (MAE), whose convergence is compared over 100 iterations. The population size and the maximum number of iterations are fixed at 4 and 100, respectively. In Figure 4, the proposed hybrid BPNN-PSO algorithm shows a low error relative to the actual GSR data: the minimum MAE values obtained from the 3-days profile for the 5-s and 1-min time intervals are 0.7537 and 0.2754, respectively. Figure 5 shows that the 1-day profile has a better convergence rate than the 3-days profile for both time intervals, with values of 0.1000 and 0.0956 at 5-s and 1-min, respectively. Other statistical error indices, namely MSE, RMSE, and MAPE, provide a more extensive analysis of the output performance for both the 1-day and 3-days profiles, as shown in Table 2.

PV Solar Irradiance Optimal Parameters
The optimal parameters HL, Ne, and LR obtained with the PSO algorithm for the 1-day and 3-days profiles are shown in Table 3. In the 3-days profile, the BPNN-PSO algorithm attains 1 hidden layer (7 Ne) and 2 hidden layers (14 and 9 Ne) after 22 and 35 iterations for the 5-s and 1-min time intervals, respectively. In the 1-day profile, the proposed BPNN-PSO attains 2 hidden layers, with (8 and 2 Ne) and (9 and 11 Ne), after 76 and 91 iterations for the 5-s and 1-min time intervals, respectively. The best learning rate values are 0.1295 and 0.7373 for the 3-days profile and 0.5946 and 0.6481 for the 1-day profile, at the 5-s and 1-min time intervals, respectively.

PV Solar Irradiance Prediction
The input parameters considered during the training of the BPNN and PSO are the seven inputs shown in Figure 2. The number of input features was set to seven after a set of trial-and-error tests on the predictive models: increasing the number of input features from one to seven improved the performance of the proposed BPNN-PSO model, while any further increase beyond seven did not yield additional improvement. Figures 6 and 7 present the solar irradiance predicted by the proposed hybrid BPNN-PSO algorithm and compare the predicted GSR with the reference (actual) data. In the figures, the red line represents the actual global solar irradiance data obtained from Kajang, Malaysia, while the blue line is the prediction of the proposed hybrid BPNN-PSO algorithm. The BPNN-PSO outperforms the other techniques in both the 3-days and 1-day profiles. In Figures 6a and 7a, the predicted solar irradiance of the BPNN-PSO is almost aligned with the actual data. The time-domain response agrees well with the statistical error indices in Table 2, which supports the superior performance of the proposed technique over the other conventional techniques tested at the two time intervals of 5-s and 1-min. Figure 7a also shows that the proposed BPNN-PSO technique is robust and able to track the fast variation of the actual environmental data. The absolute error between the predicted and actual values of the 3-days profile at the 5-s and 1-min intervals is shown in an enlarged view to clarify the GSR prediction results, as depicted in Figures 6b and 7b.
Moreover, the 1-day profile is also tested to investigate the adaptability of the predictive models to a small dataset and fast nonlinear changes in solar irradiance. The 1-day profile is used to train the BPNN-PSO with the meteorological data obtained from Kajang, Malaysia, which were recorded on 22 February 2014. The predicted GSR of the proposed BPNN-PSO technique is compared at the 5-s and 1-min time intervals. The BPNN-PSO model aligns well with the actual data, as depicted in Figures 8a and 9a. The absolute error between the GSR values predicted by the proposed model and the target test values is presented for the 5-s and 1-min time intervals in Figures 8b and 9b, respectively. The maximum error is approximately 8%, which is negligible.

Performance Comparison Using Regression Coefficient
The regression coefficient (R) is used as an indicator of the training performance of the predictive model. The predicted data are displayed as black circles, while the blue line represents the reference value (actual target). The regression coefficients are very close to unity, which supports the accuracy of the model. The regression values of the 3-days profile at the 5-s and 1-min time intervals are 0.99951 and 0.99993, respectively, as shown in Figure 10. Moreover, Figure 11 shows that the 1-day profile has regression coefficients of 0.99999 at the 5-s and 1-min intervals.
Table 4 presents the GSR prediction results of the proposed method benchmarked against existing results. The proposed hybrid model is compared with various empirical, artificial intelligence, and hybrid artificial intelligence techniques. The prediction accuracy of the proposed and existing methods is investigated through the four statistical error indices RMSE, MSE, MAE, and MAPE. The results obtained in this study show better predictability and improved forecasting capability compared with the other methods.
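The regression coefficient reported above can be illustrated with a short Python sketch, assuming that R is the Pearson correlation coefficient between the target and predicted GSR values, as is typical for neural network regression plots. The name regression_coefficient is hypothetical.

```python
import math


def regression_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values (assumed meaning of R)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("R is undefined when either series is constant")
    return cov / math.sqrt(var_a * var_p)
```

A value close to 1 means the predictions follow the targets almost linearly. Because R is insensitive to a constant offset or scale error, it is best read alongside RMSE, MAE, and MAPE rather than on its own.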

Conclusions
This paper has presented a hybrid prediction model using a PSO-based BPNN to enhance GSR prediction performance. Two profiles, 3-days and 1-day, have been investigated during the training and validation process. The main contribution of this paper is the development of a robust and consistent BPNN-PSO model for predicting global solar irradiance at extremely short time intervals in a tropical country such as Malaysia. Secondly, the PSO algorithm has significantly enhanced the classical BPNN architecture by finding the optimal values of the architecture parameters, namely the hidden layers, neurons, and learning rate. The performance of the proposed BPNN-PSO model has been compared with other widely used neural network models. The statistical error indicators RMSE, MAE, MSE, and MAPE have been used to evaluate the performance and precision of all models. When used to predict solar irradiance based on the dataset of one region in Malaysia, the developed model has shown remarkable prediction improvements, indicating that it is superior to other techniques in terms of reliability, adaptability, and accurate correlation in predicting GSR, which is fast-varying, short-interval, and nonlinear in nature. Despite the high accuracy of the model at different time intervals, this performance depends on the availability of the aforementioned meteorological parameters. Moreover, the model needs to be extended to include a spatial database for GSR prediction at short time intervals. In addition, optimization and calibration of the model are proposed as future work to make it adaptable to different world regions.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.