Multiple Site Intraday Solar Irradiance Forecasting by Machine Learning Algorithms: MGGP and MLP Neural Networks

.


Introduction
The increased penetration of renewable energy sources (RES) in power systems has created a complex challenge from the point of view of electric grid management [1][2][3], mainly due to high intermittence energy sources such as sun irradiation and wind [4,5]. Climatic variations instantly influence the electric power generation of wind farms and photovoltaic (PV) systems and may put the balance between load demand and power supply in the electrical power grid at risk. In this context, weather forecasting stands out as an important tool to guide the operation of electric power utility grids in the presence of intermittent RES [6,7]. Solar energy forecasting is normally classified in terms of two different forecasting horizons, namely intraday forecasting, from a few minutes to 6 h ahead, and day-ahead forecasting, where predictions are

Goiania, Brazil
The database from Goiania was acquired using a meteorological station setup at the Federal University of Goias (UFG) in Goiania-the capital city of the State of Goias, Midwest Brazil-whose coordinates are latitude −16.67 • (Southern Hemisphere), longitude −9.24 • (west); the station is located 749 m above sea level [28]. The three-year-long database sampled each 60 s from August 2015 to July 2018 is available at the webpage https://sites.google.com/site/sfvemcufg/weather-station [28]. The database has a rigorous data quality control with monthly inspection of equipment, and data are acquired and stored in a datalogger. Table 1 presents the configuration of the weather station setup for solar applications and research.

Milan, Italy
The database from Milan was acquired using a meteorological station setup installed at the SolarTech Lab at the Politecnico di Milano in Milan, Italy, whose coordinates are latitude +45.50 • (Northern Hemisphere), longitude +49.24 • (east); the station is located 120 m above sea level [30]. A database was sampled each 60 s for 26 months from September 2016 to October 2018 with the station at the SolarTech Lab [30].

SURFRAD-US
The other four databases were obtained from sites based in the United States of America and collected from the USA National Oceanic and Atmospheric Administration (NOAA) Surface Radiation Network (SURFRAD). The coordinates of these sites are as follows: Desert Rock, latitude +36.

Normalization
Independent of whether forecasts are performed with the use of artificial intelligence methods or classical regressions, the data processing strategy and input-output scheme play a key role in developing improved forecasts. The first data processing strategy considered global horizontal irradiance (G) as a target value, combining past values of irradiance and weather variables in addition to deterministic variables (in order to capture temporal trends in datasets) [31].
The proposed approach was refined by adopting a data processing strategy that forecasts normalized indexes in order to remove seasonality in solar data, yielding prompter ML algorithm convergence for irradiance forecasting. Values measured at night and during solar elevations (h) less than 5 • were neglected. Normalization of solar data can be performed by the application of Equation (1), where k * t is the so-called clear sky index, G is the observed global horizontal plane irradiance (GHI) and G clr is the theoretical clear sky irradiance.
Clear sky irradiance models used in the literature range from simple functions of extraterrestrial irradiance models to complex approaches that take numerous measured atmospheric parameters into account. It was found that Haurwitz clear sky irradiance and Ineichen-Perez models are simple and sufficiently accurate models that were systematically employed to evaluate meteorological data from a wide number of sites in the USA [32].
The Haurwitz clear sky irradiance model was developed in 1945 and is given by Equation (2), where θ z is the solar zenith angle (complementary to the solar elevation angle h). The constants 1098 and −0.057 were obtained by the least-squares method in order to fit measured cloudless sky irradiance data from a site in the USA to a theoretical curve based on a zenith angle exponential function. The exponential function is decreased by a factor proportional to cos θ z from sunrise to sunset.
The solar zenith angle is defined as the angle between the zenithal axis and the line to the sun. Thus, this angle varies instantly, according to the rotation movement of the Earth. The cosine of the solar zenith angle is obtained from Equation (3), where δ is the declination angle, φ is the latitude of the weather station location, and ω is the sun hour angle. A detailed definition and calculation of solar geometry variables is provided in [33].
cos θ z = cos φ cos δ cos ω + sin φ sin δ Ineichen-Perez clear sky irradiance uses optical air mass ratio (AM), atmospheric turbidity and altitude of location in clear sky irradiance modeling [34]. Ineichen-Perez G clr is calculated by Equation (4), where G o is the extraterrestrial irradiance, h is the solar elevation angle, a 1 , a 2 , f h1 and f h2 are constant functions of local altitude, T L is the Linke turbidity factor and AM is the optical air mass ratio. The constants in Equations (5) and 6 were added empirically by Ineichen and Perez to improve previous clear sky models which were logarithmically dependent on the Linke turbidity factor and limited to specific location and zenith angles. T L was obtained in this study from a map of monthly averaged values for each site [29]. In order to avoid discontinuities in T L and G clr calculations, a daily fitness procedure was used as presented in [35,36]. Figure 2 presents an example of daily T L fitness for Desert Rock.

Data Statistics
A general overview of the solar variability and statistics of each site is presented in Table 2, achieved by applying the Ineichen clear sky model for 15 min averaged point databases. Results in Table 2 show that training, validation and testing datasets present similar mean and standard deviations for k * t -an important requirement to implement ML forecasting models. Results from Desert Rock present a behavior with more clear sky conditions as opposed to other locations, thus presenting the highest mean k * t with lowest standard deviations, while results from Milan present the highest variabilities (σ).

Data Relations
The ML methods consists of a "multivariate" data structure of inputs, as defined in [17], to forecast k *

Genetic Programming
Genetic programming (GP) is an artificial intelligence technique which was originally proposed by Koza [37] in the evolutionary computation field; it is considered as an extension of genetic algorithms. GP is inspired by population genetics and biological evolution at the population level [38] (Algorithm 1). GP has proved to be competitive in time series forecasting in relation to other statistical techniques based on artificial intelligence, such as ANN and the support vector machine (SVM) [39][40][41]. It has been applied in numerous studies of predictions of natural resources-e.g., hydrology [42,43]-and has also been applied to daily or monthly solar irradiance forecasting in PV power systems [44][45][46].
MGGP and MLP neural networks were analyzed in comparison to other different methods of forecasting, considering that ANN comprises adjustable parameters that can be trained using optimization techniques to solve classification and regression problems and GP is a stochastic optimization method. When GP is used to build a mathematical model based on sampled data with the aim of predicting future values, it is named symbolic regression (SR). GP models are typically described as in Equation (7), where y is the observed output variable,ŷ is the predicted output, and x 1 . . . x n are the observed input variables. In contrast to other soft computing methodologies, such as feed-forward ANNs and SVMs, trained GP models are basic constitutive equations that can be implemented without a specific software environment in any modern programming language.ŷ = f (x 1 , ..., x n ) GP models can be classified into three different categories according to their mathematical model complexity: naive SR, when the model requires only one gene to relate input data with output data; scaled SR, when the model employs one gene associated to a bias term to relate input and output data; and multigene SR, when the GP uses multiple genes and a bias term to relate input and output data ( Figure 4).  (8), where a bias term d 0 is added to two genes with weights d 1 and d 2 in a tree structure. The terms "plus", "times", "square root" and "tanh" are known as node functions. Both weights and nodes are obtained in a GP training procedure.ŷ GP evolves a population of candidate solutions (population size) in multiple generations by the application of genetic operators with a tournament selection of best individuals. A crossover operation exchanges genes between individuals to assess possible structural improvements of individuals. Mutation is a fine adjustment operation that changes pieces or entire genes into new, random ones to evaluate a possible structural improvement in terms of fitness. Bias and gene weights of individuals are then optimized in terms of least root mean square errors applied to training data according to Equation (9). Applying an elitism strategy with a given elitism rate, a percentage of best fitness solutions is stored over generations. Based on these procedures, GP evaluates thousands of possible regression structures with optimized weights to relate inputs and outputs. Table 3 summarizes the parameters adopted in GP, which are considered again in Section 6. The dynamics of GP solutions are characterized by generalization ability, providing both accurate and robust solutions in training and for other datasets. On the other hand, ANN is highly influenced by overfitting, which is usually controlled by a validation step named early stopping, while GP does not require a validation step during the SR model training stage. Figure 5 presents the performance of the best individuals which evolved over generations for GP forecasts. It is possible to observe the robustness of solutions repeating from training to validation datasets. MGGP models were implemented on GPtips 2-an open-source GP platform for Matlab R [47].

Artificial Neural Networks
A feed-forward multilayer perceptron neural network (MLP) architecture was applied to this analysis, containing one hidden layer of 10 neurons using the hyperbolic tangent sigmoid transfer function. The neural networks were trained with the Levenberg-Marquadt algorithm including early stopping implemented in Matlab R using the neural networks toolbox. The employed ANN set of attributes was previously validated for intraday solar forecasting [14].

Ensemble Forecasts
Ensemble forecast models are convenient to build with multiple ML simulations and tend to improve forecast accuracy [48]. The ensemble forecast is given by Equation (10), where N trial is the number of trials by the given ML method. In this analysis, the internal parameters of GP and ANN do not vary in each trial, and 10 trials were performed to produce each ensemble according to the methodology described in [48].

Iterative Forecasts
Rana et al. [17] evaluated a forecast method where predictions of instant t+1 are iteratively added as inputs to predictions of instant t+2. As a conclusion, the iterative method did not improve forecasts in their study on PV power forecasting using ANN ensemble and SVM; however, the iterative GP method was tested in this work and yielded improvements on forecasting results. Results comparable to [17] were obtained, and no significant improvement was achieved by using iteration for ANNs. Therefore, the results reported here were obtained using iterative predictions for MGGP.

Persistence
Persistence forecasting is a common benchmark technique used to evaluate intraday solar forecasting. Persistence forecasting can be computed by Equation (11), whereĜ(t + ∆T) is the persistence forecast and ∆T is the forecast horizon, which varies from 15 to 120 min; k * t (t) is the present clear sky index; and G clr (t + ∆T) is the clear sky irradiance at the horizon of the forecast.

Error Metrics
Although there are many error metrics used in the field of solar forecasting, this study assumed the most traditional metrics in the literature: the root mean squared error (RMSE) given by Equation (12), the mean absolute error (MAE) computed by Equation (13) and the forecast skill given by Equation (14).
While MAE is a linear error metric, RMSE is a quadratic error metric that penalizes inaccurate forecast values due to quadratic factors. RMSE is of particular interest when evaluating RES forecasting since ramp behavior is a relevant issue in PV power system operation.

GP Tuning
Initial simulations were intended to analyze the influence of GP parameters in forecast accuracy and robustness. The analysis of parametric influence is known as the parameter tuning of evolutionary algorithms (EAs), as described in [49]. Parameter tuning is by nature an optimization task comprising multiple variables (parameters). In current analyses of multiple horizon forecasts, each forecast horizon at each location consists of a different problem to be tuned. In order to reduce the number of simulations to assess GP parameters, this study considered prior knowledge from other studies to seek good parameter choices to perform a lower number of simulations. Therefore, parameter assessment was carried out only for one forecast horizon using the dataset from Goiania station. Thus, parameter settings from Goiania were used in forecasting models for other sites.
Lima et al. [50] performed a systematic analysis of GP that indicated the population size, number of generations and tree size as the main parameters which influence fitness, while genetic operators have a lower influence. Increases in the size limit of regression functions tend to improve fitness; however, when the size limit is excessively large, this leads to a bloat (function size growth without fitness improvement) [51]. Bloat can be relieved by using realistic elitism rates [52]. In summary, lower tournament sizes and lower elitism rates lead to a higher diversity of solutions.
According to the literature review and some former analyses of irradiance forecasting, the maximum number of genes was set at 5, the tree depth at 4, the number of generations at 150 and population size at 300. These parameters presented a good trade-off between complexity and fitness improvement. Figure 6 presents the improvement of solution fitness in the validation dataset from Goiania station versus the increase in complexity (increasing the maximum number of genes). Genetic operators were analyzed by multiple simulations for a forecast horizon of 90 min ahead, as this is a demanding time window for prediction and consequently presents high variability in the different algorithm simulations. The results for accuracy and robustness are given in Figure 7. The number of generations was lowered to 50 during tests in order to obtain a higher variability of results. It is possible to conclude that the best accuracy and robustness (standard deviation for multiple simulations) were those accomplished using higher mutation rates, lower tournament sizes and lower elitism rates. Therefore, we selected the setting with lowest RMSE: κ = 6, p m = 0.12 and elit = 0.30.

Assessment of Exogenous Input
ANN and GP were executed for all formerly defined locations and forecast horizons both considering and neglecting weather variables Hr, Ta, Ws and pa. The error improvement index, Improv error , was defined in Equation (15) in order to assess the improvement yielded by the addition of weather variables at a given error metric, where error univ is the forecast error obtained based on past values of k * t with the sole addition of deterministic variables, and error multiv is the forecast error obtained by including weather variables. It is worth highlighting that deterministic variables are able to improve forecasts based merely on past values of k * t . Improv error = (error univ − error multiv ) error univ · 100% Improvements were calculated both in terms of MAE and RMSE, as described in Figure 8. The graphs represent typical behaviors, where weather variables generally improve forecastability for all locations by up to 5.68% in terms of MAE and 3.41% in terms of RMSE; in some locations, negative improvements were obtained for shorter forecast horizons from 15 to 60 min. Mostly, the addition of weather variables tends to improve forecastability for all locations; thus, the results obtained by the multivariate forecasts are reported.

Specific Results
Complete results for each forecast horizon and location are presented in Appendix A. The most accurate results are in bold characters for both single and ensemble forecast comparisons. Model accuracy dominance depends on the location, forecast horizon and error metric, as summarized in Figure 9. The accomplished results point toward ANN as the most accurate for short horizons and GP as the most accurate for longer horizons, which also predominantly improves robustness. Furthermore, location attributes have been proven to affect model dominance. Figure 10 presents forecast accuracies for both methods applied to the Goiania station, where the most accurate results were obtained by ANN, and Figure 11 displays the results for the Desert Rock station, where the most accurate results were obtained by GP.
Both GP and ANN methods were consistently improved considering both error metrics by employing an ensemble strategy for each forecast horizon and location. ANN presented more significant improvement and superior accuracy using the ensemble strategy in most cases, as summarized for model accuracy dominance in Figure 12 using ensemble forecasting. GP ens led to the most accurate results in eight cases out of 48, while ANN ens yielded the most accurate results in 23 cases out of 48. GP ens achieved the most accurate results for the Milan station for horizons from 15 to 45 min and from 105 to 120 min using MAE as a reference metric. At Desert Rock station, GP ens attained the lowest RMSE for horizons from 30 to 120 min. At Bondville station, GP ens accomplished the lowest RMSE for horizons from 90 to 120 min and the lowest MAE for horizons from 75 to 120 min. At PSU station, GP ens led to the lowest MAE and RMSE for horizons from 105 to 120 min. At Sioux Falls station, GP ens yielded the lowest RMSE for horizons from 105 to 120 min and the lowest MAE for horizons of 45 min and from 90 to 120 min. ANN ens has proved to be consistently effective in the forecasts carried out for Goiania weather station, as expected, because the lower variations in sunshine duration along the year lead to a less biased dataset in terms of overfitting, as night period points are excluded from the dataset during the processing stage.
From a comparison of the results obtained by Haurwitz and Ineichen for clear sky index forecasts, it is possible to conclude that Ineichen k * t persistence produces lower errors than results obtained by Haurwitz for most of the locations and horizons of prediction. Nevertheless, as the AI methods used here are improved by employing exogenous inputs, a trend of clear sky model dominance over results from GP and ANN techniques was not achieved.

Generic Results
The computation of averages based on multiple results is widely employed as a procedure to achieve reliable generalized results according to Rana et al. [17], although the use of averages does not disregard the importance of specific results. MAE and RMSE averages of all forecast horizons and locations were calculated in order to carry out a generic evaluation of accuracy for GP and ANN, and the results are presented in Figure 13. The average robustness of MAE and RMSE were similarly determined, and results are presented in Figure 14. From the generalized results, it is possible to assume that GP presents more accurate and robust forecast results in comparison to ANN for single forecasts; the ensemble strategy improves ANN forecasts more significantly than GP; the ANN ensemble generally presents the most accurate results; and both models produce similar forecastability, with little difference in terms of accuracy, indicating that GP can provide faster, more reliable and accurate predictions with lower computing complexity, while ANN can provide more accurate predictions using higher complexity and a time-demanding strategy.  A general comparison of clear sky indexes from multiple sites is exhibited in Table 4. From the analysis of results, it is possible to observe that the difference between Haurwitz k * t and Ineichen k * t forecast results is negligible, showing the low influence of the clear sky model on the accuracy of multivariate forecast results.

Regression Functions
Equation (16) presents an example of a regression function developed to forecast k * t (15), comprising a combination of the deterministic variable ω s with previous values of k * t and the weather variables T a and H r . The algorithm has been proven to be efficient in selecting suitable variables to achieve accurate and robust models with generalization ability. Selected variables to develop regressions for Goiania station are expressed in Table 5.

Comparison with the State-of-the-Art
A recent analysis of intraday solar irradiance forecasting at the SURFRAD weather stations has been carried out using regression and frequency domain models [21]. A direct comparison of the results obtained by regression, frequency domain and MGGP is presented in Table 6. Reikard et al. [21] analyzed forecasts for the same years, based on the same historical data and datasets used here. Although pieces of datasets used in each analysis are not guaranteed to be the same, a direct comparison of the results is able to ensure the suitability of the results of GP prediction.

Machine Learning Algorithm Training Speed
Training machine learning algorithms to optimize results and accuracy is normally a time-consuming task. Table 7 presents a comparison of the average training times (in minutes) assessed for Goiania station according to each forecast horizon. Similar results were obtained for the other previously mentioned stations. Although MGGP has been demonstrated to be more robust for single forecasts, the training speed of this method is lower than that for ANN. Improvements of MGGP parameter tuning strategies should be considered in future studies in order to increase the speed of MGGP training.

Conclusions
Machine learning algorithms are extensively adopted techniques for solar forecasting. This study proposed and evaluated multigene genetic programming (MGGP) as a novel machine learning algorithm, which is classified as a white box, with the aim of performing intraday solar irradiance forecasting. MGGP derives analytical regression functions that can be implemented without a specific software environment in any modern programming language using fundamental hardware. MGGP has been proven to consistently possess data generalization ability, providing robust and reliable solutions. The MGGP algorithm and another state-of-the-art MLP artificial neural network (ANN) algorithm were applied to datasets from six different locations from three different countries in order to compare results for forecast horizons from 15 to 120 min.
Data processing strategies were carefully analyzed in terms of input and output alternatives. Initial simulations were carried out for solar irradiance forecasting, using 15 minute time-windows as input data. Five minute time-window data, Haurwitz and Ineichen clear sky indexes were considered and combined with solar deterministic variables and weather variables in order to yield accurate forecast accuracies in terms of the data processing strategy.
The computation of MAE and RMSE as error metrics showed that the location, forecast horizon and error of evaluation impact the selection of the dominant model in terms of accuracy. MGGP and ANN typically yielded similar and consistent results. MGGP's utilization for single forecasts led to more accurate and robust results as opposed to ANN. Predictions were significantly improved for MGGP and ANN by adopting ensemble forecast, while the ensemble strategy improved ANN more extensively than MGGP. Regarding ensemble forecasts, MGGP was more accurate for a lower number of locations and evaluated forecast horizons in comparison to ANN, presenting the best forecast skills for Desert Rock station. MGGP predominantly accomplished more accurate prediction results for longer forecast horizons from 90 to 120 min ahead for different localities.
Based on a direct comparison with other state-of-the-art methods of forecasting applied to the same locations in USA, MGGP presented a relevant reduction in error and was proven to be a reliable and accurate approach for the analyzed localities.
The attributes of a locality have been proved to affect model dominance, showing that both MGGP and ANN can be applied to different locations. As suggestions for future investigation, studies may address hybridization strategies, ML algorithm improvement, advanced data processing strategies applied to MGGP forecasting and improvements in parameter tuning in order to enhance MGGP's training speed; furthermore, additional analyses of other solar parameters with the aim of improving the accuracy and performance of forecasting techniques could be undertaken.

Acknowledgments:
The authors thank the Federal University of Goias (UFG) through the Electrical Mechanic and Computer Engineering School (EMC) for providing access to and scientific usage of the research data. The authors also thank Espora Energetica S/A, proponent company of the R&D project related to this work, and the respective cooperative companies, Transenergia Sao Paulo S/A, Transenergia Renovavel S/A, and Caldas Novas Transmissao S/A. The authors also thank Politecnico di Milano and the National Oceanic and Atmospheric Administration for providing access to and scientific usage of their weather data.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: