Calibration of GFS Solar Irradiation Forecasts: A Case Study in Romania
Energies 2023, 16, 4290

Abstract: Models based on Numerical Weather Prediction (NWP) are widely used for the day-ahead forecast of solar resources. This study is focused on the calibration of the hourly global solar irradiation forecasts provided by the Global Forecast System (GFS), a model from the NWP class. Since the evaluation of GFS raw forecasts sometimes shows a high degree of uncertainty (the relative error exceeding 100%), a procedure for reducing the errors is needed as a prerequisite for engineering applications. In this study, a detailed analysis of the error sources in relation to the state of the atmosphere is reported. Of special note is the use of sky imagery in the identification process. Generally, it has been found that the largest errors are determined by the underestimation of cloud coverage. For calibration, a new ensemble forecast is proposed. It combines two machine learning approaches, Support Vector Regression and Multi-Layer Perceptron. In contrast to a typical calibration, the objective function is constructed based on the absolute error instead of the traditional root mean squared error. In terms of normalized root mean squared error, the calibration reduces the uncertainty in hourly global solar irradiation by roughly 16%. The study was conducted with high-quality ground-measured data from the Solar Platform of the West University of Timisoara, Romania. To ensure high accessibility, all the parameters required to run the proposed calibration procedures are provided.


Introduction
Over the past decade, many countries have pledged to reduce their carbon footprint. The simultaneous drop in the cost of exploiting renewable energy has convinced important socio-political actors across the globe to start implementing "green" energy solutions. Although they largely avoid carbon emissions during operation, renewable energy sources, especially wind and solar energy, present other drawbacks arising from the variability of weather. In particular, reliable photovoltaic energy forecasts [1], along with information on future load [2], are necessary for a stable and resilient electricity grid. Since the accuracy of PV power forecasting is intimately related to the accuracy of solar resource forecasting, a significant increase in the number of solar energy forecasting studies has been observed in recent years [3].
Depending on the forecast horizon, different methods are used. For intra-hour forecasts, statistical models are frequently employed [4]. For day-ahead forecasts, numerical weather prediction (NWP) models are the common practice [3]. NWP models range from meso-scale models used by national meteorological services to global models developed by a few groups such as the National Oceanic and Atmospheric Administration (NOAA) or the European Centre for Medium-Range Weather Forecasts (ECMWF). This study is focused on the Global Forecast System (GFS) issued by the National Centers for Environmental Prediction (NCEP) in the United States. GFS [5] is a free, widely accessible global model, coupled with the Global Data Assimilation System (GDAS) for initial conditions and reanalysis. GFS forecasts are issued four times daily, at 00, 06, 12 and 18 UTC, with hourly forecasts for the first 120 h, and 3-hourly forecasts for up to 16 days. GFS provides several atmospheric parameters, both instantaneous and average values.
This study aims to calibrate the hourly downward short-wave radiation flux (DSWRF) forecasts provided by GFS, based on deterministic parameters. Calibration methods, or post-processing methods, are designed to eliminate errors that may be present in NWP forecasts by finding possible patterns. Globally, it has been found that GFS has a mean relative error of 30%, justifying the need for on-site calibration [6].
Previous studies concerned with analysis and calibration of GFS forecasts have generally found that this NWP model tends to overestimate the available solar irradiance, showing significant positive bias in the test locations. For example, studies in Morocco [7] and Australia [8] have confirmed this positive bias. In a comparative study [9] on the calibration of three NWP models (ECMWF, the mesoscale NAM and GFS) for the United States, GFS was found to perform best in clear-sky conditions, but its performance degraded significantly in cloudy conditions. That study [9], however, was limited to intra-day performance. In an outstanding pioneering paper [10], the Model Output Statistics (MOS) method was proposed for the calibration of NWP forecasts. MOS focuses on the minimization of one target statistic, such as bias or mean squared error (MSE). Subsequent studies have used various regression models, namely linear or polynomial regression [11], machine learning models such as decision trees [11], or artificial neural networks [12].
In a recent study [13], we reported the calibration of the GFS hourly solar irradiation forecast by means of several statistical and machine learning methods. These methods were developed with the goal of minimizing MSE. Since MSE is quite sensitive to outliers, such a calibration improves this metric but also introduces unnecessary uncertainty for hours with low errors (close to clear-sky conditions).
In order to avoid distorting the accurate GFS forecasts in clear-sky conditions, this paper proposes a calibration approach based on methods that minimize the mean absolute error (MAE) instead of MSE. This metric is less sensitive to large outliers such as those found in the complete dataset. Very recently, it has come to light that the choice of statistical indicator is important for the improvement of the forecast. Most importantly, minimization of MSE was found to lead to underdispersed forecasts [14].
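The difference between the two objectives can be illustrated with a toy example. The sketch below is not taken from the study: the error values are hypothetical, and the "calibration" is reduced to a single constant correction. It relies only on the standard facts that the mean minimizes the squared error and the median minimizes the absolute error.

```python
from statistics import mean, median

# Hypothetical hourly forecast errors (arbitrary units): five accurate hours
# plus one large outlier, such as a missed overcast hour.
errors = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0]

def mse(shift):
    """Mean squared error that remains after subtracting a constant shift."""
    return mean((e - shift) ** 2 for e in errors)

def mae(shift):
    """Mean absolute error that remains after subtracting a constant shift."""
    return mean(abs(e - shift) for e in errors)

# The constant that minimizes MSE is the mean of the errors;
# the constant that minimizes MAE is their median.
mse_optimal_shift = mean(errors)
mae_optimal_shift = median(errors)

print(f"MSE-optimal shift: {mse_optimal_shift:.3f}")
print(f"MAE-optimal shift: {mae_optimal_shift:.3f}")
```

In this toy case the MSE-optimal correction is pulled toward the single outlier and therefore shifts all five accurate hours noticeably, whereas the MAE-optimal correction leaves them almost unchanged.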
The methods applied in this study are Support Vector Regression (SVR) [15], multilayer perceptron (MLP) [16] and random forest decision trees (RF) [17]. SVR is based on the support vector machine (SVM) algorithm, which is frequently used for classification tasks. The training objective of SVR is to keep the weights (parameters) of the model small while requiring the residuals to stay below a set value of the hyperparameter ε, effectively minimizing absolute errors. MLP algorithms are the most widely used feed-forward artificial neural network models. We propose a densely connected model with several hidden layers and a common activation function. MLP was found to perform well in post-processing NWP forecasts for solar applications, with a reasonable training time, and has been recommended as a calibration method in other published research [18]. RF is a type of ensemble machine learning model that can be used for both classification and regression problems. It is constructed from multiple decision tree regressors that minimize a certain loss function, in our case MAE. These regressors are trained on random samples from the training dataset, and the final prediction is the average of the predictions of all decision tree regressors [17].
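The role of ε in SVR can be made concrete with the ε-insensitive loss, which is the standard data term of the SVR objective: residuals inside the tolerance cost nothing, and larger residuals are penalized linearly. The sketch below is illustrative only; the function name and the sample values are hypothetical, and the default ε = 0.12 is the threshold reported for this study.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.12):
    """Mean epsilon-insensitive loss: zero inside the tube, linear outside."""
    penalties = [
        max(0.0, abs(t - p) - epsilon)
        for t, p in zip(y_true, y_pred)
    ]
    return sum(penalties) / len(penalties)

# Hypothetical clear sky index values: the first residual (0.05) lies inside
# the tolerance and is ignored, the second (0.40) is penalized by 0.40 - 0.12.
measured = [1.0, 0.5]
predicted = [0.95, 0.9]
loss = epsilon_insensitive_loss(measured, predicted)
print(f"mean epsilon-insensitive loss: {loss:.3f}")
```

Because the penalty grows linearly rather than quadratically, a few badly forecast overcast hours weigh less in training than they would under a squared-error objective.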
In summary, the objective of this research is to study the day-ahead hourly GFS forecast errors from both a statistical and an empirical approach. The empirical approach aims to characterize the state of the sky when the GFS forecasts produce large and small errors, respectively. Following the error analysis, three machine learning methods (MLP, SVR, RF), as well as two ensembles (combinations of the individual methods' outputs), are proposed for the calibration of the GFS forecasts. The innovative elements reported in this study are the use of sky imagery to analyze the uncertainty of the GFS forecasts, the use of MAE as the objective function in GFS forecast calibration, and the proposal of a machine learning-based ensemble approach that tends to conserve the GFS forecasts under clear sky.
The rest of the paper is organized as follows: Section 2 introduces the representative dataset used in this study and the research methodology. The results are presented and discussed in Section 3. Section 4 gathers the main conclusions. The statistical measures are defined in Appendix A. Short explanatory videos are available as Supplementary Material.

Data and Methodology
Day-ahead forecast data retrieved from the GFS website [5] for Timisoara (lat. 45.7473° N, long. 21.0306° E), Romania, between 8 September 2022 and 8 November 2022, two months with high variability in sky condition, were analyzed by comparing the day-ahead forecasts issued by GFS to the values measured on the corresponding day. Data were pre-processed to obtain the hourly global solar irradiation from the average hourly global horizontal irradiance forecasts. The ground-measured data were recorded on the Solar Platform of the West University of Timisoara [19]. Only data with a solar elevation angle larger than 5° were retained. This is common practice in solar energy modeling, aiming to eliminate the measurements close to sunrise or sunset, when the pyranometers' uncertainty is high. Along with this restriction, severe outliers in the measured data, with a clear sky index larger than 1.1, were removed. The final dataset contains 582 entries. Each entry contains forecasted and measured hourly global solar irradiation, deterministic quantities (e.g., time stamp, solar elevation angle, extraterrestrial solar irradiation), post-processed data (e.g., clearness index, sunshine number, sunshine stability number), and estimated quantities (e.g., clear-sky solar irradiance, clear sky index). The physical quantities evaluated through data post-processing are defined in Appendix B.
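The two quality filters described above can be sketched as follows. This is a minimal illustration, not the processing code of the study: the record layout, the field names and the sample values are hypothetical, and the clear sky index is taken as the ratio of measured to clear-sky irradiation.

```python
def is_valid_record(record, min_elevation_deg=5.0, max_clear_sky_index=1.1):
    """Keep a record only if the sun is high enough and the measurement
    is not a severe outlier with respect to the clear-sky estimate."""
    if record["elevation_deg"] <= min_elevation_deg:
        return False
    clear_sky_index = record["measured"] / record["clear_sky"]
    return clear_sky_index <= max_clear_sky_index

# Hypothetical hourly records (irradiation in arbitrary but consistent units).
records = [
    {"elevation_deg": 3.0, "measured": 20.0, "clear_sky": 40.0},     # sun too low
    {"elevation_deg": 35.0, "measured": 450.0, "clear_sky": 500.0},  # kept
    {"elevation_deg": 40.0, "measured": 690.0, "clear_sky": 600.0},  # index 1.15
]
filtered = [r for r in records if is_valid_record(r)]
print(len(filtered), "of", len(records), "records retained")
```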
In the first step, we investigated the potential factors that may determine large errors in the GFS forecasts. The distributions of errors, depending on parameters such as the forecasted and measured clear sky index, were analyzed. The aim of this analysis was to find accessible explanatory parameters for the uncertainties of the GFS forecasts. The influence of the state of the sky on GFS forecasts was also considered. This was performed visually by studying images taken with an all-sky imager (ASI-16 from EKO Instruments) installed on the Solar Platform [19]. ASI-16 takes images with a fish-eye camera every minute for the entire day. The investigation was conducted with the aim of detecting which states of the sky are associated with higher errors.
With the information gained in the first step, we proceeded with the second step, in which various procedures for calibrating GFS forecasts were developed and tested. Since solar irradiance is not linear in time, the calibration was done on the clear sky index, rather than on the solar irradiation. The clear sky index was computed using the hourly integrated clear-sky solar irradiance retrieved from Long and Ackermann's model [20], which depends on the solar elevation angle h.
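The idea of calibrating in clear-sky-index space can be sketched as below. The clear-sky irradiation is treated as a given input, because the clear-sky model itself is not reproduced here, and the calibration function is a hypothetical placeholder standing in for a trained model.

```python
def clear_sky_index(irradiation, clear_sky_irradiation):
    """Ratio of the actual (or forecasted) irradiation to its clear-sky value."""
    return irradiation / clear_sky_irradiation

def calibrated_irradiation(forecast, clear_sky_irradiation, calibrate):
    """Calibrate the forecast in clear-sky-index space, then convert back."""
    k_forecast = clear_sky_index(forecast, clear_sky_irradiation)
    k_calibrated = calibrate(k_forecast)
    return k_calibrated * clear_sky_irradiation

# Hypothetical placeholder for a trained calibration model:
# here it simply damps the forecasted index by 10%.
def toy_calibration(k):
    return 0.9 * k

result = calibrated_irradiation(300.0, 600.0, toy_calibration)
print(f"calibrated hourly irradiation: {result:.1f}")
```

Working with the index removes most of the deterministic daily course of the solar irradiation, so the model only has to learn the cloud-related part of the error.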
For the calibration process, we considered four parameters: the forecasted clear sky index, the forecasted clearness index, lead time and the solar elevation angle (corresponding to the average value of sin(h)). The last two parameters are deterministic, requiring only the knowledge of the geographical coordinates and the time for which the forecast is issued.
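To show why the solar elevation angle is deterministic, the sketch below evaluates it from latitude, day of year and true solar time with textbook relations: Cooper's approximation for the declination and the standard expression for sin(h). This is an assumption made for illustration; the study does not state which solar position algorithm was used, and converting clock time to true solar time (longitude and equation of time) is omitted.

```python
import math

def solar_elevation_deg(latitude_deg, day_of_year, solar_hour):
    """Solar elevation angle h in degrees for a given true solar time."""
    # Cooper's approximation of the solar declination.
    declination = math.radians(23.45) * math.sin(
        2.0 * math.pi * (284 + day_of_year) / 365.0
    )
    # Hour angle: 15 degrees per hour, zero at solar noon.
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    latitude = math.radians(latitude_deg)
    sin_h = (
        math.sin(latitude) * math.sin(declination)
        + math.cos(latitude) * math.cos(declination) * math.cos(hour_angle)
    )
    return math.degrees(math.asin(sin_h))

# Latitude of Timisoara as given in the text; day 81 is close to the spring
# equinox, where the approximated declination vanishes.
noon_elevation = solar_elevation_deg(45.7473, 81, 12.0)
print(f"solar elevation at solar noon: {noon_elevation:.2f} degrees")
```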
The entire dataset was split into two datasets: the training dataset and the test dataset, with an 80% to 20% split. After many preliminary tests performed on the training dataset, we retained three procedures from the machine learning (ML) class as having the best performance in calibrating GFS forecasts. These procedures were: Support Vector Regression (SVR), multi-layer perceptron (MLP), and random forest decision trees (RF). Since the procedures are all from the ML class, tuning the hyperparameters of each model is warranted. For SVR, we used a radial basis function kernel, an error threshold (epsilon value) of 0.12 and a margin limitation value C = 100. In the case of the MLP, we found that five hidden layers were sufficient, and that adding more layers did not improve the metrics of the forecast. Each hidden layer had 128 neurons, and we used the popular Adam optimizer. In the training step, we selected 100 epochs and a batch size of 64. The RF model had the following hyperparameters: maximum depth of decision trees equal to 5 and number of estimators equal to 700. All models were implemented using popular open-source libraries in Python, such as pandas for data processing, scikit-learn for the regression models and TensorFlow for the MLP method.
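For convenience, the reported settings can be collected in one place. The sketch below is an assumption-laden summary, not the authors' code: the SVR and RF keys follow the keyword names of scikit-learn's `SVR` and `RandomForestRegressor` constructors, the MLP keys are hypothetical labels, and the random shuffling in the split is assumed because the text does not say how the 80/20 split was drawn.

```python
import random

# Hyperparameters reported in the text. Key names for SVR and RF mirror
# scikit-learn keyword arguments; the MLP key names are hypothetical.
SVR_PARAMS = {"kernel": "rbf", "epsilon": 0.12, "C": 100}
RF_PARAMS = {"n_estimators": 700, "max_depth": 5, "criterion": "absolute_error"}
MLP_PARAMS = {
    "hidden_layers": 5,
    "units_per_layer": 128,
    "optimizer": "adam",
    "loss": "mean_absolute_error",
    "epochs": 100,
    "batch_size": 64,
}

def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train, test) index lists for a shuffled split (assumed random)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 582 entries are reported for the final dataset.
train_idx, test_idx = split_indices(582)
print(len(train_idx), "training entries,", len(test_idx), "test entries")
```

With scikit-learn installed, the first two dictionaries could be passed as `SVR(**SVR_PARAMS)` and `RandomForestRegressor(**RF_PARAMS)`; the MLP settings would have to be mapped by hand onto a TensorFlow model.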
In addition to these three models, we constructed two ensembles. The first, Ensemble 1, took the average of the values forecasted by the SVR and MLP calibrations. Ensemble 2 took the average of all three models.
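Both ensembles are plain arithmetic means of the individual model outputs, which can be sketched in a few lines. The function name and the sample values are hypothetical; any number of equally long forecast series can be passed.

```python
def ensemble_mean(*model_forecasts):
    """Element-wise average of several equally long forecast series."""
    return [sum(values) / len(values) for values in zip(*model_forecasts)]

# Hypothetical calibrated clear sky index forecasts for two hours.
svr_forecast = [0.8, 0.4]
mlp_forecast = [0.6, 0.2]
rf_forecast = [0.7, 0.6]

ensemble_1 = ensemble_mean(svr_forecast, mlp_forecast)               # SVR + MLP
ensemble_2 = ensemble_mean(svr_forecast, mlp_forecast, rf_forecast)  # all three
print(ensemble_1, ensemble_2)
```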
Finally, the calibrated forecasts were evaluated on the test dataset.

Results and Discussion
This section is divided into two parts. In the first part, the potential sources of uncertainty in the GFS forecasts are analyzed from the perspective of calibration. The aim of this analysis is to identify accessible atmospheric parameters that can play the role of independent variables in the calibration equation. A special facet of the analysis is the discussion of the connection between the state of the sky at the time for which the forecast is issued (illustrated by the videos in the Supplementary Material) and the forecast deviation from the measured value. In the second part of this section, the calibration effect is evaluated on the basis of several statistical measures defined in Appendix A.

Sources of Errors
Figure 1 shows the distribution of the relative errors in the entire dataset of uncalibrated GFS forecasts, with most errors being less than 100%. Visual inspection shows that the uncalibrated GFS forecast registered relative errors of less than 50% for most of the dataset. However, about 15% of the forecasted data experience a relative error higher than 100%. As already stated in Section 2, a first aim of this study is to identify possible events that lead to GFS forecasts vastly different from the actual solar irradiation values. For this, the subset including forecasts with large errors was compared with a different subset of comparable size (15% of the entire dataset), containing only forecasts with errors lower than 5%.
In order to minimize the errors of the GFS forecast, it is essential to understand the difference between the two atmospheric states: when accurate forecasts are issued vs. when the forecasts produce large errors. Since a calibration of the GFS solar irradiation forecasts is conditioned by the parameters' accessibility, we analyzed only the deterministic parameter (sun elevation angle) and the forecasted parameters (clear sky index, clearness index, cloud cover) when developing a calibration equation.
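The selection of the two subsets can be sketched as follows. The relative error is assumed here to be the absolute deviation divided by the measured value, expressed in percent; the function names and sample values are hypothetical.

```python
def relative_error_percent(forecast, measured):
    """Absolute deviation relative to the measured value, in percent."""
    return 100.0 * abs(forecast - measured) / measured

def split_by_error(pairs, large_threshold=100.0, small_threshold=5.0):
    """Return the (forecast, measured) pairs with large and with small errors."""
    large, small = [], []
    for forecast, measured in pairs:
        error = relative_error_percent(forecast, measured)
        if error > large_threshold:
            large.append((forecast, measured))
        elif error < small_threshold:
            small.append((forecast, measured))
    return large, small

# Hypothetical hourly values: relative errors of 150%, 2% and 30%.
pairs = [(250.0, 100.0), (102.0, 100.0), (130.0, 100.0)]
large_errors, small_errors = split_by_error(pairs)
print(len(large_errors), "large-error hour(s),", len(small_errors), "small-error hour(s)")
```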
However, for a better understanding of the parameters' influence on the forecasts' accuracy, four parameters measured at the moment for which the forecasts were issued (clear sky index, clearness index, relative sunshine, sunshine stability number) were also analyzed. We stress that these parameters cannot be considered for building a calibration equation, since they are always obtained after a forecast is issued. Figure 2 shows the histograms of the eight parameters for both the large-error dataset and the small-error dataset. Comparing the histograms in each frame, the following can be noticed: there are more erroneous forecasts when the solar elevation angle is lower than 30° (Figure 2a). For a higher solar elevation angle, the forecasts' accuracy increases. The most accurate forecasts were issued for roughly 30° < h < 40°. The GFS forecasted clear sky index (Figure 2b) and clearness index (Figure 2c) do not show a significant difference between the low- and high-error datasets. Conversely, the inaccessible measured values of the two indexes (Figure 2f,g) show a clear separation between the errors. It can be concluded that the large errors occur at times when the clear sky index is overestimated. By definition, the clear sky index decreases as cloud cover increases. Therefore, failure to accurately forecast overcast hours leads to high forecasted clear sky index values. In other words, the large errors are due to an underestimation of cloud coverage. The same clear separation is indicated by the measured relative sunshine (Figure 2h), with low errors corresponding mostly to sunny hours (σ = 1) and large errors mostly to overcast hours (σ = 0). This is well correlated with the separation between errors observed in the measured clear sky index and clearness index histograms. Surprisingly, the variability in the state of the sky during each hour does not much influence the forecast error (Figure 2e). The variability in the state of the sky was quantified by the average sunshine stability number SSSN (see Appendix B for a definition).
The above conclusions are also visible in the two-dimensional histograms presented in Figure 3. A clear distinction can be seen between the distributions of the relative errors with respect to the measured (Figure 3b) and forecasted (Figure 3c) clear sky index. The large errors are distributed at low measured values of clear sky index, while these same errors are distributed over significantly higher forecasted values of clear sky index. However, there is a high concentration of points with low errors in the region with forecasted and measured clear sky index close to 1. This corresponds to periods of clear sky, confirming that GFS performs well under such conditions. In terms of the solar elevation angle, low errors are concentrated mostly above 30°; however, large errors also occur at these angles.
Complementary to the previous statistical analysis on measured data, an analysis of the state of the sky was also carried out. The analysis was performed observationally, comparing the state of the sky in the two extremes, very high vs. very low accuracies, in GFS forecasts. The state of the sky was monitored with the all-sky imager ASI-16, at a temporal resolution of 1 min. Each hour in both datasets, the large-error dataset and the small-error dataset, has 60 photos of the state of the sky associated with it. Figure 4 displays eight fish-eye photos, where every photo was extracted from a set of 60 photos associated with an hour. Figure 4a-d display four snapshots taken in four different hours from the small-error dataset, while Figure 4e-h display four snapshots taken in four different hours from the large-error dataset. The detailed evolution of the state of the sky in each of the eight hours is included in the Supplementary Materials associated with this article. This includes eight videos of 12 s each. Each video was built from 60 photos taken in an hour. At first glance, it is evident that the small errors in GFS forecasts are predominantly related to low-cloudiness conditions, while the large errors are predominantly related to overcast conditions.
To illustrate some datapoints from the two datasets, we present in Figure 4 the state of the sky at those given hours. Visual inspection reveals that the large-error datapoints occurred mostly during severely overcast time periods, while lower errors were generated during periods with close to clear-sky atmospheric conditions. There are some moments when GFS correctly forecasted cloud cover in the low-error dataset; however, there are more cases of incorrect cloud-cover forecasts leading to overestimation of the solar irradiation in the large-error dataset. Figure 4a,b showcase situations with conditions close to clear sky. During those hours, the presence of thin high-altitude clouds is visible. In the hour represented by Figure 4c, there is significant cloud coverage, albeit with low optical depth. There are also short periods of cloud enhancement during the recorded hour. During the interval corresponding to Figure 4d, cloud cover reached values of 100%, being correctly predicted by the GFS forecast. For the hours with large uncertainties, shown in [...]

Validation of the Calibration Procedure
Based on the previous analysis and the results of various statistical and AI procedures tested on the training dataset, the following three ML procedures were selected: Support Vector Regression (SVR), multi-layer perceptron (MLP) and random forest decision trees (RF). Two ensembles were also built: Ensemble 1 as the average of SVR- and MLP-calibrated forecasts, and Ensemble 2 as the average of all three calibrated forecasts.
Table 1 presents the models' performances against the test dataset. The uncalibrated GFS forecast performance is also listed. All calibrations result in a reasonable improvement in nRMSE and nMBE. Despite some improvement in R², the coefficient of determination remains low. Ensemble 1 gives the best overall results: it decreases nRMSE by 6.54 percentage points, which corresponds to a relative improvement of 16.5% on the test data. A significant decrease in nMBE from 5.5% to 0.5% is also noticed, similar to another study focused on intra-day forecast calibration [9]. In fact, all of the calibration procedures remove the bias. It is worth noting that the calibrations were focused on the minimization of MAE. As a consequence, the MAPE decrease is not significant: MAPE reductions did not exceed 2%, even for Ensemble 1. This can be explained by the MAPE definition (Equation (A6)): each error is divided individually by the true value. Therefore, MAPE is skewed by hours with low solar irradiation, when even small absolute errors translate into large relative errors. In other words, optimizing MAPE would most likely result in a forecast that underestimates the measurements.
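The statistical measures discussed above can be sketched as below. The definitions are the commonly used ones and are assumed here, with nRMSE and nMBE normalized by the mean of the measurements; the paper's own definitions are given in its Appendix A, which is not reproduced in this section. The sample values are hypothetical.

```python
import math

def nrmse(forecast, measured):
    """Root mean squared error normalized by the mean measurement, in percent."""
    n = len(measured)
    rmse = math.sqrt(sum((f - m) ** 2 for f, m in zip(forecast, measured)) / n)
    return 100.0 * rmse / (sum(measured) / n)

def nmbe(forecast, measured):
    """Mean bias error normalized by the mean measurement, in percent."""
    return 100.0 * sum(f - m for f, m in zip(forecast, measured)) / sum(measured)

def mape(forecast, measured):
    """Mean absolute percentage error: each error is divided by its true value."""
    n = len(measured)
    return 100.0 * sum(abs(f - m) / m for f, m in zip(forecast, measured)) / n

def relative_improvement(raw_metric, calibrated_metric):
    """Relative reduction of an error metric after calibration, in percent."""
    return 100.0 * (raw_metric - calibrated_metric) / raw_metric

# Hypothetical hourly irradiation values (arbitrary but consistent units).
measured = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 330.0, 380.0]
print(f"nRMSE = {nrmse(forecast, measured):.2f}%")
print(f"nMBE  = {nmbe(forecast, measured):.2f}%")
print(f"MAPE  = {mape(forecast, measured):.2f}%")
```

Note that in `mape` an error of 10 units counts twice as much at a measured value of 100 as at 200, which is the skew toward low-irradiation hours described above.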
Adding the RF model to Ensemble 1, resulting in Ensemble 2, did not improve any metric compared to Ensemble 1. This could be expected from the metrics of the base RF model, which performed worst out of all the models for all metrics but nMBE. However, since we performed a grid search for the hyperparameter tuning, further research could improve the performance of the RF procedure by expanding the grid or by using more advanced decision tree models.


Conclusions
Deterministic day-ahead forecasting of the solar resource is commonly based on NWP models. The GFS day-ahead forecast of hourly global solar irradiation was analyzed over a period of two months in Timisoara, Romania, by comparing the forecasted values with the measured ones. Most of the identified errors were below 50%, but errors above 100% were noticed for about 15% of the entire period. Both statistical characterization of the data and sky imagery were used to identify possible sources of errors. It was found that the largest errors were caused by the underestimation of cloud coverage. Based on this primary analysis, two deterministic parameters (solar elevation angle and lead time) and two forecasted parameters (clear sky index and clearness index) were used to calibrate the forecast. In order to reduce the overall positive bias in the GFS hourly solar irradiation forecast due to cloud coverage mischaracterization, three calibration models were proposed. The models, selected through preliminary tests, belong to the machine learning class: Support Vector Regression (SVR), Multi-Layer Perceptron (MLP) and Random Forest decision trees (RF). The objective function used for optimization minimizes MAE, and all three machine learning models were able to accommodate it. Additionally, two ensembles were tested: Ensemble 1 as the mean of the SVR and MLP forecasts, and Ensemble 2 as the mean of all three models' forecasts. Generally, the proposed calibrations improve the GFS day-ahead hourly global solar irradiation forecast. Ensemble 1 performed best, with a relative improvement in forecast accuracy of 16.5% in terms of nRMSE. To ensure higher accessibility for potential users, all the parameters required to apply the proposed calibration procedures are provided. These parameters represent a starting point for developing customized models for the calibration of GFS day-ahead hourly global solar irradiation forecasts.
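The four calibration inputs listed above can be assembled as in the following Python sketch. It assumes the usual definitions of the two indices: the clear sky index as the ratio of the forecasted irradiation to a clear-sky estimate, and the clearness index as the ratio to the extraterrestrial irradiation on a horizontal surface. The function name, the argument names and the numbers are hypothetical, and the choice of clear-sky model is left open.

```python
def calibration_features(solar_elevation_deg, lead_time_h,
                         h_gfs, h_clear_sky, h_extraterrestrial):
    """Build the four-element input vector for one forecast hour (hypothetical helper).

    h_gfs, h_clear_sky and h_extraterrestrial are hourly irradiation values
    in the same units (for example Wh/m^2).
    """
    if h_clear_sky <= 0 or h_extraterrestrial <= 0:
        # The indices are undefined at night, so only daytime hours are accepted
        raise ValueError("reference irradiation must be positive")
    k_cs = h_gfs / h_clear_sky        # forecasted clear sky index (assumed definition)
    k_t = h_gfs / h_extraterrestrial  # forecasted clearness index (assumed definition)
    return (solar_elevation_deg, lead_time_h, k_cs, k_t)


# Made-up example hour: 35 degrees elevation, 30 h lead time
features = calibration_features(35.0, 30, 450.0, 600.0, 900.0)
```

Such a vector would be passed to a calibration model for each forecast hour; the first two entries are known exactly in advance, while the last two depend on the GFS forecast itself.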

Figure 1. Distribution of relative errors for the GFS forecast.


Figure 2. Histograms of eight parameters in the large-error dataset (blue) and in the small-error dataset (red). The parameters are: (a) the solar elevation angle $h$; (b) forecasted clear sky index $k_{cs,\mathrm{GFS}}$; (c) forecasted clearness index $k_{t,\mathrm{GFS}}$; (d) forecasted cloud cover; (e) measured sunshine stability number $\mathrm{SSSN}_{\mathrm{measured}}$; (f) measured clear sky index $k_{cs,\mathrm{measured}}$; (g) measured clearness index $k_{t,\mathrm{measured}}$; (h) measured relative sunshine $\sigma_{\mathrm{measured}}$.

Figure 3. 2D histograms of GFS forecast errors depending on (a) solar elevation angle $h$; (b) forecasted clear sky index $k_{cs,\mathrm{GFS}}$; (c) measured clear sky index $k_{cs,\mathrm{measured}}$. The color hues quantify the frequency in each class.

Figure 4a,b showcase situations with conditions close to clear sky. During those hours, the presence of thin high-altitude clouds is visible. In the hour represented by Figure 4c, there is significant cloud coverage, albeit with low optical depth. There are also short periods of cloud enhancement during the recorded hour. During the interval corresponding to Figure 4d, cloud cover reached 100%, which was correctly predicted by the GFS forecast. For the hours with large uncertainties, shown in Figure 4e-h, severe overcast conditions are the defining feature. Rain also occurred during the interval corresponding to Figure 4h.


Figure 4. All-sky imager pictures displaying the state of the sky in hours with (a-d) small uncertainty in GFS forecasts and (e-h) large uncertainty in GFS forecasts.


Table 1. Statistical indicators of accuracy for the original GFS and the calibrated forecasts.