Day-Ahead Photovoltaic Forecasting: A Comparison of the Most Effective Techniques

We compare the 24-hour ahead forecasting performance of two methods commonly used for the prediction of the power output of photovoltaic systems. Both methods are based on Artificial Neural Networks (ANN), which have been trained on the same dataset, thus enabling a much-needed homogeneous comparison currently lacking in the available literature. The dataset consists of an hourly series of simultaneous climatic and PV system parameters covering an entire year, and has been clustered to distinguish sunny from cloudy days and separately train the ANN. One forecasting method feeds only on the available dataset, while the other is a hybrid method as it relies upon the daily weather forecast. For sunny days, the first method shows a very good and stable prediction performance, with an almost constant Normalized Mean Absolute Error, NMAE%, in all cases (1% < NMAE% < 2%); the hybrid method shows an even better performance (NMAE% < 1%) for two of the days considered in this analysis, but overall a less stable performance (NMAE% > 2% and up to 5.3% for all the other cases). For cloudy days, the forecasting performance of both methods typically drops; the performance is rather stable for the method that does not use weather forecasts, while for the hybrid method it varies significantly for the days considered in the analysis.


Introduction
In 2017, global energy investment declined, with a fall of 2% with respect to the previous year, mainly due to lower production of coal, hydroelectric, and nuclear power [1]. On the other hand, renewables have shown unprecedented growth over the past few years because of a number of factors: climate change is now a major concern worldwide [2], air pollution in big cities (e.g. in China) has become a serious problem [3], the depletion of conventional energy resources shows that, continuing the current pattern of management, they could be largely exhausted by the end of the century [4], the cost of electricity from wind power and photovoltaics is diminishing [5], and the deployment of renewable-based power plants requires the least time among all power generation technologies [6]. In this framework, photovoltaic is the fastest-growing renewable technology and the sector with the largest investment [6].
As the amount of energy produced by PV systems is growing exponentially, the need to forecast their production is more critical than in the past, when PV plants were installed with a "fit and forget" approach. The production of solar plants is uncertain, especially due to the stochastic formation and movement of clouds in the sky. Thus, accurate forecasting models based on the use of field etc.) [14]. Auto-Regressive Moving Average (ARMA) forecasters perform well when data are stationary, while Auto-Regressive Integrated Moving Average (ARIMA) models perform well with non-stationary time series [26]. The drawback of ARIMA methods is that they are computationally more intensive than ARMA. According to the results of [16], ANNs demonstrate higher accuracy than ARMA and ARIMA and outperform other statistical methods in terms of accuracy and adaptability under uncertain meteorological conditions. However, the grouping of daily weather conditions [27][28][29][30][31] into sunny, cloudy and rainy days improves the performance of any statistical-based forecast [14,16,24,32]. This approach is mainly used for very short and short horizon applications [14,33] and represents the majority of forecast techniques currently utilized [34].
Physical methods, mainly used for applications ranging from very short to long horizons, consist of a set of mathematical equations that describe the physical state and dynamic motion of the atmosphere [35]. These methods are mainly based on Numerical Weather Prediction (NWP), Sky Imagery and Satellite Imaging [36]. They can be classified as global and mesoscale physical methods, based on the portion of the atmosphere that is simulated, which can be worldwide or limited to a specific area [37]. With reference to the forecast of the power produced by PV plants, only mesoscale models should be used; the main shortcoming of such models is that a resolution of only up to 16-50 km can be attained [38]. Another drawback of physical methods is that they perform best when weather conditions are stable [39], while their accuracy is strongly affected by sharp changes in the meteorological variables [16].
Hybrid methods are a combination of any of the previous methods. The idea is to mix different models with unique features to address the limitations of individual techniques, thus enhancing the forecast performance [40][41][42]. Generally, the computational complexity increases [14]. The most common examples reported in the literature are combinations of ANN- and NWP-based models [12,43,44,45], and also of SARIMA and SVM [46]. The performance of hybrid models depends on the performance of the single models, and these models should be specifically designed for a particular plant and location [36]. Hybrid methods that combine numerical weather forecasts with historical data of the meteorological variables typically lead to excellent forecasting performance [36]. However, in general, the weak point of hybrid forecast techniques is that they underperform when meteorological conditions are unstable [47]. In [48], different neural network methods for PV power output forecasting have been compared with long short-term memory (LSTM)-based models, which seem capable of capturing the hidden relationships between weather parameters and actual PV power outputs, from hourly patterns to seasonal patterns across days.
In general, comparison between forecast techniques is challenging as the factors influencing the performance are numerous and change for each situation: the availability of historical data and of the weather forecast, the temporal horizon and resolution, the weather conditions, the geographical location, and the installation conditions. In the case of statistical methods, appropriate data preprocessing (for example, removing the night sample when there is no power production) is fundamental, too, in order to achieve good performance and reduced computational costs [16]. The reviews available in the literature give some indications regarding the performance of the different techniques, but their findings are more qualitative than quantitative. Some recent reviews [14,16,36] present a comparative analysis using work by different authors, also including the statistical errors. However, since the conditions and metrics used in each work were different, the comparison is not meaningful from a quantitative point of view.
This work aims at comparing two of the most used and most effective methods for the forecasting of the PV production: an ANN-based method and a hybrid method. The comparison is carried out using consistent metrics and the same data from a PV plant installed in Milan, Italy. In addition, a clustering of the dataset has been performed according to the mean values of the daily solar radiation measured on the PV modules. The effect of this data clustering has been investigated on the PV output forecast.

PV Module Description
The experimental data employed in the current analysis were recorded at the SolarTechLab [49], Politecnico di Milano, the coordinates of which are latitude 45°30'10.588" N and longitude 9°9'23.677" E. During 2017, the output power of a single PV module with the following characteristics was recorded:

Performance Indexes
In order to assess the accuracy of the forecasting methods, some of the most common error indexes in the literature [35,50,51] have been considered in this work.
The common error definition for the assessment is the hourly error $e_h$, which is defined as:

$$e_h = P_{m,h} - P_{p,h}$$

where $P_{m,h}$ is the average actual power in the hour and $P_{p,h}$ is the prediction provided by one of the forecasting methods. Starting from the hourly error definition, the other error indexes adopted for the assessment can be derived.

The mean absolute error (MAE) is:

$$MAE = \frac{1}{N}\sum_{h=1}^{N} |e_h|$$

The normalized mean absolute error, NMAE%, is MAE normalized to the net capacity of the plant $C$. In this analysis $C$ is the maximum DC output power measured over the whole period and is expressed in watts:

$$NMAE\% = \frac{1}{N \cdot C}\sum_{h=1}^{N} |e_h| \cdot 100$$

In all these definitions, $N$ is the number of hours considered in the evaluated period (i.e., 24 h in a daily error basis calculation).

The mean absolute percentage error, MAPE%, is normalized with respect to the measured hourly power:

$$MAPE\% = \frac{1}{N}\sum_{h=1}^{N} \left|\frac{e_h}{P_{m,h}}\right| \cdot 100$$

The weighted mean absolute error, WMAE%, is based on the total energy production:

$$WMAE\% = \frac{\sum_{h=1}^{N} |e_h|}{\sum_{h=1}^{N} P_{m,h}} \cdot 100$$

The normalized root mean square error, nRMSE, is based on the maximum hourly power output $P_{m,h}$:

$$nRMSE = \frac{\sqrt{\frac{1}{N}\sum_{h=1}^{N} e_h^2}}{\max_h P_{m,h}} \cdot 100$$

Moreover, two new indicators were presented in [52] and, given their ability to provide more complete information about the accuracy of the prediction, are presented here. The envelope-weighted mean absolute error, EMAE%, is:

$$EMAE\% = \frac{\sum_{h=1}^{N} |e_h|}{\sum_{h=1}^{N} \max(P_{m,h}, P_{p,h})} \cdot 100$$

The objective mean absolute error, OMAE%, is defined as:

$$OMAE\% = \frac{\sum_{i=1}^{N} |P_{m,i} - P_{p,i}|}{\frac{C}{G_{STC}}\sum_{i=1}^{N} G^{CS}_{POA,i}} \cdot 100$$

where $G^{CS}_{POA,i}$ and $G_{STC}$ are the irradiance under clear sky conditions and the irradiance in the standard test conditions, respectively.
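As an illustration of how the main indexes relate to the hourly error, the following minimal sketch computes MAE, NMAE%, MAPE%, WMAE% and nRMSE for one evaluation period. The function name and argument names are hypothetical, and skipping the hours with zero measured power in MAPE% (where the ratio is undefined) is an assumption of this sketch rather than a choice stated in the text.

```python
import numpy as np

def error_indexes(p_measured, p_predicted, capacity):
    """Error indexes for one evaluation period (e.g. the 24 h of a day).

    p_measured, p_predicted: hourly measured and predicted power (W).
    capacity: net capacity C of the plant (W).
    """
    p_m = np.asarray(p_measured, dtype=float)
    p_p = np.asarray(p_predicted, dtype=float)
    abs_err = np.abs(p_m - p_p)  # |e_h| for every hour

    # Assumption of this sketch: MAPE% is averaged only over the hours
    # with non-zero measured power, since e_h / P_m,h is undefined at zero.
    producing = p_m > 0

    return {
        "MAE": abs_err.mean(),
        "NMAE%": abs_err.mean() / capacity * 100,
        "MAPE%": (abs_err[producing] / p_m[producing]).mean() * 100,
        "WMAE%": abs_err.sum() / p_m.sum() * 100,
        "nRMSE": np.sqrt((abs_err ** 2).mean()) / p_m.max() * 100,
    }
```

Because NMAE% is scaled by the fixed capacity while MAPE% and WMAE% are scaled by the measured production, the same absolute error yields much larger MAPE% and WMAE% values on low-production days.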
The scatterplot in Figure 1 shows the existing relationships among the introduced indexes, when normalized with respect to the maximum observation. The indexes shown in Figure 1 have been calculated from the data used in the present work. From this figure, some degree of correlation among the errors is expected. For this purpose, in the present work the correlation is studied by employing the Pearson-Bravais correlation coefficient. The results are shown in Table 1. Due to the high level of correlation, a single index can be selected, as it is representative of the others. Although each index represents different characteristics of the day, generally speaking, their daily trends are similar and highly correlated.
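The correlation study can be sketched as follows, assuming the daily values of each index are available as equal-length sequences. The function name and the dictionary layout are hypothetical; `numpy.corrcoef` returns the matrix of Pearson coefficients, with one row per index.

```python
import numpy as np

def index_correlation(daily_indexes):
    """Pearson correlation matrix between error indexes.

    daily_indexes: dict mapping an index name (e.g. "NMAE%") to the
    sequence of its daily values; all sequences have the same length.
    Returns the list of names and the square correlation matrix.
    """
    names = list(daily_indexes)
    # One row per index, one column per day.
    data = np.array([daily_indexes[name] for name in names], dtype=float)
    return names, np.corrcoef(data)
```

An off-diagonal coefficient close to 1 indicates that two indexes rank the days in nearly the same way, which is the condition under which one index can stand in for the other.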

Database Clustering
The collected hourly samples, night hours included, are used as the database for the comparison between forecasting methods. The weather forecasts used in this study are delivered by a weather service each day at 11 a.m. of the day before the forecasted one. The historical hourly database is used to train the neural network and includes the following parameters: ambient temperature (°C), global horizontal irradiation (W/m²), global irradiation on the plane of the array (W/m²), wind speed (m/s), wind direction (°), pressure (hPa), precipitation (mm), cloud cover (%), and cloud type (Low/Medium/High).
As the PV power output is strongly related to the solar irradiance [35], the available dataset has been classified in terms of the mean value of irradiance during the day. The overall available dataset for 2017 is composed of 268 days, and is further divided into two sub-datasets, depending on whether the mean daily forecast irradiation on the tilted plane ($G_{POA,f,d}$) is greater or lower than 150 W/m² ("Sunny days" and "Cloudy days," respectively), where:

$$G_{POA,f,d} = \frac{1}{24}\sum_{h=1}^{24} G_{POA,f,h}$$

with $G_{POA,f,h}$ the forecast irradiation on the tilted plane in hour $h$. The classification considers only sunny and cloudy days. Thus, the original dataset has been split into two datasets as follows:
• Sunny days: these are characterized by a mean value of the solar irradiance during 24 h greater than 150 W/m² (i.e., $G_{POA,f,d} > 150$ W/m²).
• Cloudy days: these are characterized by a mean value of solar irradiance in the range [5-150 W/m²] (i.e., $5 \le G_{POA,f,d} \le 150$ W/m²).
Following this approach, the dataset is summarized in Table 2.
The dataset subdivision was performed based on the clearness index $K_t$ as well. However, in order to take into account the seasonality, a slightly different formulation was chosen, as shown in the following equation, dividing the irradiation on the plane of the array ($G_{POA,h}$) by $G_{CSRM}$, the theoretical irradiation under clear sky conditions:

$$K_t = \frac{G_{POA,h}}{G_{CSRM}}$$

As the PV production is 0 overnight, both for cloudy and clear days, those values are excluded in the selection phase. It was then possible to compute the daily mean $K_{t,day}$. As a threshold value for $K_{t,day}$ we chose 0.60; hence, when the mean was above this value the day was classified as sunny, and as cloudy otherwise. The outcome of this classification, in both the number of days per cluster and the days assigned to each cluster, is the same as that obtained with the irradiance threshold.
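The two classification rules can be sketched as below. The function names are hypothetical, and two choices are assumptions of this sketch rather than statements from the text: days whose mean irradiance falls below 5 W/m² are labeled "discarded", and night hours are identified as the hours where the clear sky irradiation is zero.

```python
import numpy as np

SUNNY_THRESHOLD = 150.0  # W/m^2, threshold stated in the text
CLOUDY_MINIMUM = 5.0     # W/m^2, lower bound of the cloudy range
KT_THRESHOLD = 0.60      # daily clearness threshold stated in the text

def classify_by_irradiance(g_poa_forecast_hourly):
    """Classify a day from its 24 hourly forecast irradiance values."""
    g = np.asarray(g_poa_forecast_hourly, dtype=float)
    if g.size != 24:
        raise ValueError("expected 24 hourly values")
    daily_mean = g.mean()
    if daily_mean > SUNNY_THRESHOLD:
        return "sunny"
    if daily_mean >= CLOUDY_MINIMUM:
        return "cloudy"
    return "discarded"  # assumption: days below the cloudy range are dropped

def classify_by_clearness(g_poa_hourly, g_csrm_hourly):
    """Classify a day from the daily mean of K_t = G_POA,h / G_CSRM."""
    g = np.asarray(g_poa_hourly, dtype=float)
    g_cs = np.asarray(g_csrm_hourly, dtype=float)
    daylight = g_cs > 0  # assumption: night = zero clear sky irradiation
    if not daylight.any():
        raise ValueError("no daylight hours in the clear sky profile")
    kt_day = (g[daylight] / g_cs[daylight]).mean()
    return "sunny" if kt_day > KT_THRESHOLD else "cloudy"
```

Note that the 150 W/m² threshold applies to a 24 h mean that includes the night hours, so it corresponds to considerably higher irradiance during daylight.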

Methodology
The forecasting methods are based on Feed-Forward Neural Networks (FFNNs), also named Multi-Layer Perceptron (MLP), which consist of an input, an output, and one or more hidden layers [45]. The number of neurons in the input and the output layers is set beforehand, while the number of neurons within the hidden layer is set during the training process that is performed to find a relationship between the input and output data. The most used method for the training of this type of neural network is the back-propagation algorithm. In this work we applied the same forecasting tool (i.e., MLP) to two different cases.
Case 1: Three different MLPs have been developed to forecast the power produced by the PV plant. As shown in Figure 1, a first MLP-based forecaster is used to predict the solar irradiance of the next day. If the mean value of the forecasted irradiance is greater than 150 W/m², an MLP called "model 1" is used to predict the produced power. This forecaster has been specifically developed using the dataset corresponding to the 154 "sunny days." Otherwise, if the mean forecasted irradiance is smaller than 150 W/m², the so-called "model 2," which was developed using the part of the dataset corresponding to the 114 "cloudy days," is used.
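The selection logic of Case 1 can be sketched as a small dispatcher. The three models are passed in as plain callables, which are hypothetical stand-ins for the trained MLPs; only the 150 W/m² threshold and the model names come from the text.

```python
from statistics import fmean

SUNNY_THRESHOLD = 150.0  # W/m^2, threshold stated in the text

def forecast_next_day_power(irradiance_inputs, power_inputs,
                            irradiance_model, sunny_model, cloudy_model):
    """Pick the power forecaster from the forecasted mean irradiance.

    irradiance_model: callable returning the 24 hourly irradiance values
        forecasted for the next day (stand-in for the first MLP).
    sunny_model, cloudy_model: callables returning the 24 hourly power
        values (stand-ins for "model 1" and "model 2").
    """
    g_hourly = irradiance_model(irradiance_inputs)
    if fmean(g_hourly) > SUNNY_THRESHOLD:
        return "model 1", sunny_model(power_inputs)
    return "model 2", cloudy_model(power_inputs)
```

With this structure, an error in the irradiance forecast near the threshold changes which power model is used, so the first MLP affects the final forecast even though its output is not an input of the other two.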
The input of the first MLP forecaster is the mean values of the solar irradiance $G_m(d)$ and of the air temperature $T_m(d)$, together with the number of the day $d$; the output can then be expressed as:

$$\{G_1(d+1), G_2(d+1), \ldots, G_{24}(d+1)\} = f_G(G_m(d), T_m(d), d)$$

where $f_G$ is the approximate function, and $G_1(d+1), G_2(d+1), \ldots, G_{24}(d+1)$ are the hourly values of the forecasted solar irradiance. The structure of the MLP is sketched in Figure 2 and was developed using 240 days of the original dataset, while 28 days have been used for the testing of the network.

As shown in Figure 3, the input of the second and third MLPs is the mean values of the daily solar irradiance $G_m(d)$ and of the air temperature $T_m(d)$, together with the mean produced power $P_m(d)$. The output layer has 24 output nodes corresponding to the produced hourly power of the next day. The output of the neural network can then be expressed as:

$$\{P_1(d+1), P_2(d+1), \ldots, P_{24}(d+1)\} = f_P(G_m(d), T_m(d), P_m(d))$$

where $f_P$ is an approximate function, and $P_1(d+1), P_2(d+1), \ldots, P_{24}(d+1)$ are the forecasted values of the hourly power. The MLP called model 1, shown in Figure 4, has been developed using data for 147 sunny days, while seven days have been used for the test. Model 2 has been built using 114 cloudy days, and also tested on seven days.

In both cases, the original data have been pre-processed as follows [53]:

$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min}$$

where $x \in [x_{min}, x_{max}]$ and $y \in [y_{min}, y_{max}]$ are the original data and the corresponding normalized variable, respectively. $y_{min}$ and $y_{max}$ have been set to −1 and 1, respectively.
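A minimal sketch of this pre-processing step is given below, assuming it is the usual linear min-max mapping of $x \in [x_{min}, x_{max}]$ onto $[y_{min}, y_{max}] = [-1, 1]$. The function names are hypothetical; the inverse mapping is included because the network outputs have to be brought back to physical units before the error indexes are computed.

```python
import numpy as np

def minmax_scale(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Map x from [x_min, x_max] linearly onto [y_min, y_max]."""
    x = np.asarray(x, dtype=float)
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

def minmax_unscale(y, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Inverse of minmax_scale: map y back to the original range."""
    y = np.asarray(y, dtype=float)
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min
```

The values of `x_min` and `x_max` would normally be taken from the training set only and reused unchanged on the test days, so that no information from the test data enters the scaling.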

Case 2:
The implemented method to perform the simulations in this second case is still based on an FFNN but, among the inputs, the irradiation in clear sky conditions is provided. This method is called Physical Hybrid Artificial Neural Network (PHANN) and its model is shown in Figure 5. The Clear Sky Radiation Model (CSRM) adopted is described in [54] and was validated on measured data available from the SolarTechLab at the Politecnico di Milano. Since the result of a physical model is fed into the network, the artificial neural network is hybridized, as explained in [45]. The network architecture adopted here consists of two hidden layers, with 12 and 5 neurons, respectively. In order to train the network, 90% of the samples are randomly assigned to the training set, while the remaining 10% are assigned to the validation set. Furthermore, we decided to use 40 trials for the ensemble logic. These characteristics of the ANN have been set in a sensitivity analysis, which was performed in a previous work [55]. To properly evaluate how sensitive the forecast is to different training methodologies and database composition, several simulations were run. In particular, three approaches are adopted according to the cluster of the day to be forecast (i.e., "Sunny" or "Cloudy"):
• Ap1: All the available data are used to train the network (268 days).
• Ap2: The simulations are performed using the dataset comprising all the available data but, in order to train the network, the same number of days available in the sunny and cloudy dataset is used (randomly picked).
• Ap3: The simulations are performed using the "Sunny" and "Cloudy" datasets alternatively.
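The training setup described above (random 90%/10% split, 40 trials) can be sketched as follows. The text does not state how the trials are combined, so averaging their forecasts is an assumption of this sketch; `train_fn` is a hypothetical callable standing in for the training of one network, and it must return a prediction function.

```python
import numpy as np

def ensemble_forecast(train_fn, X, Y, x_new,
                      n_trials=40, val_fraction=0.10, seed=0):
    """Average the forecasts of n_trials independently trained models.

    train_fn(X_train, Y_train, X_val, Y_val) -> predict, where
    predict(x_new) returns the forecast of that trial (hypothetical API).
    Each trial draws a new random training/validation split.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_val = max(1, int(round(val_fraction * len(X))))

    forecasts = []
    for _ in range(n_trials):
        order = rng.permutation(len(X))
        val_idx, train_idx = order[:n_val], order[n_val:]
        predict = train_fn(X[train_idx], Y[train_idx],
                           X[val_idx], Y[val_idx])
        forecasts.append(predict(x_new))

    # Assumption: the ensemble output is the mean over all trials.
    return np.mean(forecasts, axis=0)
```

Under approach Ap3, `X` and `Y` would contain only the days of the cluster ("Sunny" or "Cloudy") to which the forecasted day belongs, while under Ap1 they would contain all 268 days.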

Results and Discussion
Case 1
Figure 6 shows a comparison between the measured and forecasted hourly output power of the PV plant for both sunny and cloudy days. The MLP-based model for sunny days shows good accuracy (MAPE% = 23.6%). The second MLP-based model, which is designed for cloudy days, does not provide good results (MAPE% = 54.0%).


Case 2
In Figure 7, the results referring to six available days, selected from the sunny and cloudy day datasets, are plotted. The blue line represents the power forecast provided by the NN for the PV module, the orange line the measurements, and the yellow line the absolute error of the implemented method. The method shows good forecasting performance, especially for sunny days, as during most of the hours the forecast and the measurements overlap and the hourly absolute error is very low. For the sunny days, the MAPE% was 10.0%, while for the cloudy days it was about 68.9%.


Comparison between the two cases
In Figures 8 and 9, the NMAE% error for the two cases is plotted for the first six available sunny and cloudy days. In both graphs, the blue line represents the error observed for case 1, while the orange line represents the error observed for case 2, adopting the third approach (Ap3). For sunny days, the error of case 1 is quite stable and almost always lower than 2%, while the error of case 2 shows a peak on the 16th day and is lower than that of case 1 for days 20 and 21. With reference to cloudy days, the two cases performed similarly except for the 11th day, when the error of case 2 was larger than 11%.
The forecasting performance of the two models, for the same days presented in Figures 8 and 9, is reported in percentage terms in Tables 3 and 4. No clear evidence of an unequivocally better performing method can be observed. The mean WMAE% for sunny days is 10.1% and 12.4% for cases 1 and 2, respectively, while for the cloudy days the mean WMAE% is 78.1% and 151.0%, respectively.

Conclusions
Two of the most widely used and effective forecasting ANN-based methods for the performance of PV systems have been compared. Specifically, we analyzed the 24-h-ahead power forecasting performance of the two methods, as this is arguably one of the most important kinds of predictions for the energy generation, transmission, and distribution industry.
The dataset used in this study (collected at the SolarTechLab of the Politecnico di Milano, Italy) is a historical hourly series including the climatic parameters and the simultaneous electric parameters of the photovoltaic system.
One method feeds exclusively upon data from the dataset, in this case the solar irradiance and the ambient temperature. The other method uses a hybrid approach, including as an input the daily weather forecast. In both cases, the dataset has been clustered, identifying a group of sunny days and a group of cloudy days.
The comparison is self-consistent and homogeneous, as the dataset used for the network training and testing is the same for both methods, unlike the comparisons currently available in the literature. Also, in contrast to the available literature reviews, we used the same metrics for comparing the two methods. We also compared a wide range of metrics, demonstrating a high degree of correlation among them; as a first approximation, a subset of metrics can therefore be used to assess the overall performance of a method.
The results show the good forecasting performance of both methods in the case of sunny days. While the hybrid method shows an excellent performance for some specific days (NMAE% < 1% in two cases; WMAE% < 4% in three cases), the method that feeds exclusively on data from the dataset shows a more stable and consistently good performance (NMAE% consistently between 1% and 2%; WMAE% oscillating between 7% and 13%).
The forecasting performance of both methods drops significantly for cloudy days. Again, while the performance of the hybrid method is better for some specific days, for at least one day the prediction is rather poor (NMAE% > 11% and WMAE% = 750%), and the method that feeds exclusively on data from the dataset shows more stable performance on both the NMAE% and WMAE% metrics.
A possible explanation for the lower performance of both methods for cloudy days is the rather high relative variability of the irradiance conditions in this cluster, making the training of the network less reliable. In fact, cloudy days are identified as days where the average irradiation is less than 150 W/m², so even a small variation translates to a large relative variation. Conversely, the irradiation in sunny days is always close to its maximum value (typically near 1000 W/m²) for the specific location; the variations are typically small, and even more so in relative terms. Training on this cluster is therefore expected to lead to more repeatable predictions.
As a possible explanation of the less stable performance of the hybrid model with respect to the model where the forecasts are not used, one can speculate that the forecasting ability of the former depends on the quality of the weather forecast for the specific day. Moreover, the quality of the forecast is strongly related to the location it refers to, which usually does not correspond exactly to the one where the PV system operates. A more detailed analysis considering this possible correlation is needed in order to verify this hypothesis.
This work shows that no one model outperforms the other under all possible conditions. Managers of utility-scale PV plants, manufacturers of power management systems, and distribution and transmission system operators can therefore benefit from the comparison presented in this work in order to identify the most suitable forecaster for the specific application at hand. Future work will