Comparison of Training Approaches for Photovoltaic Forecasts by Means of Machine Learning

Abstract: The relevance of forecasting in renewable energy sources (RES) applications is increasing, due to their intrinsic variability. In recent years, several machine learning and hybrid techniques have been employed to perform day-ahead photovoltaic (PV) output power forecasts. In this paper, the authors present a comparison of the main characteristics of the artificial neural network used in a hybrid method, focusing on the training approach. In particular, the influence of different data-set compositions on the forecast outcome has been inspected by increasing the training data-set size and by varying the training and validation shares, in order to assess the most effective training method of this machine learning approach, based on commonly used and newly-defined performance indexes for the prediction error. The results are validated over a one-year time range of experimentally measured data. Novel error metrics are proposed and compared with traditional ones, showing the best approach for the different cases of either a newly deployed PV plant or an already-existing PV facility.


Introduction
In recent years, several forecasting methods have been developed for the output power of renewable energy sources (RES) [1], addressing in particular the intrinsic variability of parameters related to changing weather conditions, which directly affect the power output of photovoltaic (PV) systems [2]. This growing attention is mainly due to the increasing share of RES in power systems, which poses novel technical challenges for the efficiency of the electrical grid [3]. In particular, predictive tools based on historical data can generally provide advantages in PV plant operation [4,5], reduce excess production, and take advantage of incentives for RES production [6].
Among the commonly-used forecasting models, most aim to predict the expected power production based on numerical weather prediction (NWP) system forecasts [7]. This is a complex problem with a high degree of non-linearity; for this reason, it is commonly approached by means of advanced models and techniques, i.e., evolutionary computation [8], machine learning (ML) [9], and artificial neural networks (ANNs) [10]. These are pseudo-stochastic iterative approaches defined in the class of computational intelligence techniques, and are usually employed to address pattern recognition, function approximation, control, and forecasting problems [11]. Moreover, they are generally able to handle incomplete or missing data and solve problems with a high degree of complexity.
Recently, several ANN layouts have been developed to solve different tasks [12], such as time series prediction, complex dynamical system emulation [13], speech generation, handwritten digit recognition, and image compression, owing to their ability to learn from extended time series of historical measurements with acceptable error levels compared to other statistical and physical forecasting models [14]. Currently, ANN employment in forecasting is quite straightforward due to the widespread development of specific software applications [15][16][17].
In particular, the first attempts at solar power forecasting by means of ANNs started more than a decade ago [18]. Generally, in the case of PV power output, common training data are the historical measurements of power production from a PV facility and meteorological parameters unique to the facility location, including temperature, global horizontal irradiance (i.e., the intensity of all the solar radiation components on a horizontal surface) [19], and cloud cover above the facility. Additional forecasted variables from the numerical weather predictions can also be considered, such as wind speed, humidity, pressure, etc. [20].
Novel forecasting models were recently implemented by adding an estimate of the clear sky radiation to the series of historical local weather data, as reported in [21].
Additionally, the effectiveness of ensemble methods was demonstrated in [22], thus giving additional advantages in terms of results reliability and the implementation of efficient parallel computing techniques.
In their previous work [23], the authors conducted a detailed analysis to find a procedure for the best ANN layout and settings in terms of the number of layers, neurons, and trials for the PV day-ahead forecast. Furthermore, evidence showed that the forecasting performance of ML techniques is affected by the composition of the training data-set, as well as by input selection [24,25].
In this paper, a specific study is conducted on training data-sets in order to provide a more detailed analysis of the effect of different approaches in the training data-set composition on the day-ahead forecast of the PV power production. In particular, the authors present some procedures to set up the training and validation data-sets for the ANN used in the physical hybrid method to perform the day-ahead PV power forecast in view of the electricity market. Moreover, a novel error metric is proposed and compared with traditional ones, in order to validate the best training approach in different cases: indeed, the procedures outlined herein can be adopted to set up data-sets based on either historical data retrieved from an existing PV plant or on incremental data measurements in a newly deployed PV plant. The test data-set is made up of the 24 hourly PV power values forecast one day ahead.
The paper is structured as follows: Section 2 provides an overview of the considered approaches for the composition of the training database, considering both cases of historical data retrieved from an existing PV plant and incremental data measurements in a newly deployed PV plant; Section 3 presents the methodology implemented to compare the different training approaches presented here, proposing some new metrics aimed at evaluating the suitability of the proposed configurations in terms of error performance and statistical behavior; Section 4 presents the considered case study, which is used to test the proposed training approaches; specific simulations and numerical results are provided in Section 5, and final remarks are reported in Section 6.

Training Database Composition Approaches
In order to perform the day-ahead forecast, the ANN needs to be trained. Hence, the amount of historical data employed in the supervised learning determines the ANN forecast capability. These data consist of samples exploited in the process of identifying the links among neurons in the network which minimize the error in the forecast. To this end, all the available samples are divided into two groups:

• the "training set" (or equally "training database"), which is used to adjust the weights among neurons by performing the forecast on the same samples,
• the "validation set", which is used as a stopping criterion to avoid over-fitting and under-fitting. It proves the goodness of the trained network on additional samples which have not been previously included in the training set. The purpose of this step is to test the generalization capability of the neural network on a new data-set.

Learning occurs by updating elements within the network; thus, its response iteratively improves to match the desired output. An ANN is trained when it has learned its task and converges to a solution. To achieve this, some learning algorithms are commonly used, such as evolutionary algorithms (genetic algorithms, particle swarm optimization, etc.). Depending on the problem, the fastest algorithm may converge rapidly on local minima, which does not guarantee maximum accuracy. In addition, it should be considered that a large training set provides a better sample of the trends and improves generalization, but it generally slows down the learning process. If an ANN is not properly trained or sized, there are usually undesired results, such as "overfitting" and "underfitting" [26]. Using ANN ensembles by averaging their outputs has been demonstrated to be beneficial, as it helps to avoid chance correlations and the overtraining problem [27,28].
However, choosing both the most suitable learning algorithm and the training-set size that minimizes the error is a challenge to be addressed in each case study [29][30][31].
In this paper, we inspect how the day-ahead forecast is influenced by the possible characteristics summarized in Figure 1. The first characteristic of the data-set is either "incremental", when the elements belonging to the training data-set become progressively available over time and the training set size gradually increases, or "complete", if an already existing database of samples is available. The second characteristic refers to the way the data-set is used for training the ANN. As forecasting is mainly a stochastic process, one choice is to use exactly the same training data-set for each forecast of the ensemble (we refer to a single forecast with the term "trial"; in this case, all the trials in the ensemble are trained on the same data); the other is to shuffle its elements, grouping them in smaller subsets adopted each time to separately train a different ANN (in this case, each trial is independent, as all the training data-sets are different). Finally, the mean of the resulting outputs is usually calculated in the so-called "ensemble" forecast. The third characteristic is related to the order of the hourly samples that constitute the training data-set. They can either be arranged consecutively, following the chronological time series they belong to, or be randomly grouped and mixed up.

All of the assumptions set out here are valid, in general terms, for all ANN-based methods. In this specific paper, the authors employ the Physical Hybrid Artificial Neural Network (PHANN) method for the day-ahead forecast, as described in detail in [14,21]. This procedure mixes the physical Clear Sky Radiation Model (CSRM) and the stochastic ANN method, as reported in Figure 2.
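The ensemble step described above (averaging the outputs of several trials) can be illustrated with a minimal Python sketch. The function name `ensemble_forecast` and the numbers are illustrative assumptions, not part of the PHANN implementation; each trial is assumed to return one forecast value per hour.

```python
from statistics import fmean


def ensemble_forecast(trial_forecasts):
    """Average the hourly forecasts of several trials, hour by hour.

    trial_forecasts: list of trials, each a list of hourly power values
    (hypothetical layout; all trials must cover the same hours).
    """
    if not trial_forecasts:
        raise ValueError("at least one trial is required")
    n_hours = len(trial_forecasts[0])
    if any(len(trial) != n_hours for trial in trial_forecasts):
        raise ValueError("all trials must have the same number of hours")
    # zip(*...) regroups the values so each tuple holds one hour across all trials
    return [fmean(hour_values) for hour_values in zip(*trial_forecasts)]


# Three made-up trials over four hours (values in W)
trials = [
    [0.0, 100.0, 200.0, 50.0],
    [0.0, 120.0, 180.0, 70.0],
    [0.0, 110.0, 220.0, 60.0],
]
print(ensemble_forecast(trials))  # [0.0, 110.0, 200.0, 60.0]
```

Whether the trials share one training data-set or use independently shuffled ones only changes how each trial is trained; the averaging step stays the same.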

Incremental Training Data-Set
An incremental data-set occurs when the available samples are limited. Usually this is the case of real-time or time-dependent processes, where data can be acquired only progressively. Consider, for example, our case study, in which the monitoring system starts recording data from the first day of operation of the PV plant: initially a small amount of data is recorded, and, if we acquire hourly samples, 24 samples are added to the historical data-set every day.
In this database composition (e.g., see Figure 3), the days which can be employed for ANN training are those available starting from the PV plant commissioning (day 1) until the $k_d$ day before the forecast (day $X_d$). As a consequence, the size of the training database will increase over time. In order to supply the data-set to the network for the training step, samples can be arranged in different ways. Those adopted in this paper are listed in Figure 4, and lead to different results in the forecast. A short description is given in the following.

In the first two methods (A and A*), the effect of the proximity of the training set to the forecast day is examined (which implies seasonal variations in the parameters), inspecting how the forecast is affected by the proximity of the samples employed in the training rather than in the validation step. For example, it is clear that forecasting spring days cannot be accurate if the training samples belong to the past autumn or winter, and the same consideration applies to the validation. Reasonably, we expect that the further the validation samples are from the forecast day, the less accurate the forecast. This problem is not addressed in Method B, as samples are randomly chosen.

Complete Training Data-Set
In the complete data-set, a large number of samples is available, but they might belong to a period which is distant in time from the days of the forecast, as shown in Figure 5. In this case, the samples to be employed for the ANN training can either be mixed (as shown in Figure 6) each time that a trial is performed (this happens when trials are independent, with Method C1), or each trial can depend on the same training data-set (Method C2). The complete list of the training methods adopted in this paper is given in Table 1. The shares of the training and the validation set, 90% and 10%, respectively, were set up in previous works.
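The difference between independent trials (Method C1) and trials sharing one data-set (Method C2) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, samples are represented by their hourly indices, and the single C2 split is assumed to be shuffled once, which the text does not specify.

```python
import random


def split_train_validation(samples, train_share=0.9):
    """Split already-ordered samples into (training, validation) lists."""
    n_train = round(len(samples) * train_share)
    return samples[:n_train], samples[n_train:]


def independent_trials(samples, n_trials, seed=0):
    """C1-like: reshuffle the whole database before every trial."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_trials):
        shuffled = samples[:]          # copy, so the database itself is untouched
        rng.shuffle(shuffled)
        splits.append(split_train_validation(shuffled))
    return splits


def shared_trials(samples, n_trials, seed=0):
    """C2-like: build one split and reuse it for every trial.

    Shuffling the single split once is an assumption of this sketch.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    split = split_train_validation(shuffled)
    return [split for _ in range(n_trials)]


# 216 days of hourly sample indices, as in the case study database
hours = list(range(216 * 24))
c1 = independent_trials(hours, n_trials=3)
c2 = shared_trials(hours, n_trials=3)
print(len(c1[0][0]), len(c1[0][1]))  # training and validation sizes of one trial
```

In the C1-like case every network in the ensemble sees a different 90%/10% partition, while in the C2-like case the trials differ only through the stochastic training itself.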

Evaluation Indexes
The effect of the different training methods is investigated by means of some evaluation indexes. These indexes aim at assessing the accuracy of the forecasts and the related error, and it is therefore necessary to define them. There is a wide variety of existing definitions of forecasting performance, and technical papers present many of these indexes; hence, we report some of the definitions most commonly used in the literature [32][33][34].
The hourly error $e_h$ is the starting definition, given as the difference between the hourly mean values of the power measured in the h-th hour, $P_{m,h}$, and the forecast $P_{p,h}$ provided by the adopted model [32,35]:

$$e_h = P_{m,h} - P_{p,h} \qquad (1)$$

From the hourly error expression and its absolute value $|e_h|$, other definitions can be inferred, e.g., the well-known mean absolute percentage error (MAPE):

$$MAPE = \frac{1}{N} \sum_{h=1}^{N} \left| \frac{e_h}{P_{m,h}} \right| \cdot 100 \qquad (2)$$

where N represents the number of samples (hours) considered; usually it is calculated for a single day, month, or year.
Since the hourly measured power $P_{m,h}$ changes significantly during the same day (e.g., at sunrise, noon, and sunset), for the sake of a fair comparison, in this paper the authors preferred to consider the normalized mean absolute error $NMAE_{\%}$:

$$NMAE_{\%} = \frac{1}{N \cdot C} \sum_{h=1}^{N} |e_h| \cdot 100 \qquad (3)$$

where the percentage of the absolute error is referred to the rated power C of the plant, in place of the hourly measured power $P_{m,h}$.
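In this paper we also adopted the mean value of all the $NMAE_{\%,d}$, each referring to the d-th day, calculated over the whole period. Therefore, we introduce $\overline{NMAE}_{\%}$, which is the mean of all the daily $NMAE_{\%,d}$ obtained with a given data-set:

$$\overline{NMAE}_{\%} = \frac{1}{D} \sum_{d=1}^{D} NMAE_{\%,d} \qquad (4)$$

where D is the number of days in the considered period.

The weighted mean absolute error $WMAE_{\%}$ is based on total energy production:

$$WMAE_{\%} = \frac{\sum_{h=1}^{N} |e_h|}{\sum_{h=1}^{N} P_{m,h}} \cdot 100 \qquad (5)$$

The normalized root mean square error $nRMSE_{\%}$ is based on the maximum hourly power output $P_{m,h}$:

$$nRMSE_{\%} = \frac{\sqrt{\frac{1}{N} \sum_{h=1}^{N} |e_h|^2}}{\max(P_{m,h})} \cdot 100 \qquad (6)$$

This error definition is the well-known root mean square error (RMSE), normalized over the maximum hourly power output $P_{m,h}$ measured in the considered time range, for the sake of a fair comparison.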
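As a worked illustration of the definitions of $NMAE_{\%}$, $WMAE_{\%}$, and $nRMSE_{\%}$ given in this section, the following Python sketch computes the three indexes for one short series. Function names, the four-hour series, and the rated power of 200 W are made-up assumptions used only to show the arithmetic.

```python
import math


def nmae_percent(measured, forecast, rated_power):
    """Mean absolute error normalized by the rated power C of the plant."""
    n = len(measured)
    abs_errors = sum(abs(m - p) for m, p in zip(measured, forecast))
    return 100.0 * abs_errors / (n * rated_power)


def wmae_percent(measured, forecast):
    """Absolute error weighted by the total measured energy production."""
    abs_errors = sum(abs(m - p) for m, p in zip(measured, forecast))
    return 100.0 * abs_errors / sum(measured)


def nrmse_percent(measured, forecast):
    """RMSE normalized by the maximum measured hourly power in the range."""
    n = len(measured)
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, forecast)) / n)
    return 100.0 * rmse / max(measured)


# Illustrative hourly values in W (not measured data)
measured = [0.0, 50.0, 100.0, 50.0]
forecast = [0.0, 40.0, 110.0, 60.0]

print(nmae_percent(measured, forecast, rated_power=200.0))  # 3.75
print(wmae_percent(measured, forecast))                     # 15.0
print(nrmse_percent(measured, forecast))
```

With the same absolute errors, the index normalized by the rated power is much smaller than the one normalized by the produced energy, which is the effect discussed in the text for low-production hours.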
$NMAE_{\%}$ is widely used to evaluate the accuracy of predictions and trend estimations. In fact, relative errors are often large because they are divided by small power values (for instance, the low values associated with sunset and sunrise): in such cases, $WMAE_{\%}$ could turn out to be very large and biased, while $NMAE_{\%}$, by weighting these values with the capacity of the plant C, is more useful.
The $nRMSE_{\%}$ measures the mean magnitude of the absolute hourly errors $|e_h|$. It gives a relatively higher weight to larger errors, thus allowing particularly undesirable results to be emphasized. If we consider the daily trends of the aforementioned indexes (shown in Figure 7), it can be seen how they are correlated, while in Figure 8 the scatterplot of their values, normalized to the respective maxima, clearly shows the correlations among the three error indexes. Furthermore, the Pearson-Bravais correlation index $\rho_{xy}$ [36] has been calculated to underline the direct relationship among the error indexes:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \qquad (7)$$

where $\sigma_{xy}$ is the covariance of the two indexes x and y, and $\sigma_x$ and $\sigma_y$ are their standard deviations.

However, as shown in Figure 7, the daily evaluation indexes expressed in Equations (3), (5), and (6) can vary a great deal, and are unable to give complete information "at a glance" on the accuracy of the prediction. For example, consider Figures 9 and 10, where the forecasts and the relevant evaluation indexes for 1 April and 4 November 2014, respectively, are depicted. In both cases, daily $NMAE_{\%}$ values are quite low (around 2-3%), and a forecast assessment based solely on this index could be misleading.
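The Pearson-Bravais index used above to relate the daily error indexes is the ratio between the covariance of two series and the product of their standard deviations. A minimal Python sketch follows; the function name and the two short series are illustrative assumptions and are not values taken from Figure 7.

```python
import math


def pearson(x, y):
    """Pearson-Bravais correlation index between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of covariance and variances cancel out in the ratio
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily values of two error indexes that grow together
daily_index_a = [1.0, 2.0, 3.0, 4.0]
daily_index_b = [2.0, 4.0, 6.0, 8.0]
print(pearson(daily_index_a, daily_index_b))  # 1.0
```

A value close to 1 indicates that the two indexes rise and fall together from day to day, which is the direct relationship the text refers to.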
Actually, 1 April was quite a sunny day, and the forecast bell-shaped hourly power curve (the red starred line) accurately followed the measured one (the blue circled line). The cloudy day of 4 November 2014 was a different story: the forecast red curve is skewed toward the noon hours, while the measured blue curve is skewed toward the morning. However, on the second day the daily $NMAE_{\%}$ value is lower. This is owing to the normalization of the mean absolute error by the net capacity of the plant. Regarding the other evaluation indexes, even if they are correlated, they can exceed the 100% cap, as happens, for example, to $WMAE_{\%}$ in Figure 7 on day 72.

Starting from these considerations, and in view of a more useful summary evaluation, an additional performance index is proposed, aiming to express the forecast accuracy as a value between 0% and 100%. The envelope-weighted mean absolute error $EMAE_{\%}$ is therefore defined as:

$$EMAE_{\%} = \frac{\sum_{h=1}^{N} |P_{m,h} - P_{p,h}|}{\sum_{h=1}^{N} \max(P_{m,h}, P_{p,h})} \cdot 100 \qquad (8)$$

where the numerator is the sum of the absolute hourly errors, as in $WMAE_{\%}$, while the denominator is the sum of the maximum between the forecast and the measured hourly power. In particular, this definition is consistent with a graphical representation of the error, where the numerator corresponds to the yellow area shown in Figures 9 and 10 and the denominator is the sum of the gray and yellow areas highlighted in the same figures. With reference to the above-mentioned days, while the two $NMAE_{\%}$ values are nearly the same, the $EMAE_{\%}$ is 11% in the first case and 40% in the second case, and it never exceeds 100%.
As with the daily $NMAE_{\%,d}$, in this study we also introduced the mean value of all the $EMAE_{\%,d}$, each referring to the d-th day, calculated over the whole period. Therefore, $\overline{EMAE}_{\%}$ is the mean of all the daily $EMAE_{\%,d}$ for a given data-set:

$$\overline{EMAE}_{\%} = \frac{1}{D} \sum_{d=1}^{D} EMAE_{\%,d} \qquad (9)$$

where D is the number of days in the considered period.
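The envelope-weighted index described above (sum of absolute hourly errors divided by the sum of the hourly maxima between forecast and measurement) can be sketched in a few lines of Python. The function name and the sample values are illustrative assumptions; powers are assumed non-negative, and returning 0 when both series are identically zero is a choice of this sketch, not something stated in the text.

```python
def emae_percent(measured, forecast):
    """Envelope-weighted mean absolute error, in percent.

    Numerator: area between the two curves (sum of absolute hourly errors).
    Denominator: area under their upper envelope (sum of hourly maxima).
    """
    numerator = sum(abs(m - p) for m, p in zip(measured, forecast))
    denominator = sum(max(m, p) for m, p in zip(measured, forecast))
    if denominator == 0:
        return 0.0  # assumption: no power measured or forecast, so no error
    return 100.0 * numerator / denominator


# Illustrative hourly values in W (not measured data)
measured = [0.0, 50.0, 100.0, 50.0]
forecast = [0.0, 40.0, 110.0, 60.0]
print(emae_percent(measured, forecast))

# Worst case: production is measured but nothing was forecast
print(emae_percent([10.0, 20.0], [0.0, 0.0]))  # 100.0
```

Because each absolute error is never larger than the corresponding hourly maximum when both powers are non-negative, the ratio stays between 0% and 100%, unlike an index normalized by the measured energy alone.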

Case Study
Experimental data for this study were taken from the SolarTechLab laboratory [37], located in Milano, Italy (coordinates: 45°30′10.588″ N; 9°9′23.677″ E). In 2014, the DC output power of a single PV module with the following characteristics was recorded:

The monitoring activity of the PV system parameters lasted from 8 February to 14 December 2014, but the employable data, without interruptions and discontinuities, amount to 216 days. These hourly samples (24 per day) were used as the database for the comparison of the forecasting methods.
The PV module was linked to the electric grid by an ABB MICRO-0.25-I-OUTD micro-inverter [38], guaranteeing the optimization of the production. Its operating parameters, DC power included, were transmitted in real time to a workstation for storage using a ZigBee wireless connection. An important issue that arises is how to handle missing values and outliers. A suitable pre-processing procedure, which has already been developed and described in detail in [39], is applied here.
The weather forecasts employed were delivered by a weather service each day at 11 a.m. of the day before the forecasted one, for the exact location of the PV plant.The historical hourly database of these parameters was used to train the network and includes the following parameters:

• $T_{amb}$ ambient temperature (

In addition to these parameters, in order to train the PHANN method, the local time LT (hh:mm) of the day and the Clear Sky Radiation Model CSRM (W/m²) were also provided. These are the eleven inputs of the ANN. The specific settings of the ANN, with the exception of the training database composition (as presented in Section 2), were selected on the basis of a sensitivity analysis, as outlined in a previous study [23]. The ANN settings adopted in this study cover the number of neurons in each layer, the training algorithm, the activation function, and the number of trials in the ensemble forecast.

The shares of the data included in the training and in the validation steps were adjusted by means of another sensitivity analysis. Independently of how many days were employed in the training, the database was divided into two groups containing different amounts of data. The first group was used to train the network and the remaining data were used for the validation. Finally, the ensemble forecast was performed. This procedure was repeated several times, progressively increasing the number of days employed in the training process. The above-mentioned performance indexes were calculated over the whole year, and the results for the different shares adopted between training and validation are plotted in Figures 11 and 12. The results depicted here refer to training method C1; the reason for this choice is explained in Section 5.

As can be seen, the best results are always obtained by adopting 90% of the data for the training and the remaining 10% for the validation (the blue curve with rhomboidal markers). However, the zoom in the top-right corner of Figure 11 shows that, for the largest amount of data (210 days), using 80% of the data for the training and 20% for the validation (the purple dotted curve) provided results similar to those of the 90%/10% curve. The same trends were obtained in Figure 12, where EMAE% is shown as a function of the data-set size and of the shares of the training and validation sets. The same analysis was performed for training Method A* by comparing the NMAE% results in Figure 13 with the new error definition of Equation (9), shown in Figure 14.

Results
The study carried out so far aimed to compare different methods of composing the data-set employed for the training of the ANN, highlighting the most effective ones. The day-ahead forecasts obtained were analysed by means of the indexes introduced in Section 3 and led to the following results. The graph in Figure 15 shows the trend of NMAE% calculated for the different training-set composition methods as the data-set size increases. The best training method, which performed better overall with all the data-sets considered, was C1. In the short-range training, with only 10 days available in the data-set, method C2 instead scored the worst result, with NMAE% equal to 6.079. As the data-set size increased, method C2 aligned with C1 above 90-130 days. The trends of the other evaluation indexes are shown in Figures 16-18 and confirm the same results. From this perspective, method C2 scored the worst result, with EMAE% equal to 36.51. According to the NMAE% shown in Figure 15, methods B1 and B2 generally performed almost identically.

As a general comment on the reported results, it can be stated that method A is best suited when the availability of historical data is limited (e.g., a newly deployed PV plant), while method C1 appears to be most effective in the case of a greater availability of data (e.g., at least one year of power measurements from the considered PV facility). Generally speaking, ensembles composed of independent trials are most effective. The performance of methods B1 and B2 was halfway between A and C, and their effectiveness in the case of newly deployed PV plants became significant after a minimum period of measurement data accumulation (above 60 days).

Conclusions
This paper has presented a specific study aimed at analyzing the effect of different approaches in the composition of a training data-set for the day-ahead forecasting of PV power production. In particular, the authors proposed different procedures to set up the training and validation data-sets for the ANN used in the physical hybrid method to perform the power forecast in view of the electricity market. The approaches outlined here can be adopted to set up data-sets based on either historical data retrieved from an existing PV plant or on incremental data measurements in a newly deployed PV facility. The influence of different data-set compositions on the forecast outcome has been inspected by increasing the training data-set size and by varying the training and validation shares, in order to assess the most effective training method of this machine learning approach, based on commonly used and newly-defined performance indexes for the prediction error. The reported results have been validated over a one-year time range of experimentally measured data from a real PV power plant, considering a comparison of various error measures and showing the best approach for the different cases of either newly deployed or already existing PV facilities.

Figure 1. Main features of the ANN training data-sets.

• Method A employs chronologically consecutive samples, grouping the 90% of the samples which are closest to the forecast day for the training set and the remaining 10% of the samples for the validation set.
• Method A* employs the same chronologically consecutive samples, grouping 90% of the samples for the training set and the 10% of the samples which are closest to the forecast day for the validation set.
• Method B employs the samples by randomly grouping them, 90% for the training set and 10% for the validation set.
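The three splits listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: samples are represented by their hourly indices in chronological order (oldest first, so the forecast day follows the last sample), and the function names are hypothetical.

```python
import random


def split_method_a(samples, train_share=0.9):
    """Method A: the most recent 90% trains, the oldest 10% validates."""
    n_val = len(samples) - round(len(samples) * train_share)
    return samples[n_val:], samples[:n_val]      # (training, validation)


def split_method_a_star(samples, train_share=0.9):
    """Method A*: the oldest 90% trains, the most recent 10% validates."""
    n_train = round(len(samples) * train_share)
    return samples[:n_train], samples[n_train:]  # (training, validation)


def split_method_b(samples, train_share=0.9, seed=0):
    """Method B: samples are randomly assigned, 90% training and 10% validation."""
    rng = random.Random(seed)
    shuffled = samples[:]                        # keep the original order intact
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_share)
    return shuffled[:n_train], shuffled[n_train:]


# Ten days of hourly sample indices (240 samples)
hours = list(range(10 * 24))
train_a, val_a = split_method_a(hours)
train_a_star, val_a_star = split_method_a_star(hours)
train_b, val_b = split_method_b(hours)
print(val_a[0], val_a[-1])            # 0 23: validation on the oldest day
print(val_a_star[0], val_a_star[-1])  # 216 239: validation on the latest day
```

With ten days of data, Method A validates on the first recorded day and Method A* on the day immediately before the forecast, which is the proximity effect examined in the incremental data-set case.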

Figure 3. Hourly samples are progressively available in an incremental training database. PV: photovoltaic.

Figure 4. Training database composition for methods A, A*, and B.

Figure 5. Hourly samples belonging to an extended period of time are available in a complete training database.

Figure 6. Hourly samples belonging to an extended period of time in a complete training database are randomly mixed.

Figure 7. Example of the daily errors trend. NMAE: normalized mean absolute error; nRMSE: normalized root mean square error; WMAE: weighted mean absolute error.

Figure 8. Normalized daily errors correlated in a scatterplot.

Figure 9. Example of a sunny day forecast (1 April 2014) with the relevant evaluation indexes. EMAE%: envelope-weighted mean absolute error.

Figure 10. Example of a cloudy day forecast (4 November 2014) with the relevant evaluation indexes.

• neurons in the input layer: 11,
• neurons in the first hidden layer: 11,
• neurons in the second hidden layer: 5,
• neurons in the output layer: 1,
• training algorithm: Levenberg-Marquardt,
• activation function: sigmoid,
• number of trials in the ensemble forecast: 40.
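To give a feel for the size of the network listed above, the following Python sketch counts its trainable parameters. It assumes a fully connected feed-forward layout with one bias per neuron, which is a common convention but is not stated in the list; the function name is hypothetical.

```python
def count_parameters(layer_sizes, with_bias=True):
    """Count weights (and optionally biases) of a fully connected network."""
    total = 0
    # Each pair of consecutive layers contributes an n_in x n_out weight matrix
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out
        if with_bias:
            total += n_out  # one bias per neuron of the receiving layer
    return total


# Input, first hidden, second hidden, and output layer sizes from the list above
layers = [11, 11, 5, 1]
print(count_parameters(layers))                   # 198
print(count_parameters(layers, with_bias=False))  # 181
```

Under these assumptions the network has about two hundred parameters, a number that can be compared with the hourly samples available in a short incremental data-set (24 per day).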

Figure 11. NMAE% as a function of the dataset size.

Figure 12. EMAE% as a function of the dataset size.

Figure 13. NMAE% as a function of the dataset size.

Figure 14. EMAE% as a function of the dataset size.

Figure 15. NMAE% as a function of the dataset size.

Figure 16. nRMSE% as a function of the dataset size.

Figure 17. WMAE% as a function of the dataset size.

Figure 18. EMAE% as a function of the dataset size.

Table 1. Different methods for the composition of the ANN training data-sets which have been analysed. † (90% ts, 10% vs); ts = training set; vs = validation set.