Neural Network Ensemble Based Approach for 2d-interval Prediction of Solar Photovoltaic Power

Solar energy generated from PhotoVoltaic (PV) systems is one of the most promising types of renewable energy. However, it is highly variable as it depends on the solar irradiance and other meteorological factors. This variability creates difficulties for the large-scale integration of PV power in the electricity grid and requires accurate forecasting of the electricity generated by PV systems. In this paper we consider 2D-interval forecasts, where the goal is to predict summary statistics for the distribution of the PV power values in a future time interval. 2D-interval forecasts have been recently introduced, and they are more suitable than point forecasts for applications where the predicted variable has a high variability. We propose a method called NNE2D that combines variable selection based on mutual information and an ensemble of neural networks, to compute 2D-interval forecasts, where the two interval boundaries are expressed in terms of percentiles. NNE2D was evaluated for univariate prediction of Australian solar PV power data for two years. The results show that it is a promising method, outperforming persistence baselines and other methods used for comparison in terms of accuracy and coverage probability.


Introduction
Generating electricity from renewable sources is a key factor in the transition to a clean and sustainable energy future, to address environmental concerns and limit the global warming.Solar energy generated from PV systems is one of the most promising and fastest growing types of renewable energy [1].This growth is driven by government incentives that encourage the use of solar energy and by the decreasing cost of PV panels, making them more affordable.Since 2000, the installation of PV systems worldwide has increased 100 times, reaching 178 GW in 2014, and this capacity is expected to triple by 2019 [2].It is also expected that by 2050 PV systems will provide 29% of the electricity in Australia [3] and 25% of the electricity in Europe [4].
In comparison to the traditional fossil and nuclear energy sources, solar energy is freely available, can be easily harnessed and is environmentally free.However, unlike the traditional energy sources, it is highly variable as it depends on the solar irradiance, cloud cover and other meteorological factors.This creates challenges for its mass integration in the electricity grid.The European Photovoltaic Industry Association has identified forecasting as one of the key factors enabling large-scale integration of solar power in the electricity grid [4].Accurate forecasting ensures reliable supply, reduces the costs by allowing more efficient and secure management of electricity grids and also supports solar energy trading at electricity markets [4].

Data
We use solar power data from the largest flat-panel, grid-connected, PV system in Australia.It is located at the St Lucia campus of the University of Queensland in Brisbane and has about 5000 solar panels distributed at the roof-top of four buildings, generating up to 1.22 MW of electricity.
The data is measured at 1-min intervals for 24 h.We collected data for 10 h during the day, from 7:00 am to 5:00 pm, for two complete years-from 1 January 2013 to 31 December 2014.Outside these 10 h the solar power is either zero or very low due to the absence of solar irradiation.The data is publicly available at [21].
The original data set consists of 2 × 365 × 600 = 438,000 measurements in 1-min resolution.There were only 1518 missing values which is about 0.35%.Each missing value was replaced by the average of the values from the previous 5 min, i.e., with the 5 most recent observations of the PV power output.The 1-min data was aggregated into 5-min intervals by averaging every 5 consecutive measurements, resulting in 2 × 365 × 120 = 87,600 measurements in total.The data was normalized to the range (0,1).
The generated PV power depends on the weather conditions, especially the solar irradiance.Figure 1 plots the solar power profiles at half-hourly intervals, for three different types of days: sunny (13 April 2013), cloudy (15 April 2013) and rainy (20 April 2013).The three graphs differ considerably-for sunny days, the power output from a PV system is the highest and typically follows a bell shaped curve; for cloudy and rainy days, the generated PV power is lower and it varies during the day due to the changes in the weather conditions.
Figure 1 plots the solar power profiles at half-hourly intervals, for three different types of days: sunny (13 April 2013), cloudy (15 April 2013) and rainy (20 April 2013).The three graphs differ considerably -for sunny days, the power output from a PV system is the highest and typically follows a bell shaped curve; for cloudy and rainy days, the generated PV power is lower and it varies during the day due to the changes in the weather conditions.

Problem Definition
Let X = X 1 , X 2 , . . ., X t be a discrete time series of PV power outputs up to time t.Our goal is to compute the 2D-interval forecast for the next interval with length k, i.e., for the values X t+1 , X t+2 , . . ., X t+k .Specifically, at time t we predict the upper and lower bounds for the values of X in the interval [t + 1, t + k] (see Figure 2).
There are different ways to construct the upper and lower bounds of the k-length interval using descriptive statistics-e.g., using the 75th and 25th percentiles as in [18] or using the maximum and minimum values as in [20].In this study, we follow the first method as percentiles are more robust to noise and outliers than minimum and maximum values.Let P α k and P β k are the α and β percentiles (α > β) for the PV power time series X for the k-length future time interval [t + 1, t + k].The interval formed by them corresponds to the interval where |α − β| × 100% of the values of X are expected to lie [6].In this work, we construct two types of 2D-interval forecasts: by using the 90th and 10th percentiles, as well as 75th and 25th percentiles.However, our method is general and can be used for intervals with different bounds and lengths, depending on the specific application and scenario.

Problem Definition
Let  =  1 ,  2 , … ,   be a discrete time series of PV power outputs up to time .Our goal is to compute the 2D-interval forecast for the next interval with length k, i.e., for the values  +1 ,  +2 , … ,  + .Specifically, at time t we predict the upper and lower bounds for the values of  in the interval [ + 1,  + ] (see Figure 2).
There are different ways to construct the upper and lower bounds of the -length interval using descriptive statistics-e.g., using the 75th and 25th percentiles as in [18] or using the maximum and minimum values as in [20].In this study, we follow the first method as percentiles are more robust to noise and outliers than minimum and maximum values.The interval formed by them corresponds to the interval where | − | × 100% of the values of  are expected to lie [6].In this work, we construct two types of 2D-interval forecasts: by using the 90th and 10th percentiles, as well as 75th and 25th percentiles.However, our method is general and can be used for intervals with different bounds and lengths, depending on the specific application and scenario.

Variable Selection
Variable (or feature) selection is a key factor affecting the performance of prediction systems [22,23].The goal is to select a set of input variables that are relevant, important and sufficient for predicting the target variable.
To select lag variables, we used a method based on MI.MI is an information theoretic measure of the relationship between two variables X and Y-it measures the amount of information obtained from X in the presence of Y and vice-versa.If the two variables are independent, MI is zero; if they are dependent, MI has a positive value corresponding to the strength of the relationship.In contrast to traditional feature selection methods for time series such as autocorrelation which capture only linear dependencies, MI is able to detect both linear and non-linear dependencies.To compute MI we applied the method of Kraskov et al. [24] which is based on nearest neighbor distances.
We extract all lag variables from a time window with length one day (120 lag variables in total) and calculate the MI score between each of them and the targets P α k and P β k , for intervals with different lengths: k = 12, 24 and 36.One day of previous data is sufficient as the PV power is affected by changes in the solar irradiance and they are best captured in the most recent data.Figure 3 shows the MI score of the candidate lag variables for k = 12 and 24; the graph for k = 36 is similar and not shown.As we can see from Figure 4, the MI score drops sharply and then flattens.To select a set of predictive variables, we chose a cut-off threshold of MI = 0.6, which was satisfied by 5-6 variables for all three interval lengths.The cut-off threshold was chosen empirically; our goal was to select a small set of highly predictive variables.Further analysis revealed that the selected variables were variables extracted from the previous 30 min, which shows the importance of the most recent data.We chose the previous 6 PV lag variables for inclusion in the final feature set and validated this selection by increasing the number of variables up to 12, one at a time, but found that the inclusion of variables beyond lag 6 does not improve the prediction accuracy.
Energies 2016, 9, 829 6 of 17 all three interval lengths.The cut-off threshold was chosen empirically; our goal was to select a small set of highly predictive variables.Further analysis revealed that the selected variables were variables extracted from the previous 30 min, which shows the importance of the most recent data.We chose the previous 6 PV lag variables for inclusion in the final feature set and validated this selection by increasing the number of variables up to 12, one at a time, but found that the inclusion of variables beyond lag 6 does not improve the prediction accuracy.In addition to the 6 selected lag variables, following [20] we also selected the upper and lower bounds,  −1  and  −1  , of the previous -length interval  −+1 , … ,  −1 ,   .These features provide useful information about the recent dynamics of the two percentiles that we are trying to predict.Thus, we use 8 features in total as listed in Table 1.In addition to the 6 selected lag variables, following [20] we also selected the upper and lower bounds, P α k−1 and P β k−1 , of the previous k-length interval X t−k+1 , . . ., X t−1 , X t .These features provide useful information about the recent dynamics of the two percentiles that we are trying to predict.Thus, we use 8 features in total as listed in Table 1.

Variable Types Selected Variables Number of Variables
Lag variables 2 Total number of variables -8

Prediction Model-NN Ensemble
As a prediction model for P α k and P β k , we use an ensemble of NNs.NNs are one of the most popular prediction algorithms for solar power forecasting [11,14,25], and also for other energy time series such as electricity load forecasting [26][27][28].However, their performance is very sensitive to the NN architecture and the random initialization of weights.By combining the predictions of several NNs in an ensemble, this sensitivity can be reduced.Ensembles of prediction algorithms are also typically more accurate than a single ensemble member [29].
We adapt our NN ensemble method from [30] which was previously applied for point forecasting; we call the adapted prediction model NNE2D.
Single NN.A single NN, part of NN2D, is a multilayer perceptron NN as shown in Figure 5.It has p input neurons corresponding to the selected features, two output neurons for P α k and P β k and one hidden layer where the number of hidden neurons is determined experimentally as described below.The NN weights are initialised to random values using the Nguen-Widrow method [31].
Each single NN is trained separately on the training data using the Levenberg Marquardt (LM) algorithm [32] to minimize the mean squared error on the training data.LM has been chosen over the standard steepest gradient descent backpropagation algorithm because of its faster convergence.
The training is terminated when one of the following conditions is met: (1) the maximum number of training epochs (1000) is reached; (2) no improvement in the performance on the validation set is observed for a certain number of epochs (10); or (3) the performance gradient becomes very small (less than 1 × 10 −7 ).
Ensemble of NNs.We build V ensembles of NNs and then select the best one, as shown in Figure 6.Each ensemble combines the predictions of m NNs with the same number of hidden neurons but different initialization of the weights.For example, the ensemble E V combines the predictions of In our experiments we used m = 10 and V = 30.
After each NN is trained separately on the training data, the performance of the V ensembles is evaluated on the validation set and the best performing ensemble E best , the one with the lowest prediction error (MRE) on the validation set, is selected and used to predict the testing data.
set is observed for a certain number of epochs (10), or (3) the performance gradient becomes very small (less than 1 × 10 −7 ).
Ensemble of NNs.We build V ensembles of NNs and then select the best one, as shown in Figure 6.Each ensemble combines the predictions of m NNs with the same number of hidden neurons but different initialization of the weights.For example, the ensemble   combines the predictions of m NNs with V hidden neurons.The combination is done by taking the median of all m predictions.In our experiments we used m = 10 and V = 30.
After each NN is trained separately on the training data, the performance of the V ensembles is evaluated on the validation set and the best performing ensemble   , the one with the lowest prediction error (MRE) on the validation set, is selected and used to predict the testing data.

Predicting New Data
To predict    and

Predicting New Data
To predict P α k and P β k for a new example i, the ensemble E best combines the predictions of its ensemble members (i.e., single NNs) by taking the median of their predicted lower and upper bounds as shown in Figure 6.This means that: Pα k = median Pα are the predictions for the α and β percentiles generated by an NN j of ensemble E best , j = 1, . . ., m.

Methods Used for Comparison
To provide a comprehensive evaluation of NNE2D, we compare its performance with two persistence models and a method similar to NNE2D but using SVR instead of an NN ensemble.

Persistence Models
We use two persistence models as baselines for comparison.Persistence models consider the recently observed values of time series data as predicted values and are used as baselines.We implemented two persistence models: B1 and B2.
The first baseline (B1) uses the previous k-length interval.The 2D-interval forecast at time t for the interval [t + 1, t + k] is given by the percentiles of the previous interval [t − k − 1, t], i.e.,: P α k−1 = α percentile of (P t−k+1 , . . . ,P t−1 ) and P β k−1 = β percentile of (P t−k+1 , . . . ,P t−1 ).The second baseline (B2) uses the k-interval from the previous day, at the same time.The 2D-interval forecast at time t for the interval [t + 1, t + k] is given by using the percentiles of the time series for the interval [t where d is the total number of observations in a day, i.e., P α k−d = α percentile of (P t−d−k+1 , . . . ,P t−d ) and P β k−d = β percentile of (P t−d−k+1 , . . . ,P t−d ).Zhang et al. [33] showed that for point forecasting a persistence model similar to B2 was more accurate compared to ARIMA, NNs and SVR when the consecutives days have similar PV power characteristics.

SVR Based Method
We also evaluate the performance of SVR as a prediction algorithm instead of an NN ensemble, under the same experimental conditions.We call this method SVR2D.
The key idea of SVR is to map the input data into a higher dimensional feature space using a non-linear transformation and then apply linear regression in the new space.The linear regression in the new space corresponds to nonlinear regression in the original space.The task is formulated as an optimisation problem.The main goal is to minimize the error on the training data, but the flatness of the line and the trade-off between training error and model complexity are also considered to prevent overfitting.The solution is defined by a small subset of training examples, called support vectors.
Solving the optimization problem requires computing dot products of input vector in the new space which is computationally expensive in high dimensional spaces.To help with this, kernel functions satisfying the Mercer's theorem are used-they allow the dot products to be computed in the original lower dimensional space and then mapped to the new space.
Since SVR can have only one output, SVR2D divides the 2D-interval prediction task into two subtasks: predicting the upper bound P α k and predicting the lower bound P β k , and builds a separate SVR prediction model for each of them: We used Radial Basis Function (RBF) kernel, which was selected after empirical evaluation and comparison of different kernel functions.

Training, Validation and Testing Sets
We divided the available solar power data for the two years (2013 and 2014) into three non-overlapping subsets: training (50%) validation (25%), and testing (25%).The data for 2013 was assigned to the training set, 50% of the data for 2014 was assigned to the validation set and the remaining 50% for 2014 was assigned to the testing set.Table 2 shows the number of samples in each subset.The training set was used for feature selection and training of the prediction models, the validation set was used to tune the parameters of the prediction models, e.g., for selecting the best NN ensemble and kernel function for SVR2D.The testing set was used to evaluate the performance of the prediction models for 2D-interval forecasting.

Interval Length
We consider intervals with three different lengths: 1 h, 2 h and 3 h, and two different bounds: 90%-10% and 75%-25% percentiles.This allows us to better evaluate our approach under different conditions and also shows that it can be used in different scenarios by the power system operators depending on the task, e.g., real-time dispatching, storage management, etc. Table 3 summarizes the interval lengths in terms of steps k and time l, and also the bounds we used.

Evaluation Mesusres
We use three different evaluation measures: Mean Absolute Interval Deviation (MAID), Mean Relative Error (MRE) and Interval Coverage Probability (ICP): where: P α k,i and P β k,i are the actual values of the upper and lower bounds of the k-length interval for i-th example in the dataset and Pα k,i and Pβ k,i are the respective predicted values; k is the length of interval, N is the total number of examples in the dataset, R is the range of the target values of the PV power output data; and MAID is standard measure for interval forecasts [18,20].It measures the mean absolute deviation of the upper and lower bounds of the predicted intervals from the upper and lower bounds of the actual intervals.MRE measures the percentage deviation of the predicted intervals from the actual intervals normalized by the range of the target values.It is an extension of MAID and facilitates comparison of prediction errors for data sets with different range of values.Low values for both MAID and MRE indicate high prediction accuracy.
ICP measures the likelihood of the k values of the time series for the next k-length interval to be included in the predicted interval [ Pα k , Pβ k ], averaged over all points in the data set.High coverage probability indicates high accuracy.For 2D-interval forecasts with upper and lower bounds constructed using α and β percentiles (α > β), the expected coverage probability is |α − β| × 100%.

Accuracy
Table 4 presents the accuracy results for NNE2D and Table 5 for the methods used for comparison.Figures 7 and 8 show a visual comparison of the MRE and ICP results for all methods.
The results show that NNE2D outperformed the other methods for all interval lengths and interval boundaries, for all three evaluation measures.On average NNE2D achieved the following percentage improvements in terms of MAID and MRE: compared to SVR2D: 21.50%-24.40%,compared to B1: 44.65%-58.45% and compared to B2: 58.63%-61.66%.These improvements in terms of ICP are: 6.48%-21.04%,83.7%-50.42%and 91.68%-75.58%respectively.SVR2D is the second best method and it also considerably outperforms the two baselines.From the two baselines, B1 is more accurate than B2, which shows that the most recent solar data is more important than the data from the previous day for 2D-interval prediction.This is also supported by the performance of both NNE2D and SVR2D which utilise a small set of input variables from the most recent solar power data.Figure 9 shows the regression results of the predicted and actual values for the upper and lower bounds of the 2D-intervals with lengths 1 h, α = 90th and β = 10th percentile.The regression plots for the remaining cases are very similar and not shown.We can see that most of the data falls along the diagonal line, with high values for the coefficient of determination R (0.88-0.90), indicating that the predicted and actual bounds of the 2D-intervals are very close to each other.

Performance for Different Interval Lenghts
From Table 4 we can see that the accuracy of NNE2D varies depending on the lengths of the intervals.The best accuracy results in terms of MAID and MRE are achieved for the intervals with the smallest length (l = 1 h, k = 12 steps).The accuracy decreases as the length of the predicted interval increases.These findings are as expected-it is generally easier to predict values that are closer in time to the current value.
In terms of coverage probability ICP, the results for the 10-90 percentile interval are similar for the different interval lengths while for the 25th-75th percentile interval ICP slightly increases with the increase of the interval length.We can also see that ICP is higher for the 10th-90th percentile interval than for the 25th-75th percentile interval.This is anticipated since the expected coverage probability drops from 80% (=90%-10%) for the first case to 50% (=75%-25%) for the second case.

Performance for Different Interval Lenghts
From Table 4 we can see that the accuracy of NNE2D varies depending on the lengths of the intervals.The best accuracy results in terms of MAID and MRE are achieved for the intervals with the smallest length (l = 1 h, k = 12 steps).The accuracy decreases as the length of the predicted interval increases.These findings are as expected-it is generally easier to predict values that are closer in time to the current value.
In terms of coverage probability ICP, the results for the 10-90 percentile interval are similar for the different interval lengths while for the 25th-75th percentile interval ICP slightly increases with the increase of the interval length.We can also see that ICP is higher for the 10th-90th percentile interval than for the 25th-75th percentile interval.This is anticipated since the expected coverage probability drops from 80% (=90%-10%) for the first case to 50% (=75%-25%) for the second case.

Performance for Different Interval Lenghts
From Table we can see that the accuracy of NNE2D varies depending on the lengths of the intervals.The best accuracy results in terms of MAID and MRE are achieved for the intervals with the smallest length (l = 1 h, k = 12 steps).The accuracy decreases as the length of the predicted interval increases.These findings are as expected-it is generally easier to predict values that are closer in time to the current value.
In terms of coverage probability ICP, the results for the 10-90 percentile interval are similar for the different interval lengths while for the 25th-75th percentile interval ICP slightly increases with the increase of the interval length.We can also see that ICP is higher for the 10th-90th percentile interval than for the 25th-75th percentile interval.This is anticipated since the expected coverage probability drops from 80% (=90%-10%) for the first case to 50% (=75%-25%) for the second case.
To provide additional insights, Table 6 shows the Mean Interval Width (MIW) for the constructed intervals.Similarly to MAID and MRE, we can see that MIW increases as the interval length increases, and this holds for both predicted and actual intervals.MIW also depends on the upper and lower evaluated for predicting the solar PV power output using Australian PV power data for two years, sampled at 5-min intervals, for future intervals with length from 1 to 3 h.The prediction was done using only previous PV power data, without any weather data.
NNE2D was compared with two persistence models used as baselines and a method based on SVR.It achieved MRE of 6.47%-8.16%and coverage probability ICP of 67.04-67.66 for the 10-90th percentiles and MRE of 6.50-8.83and ICP of 42.96%-45% for the 25-75th percentiles, considerably outperforming the methods used for comparison.NNE2D also showed superior performance when compared to similar but multivariate methods that use both PV power and weather data.This shows that for very-short term forecasting up to 3 h, there is no need to use weather data.In addition, NNE2D was fast to train which makes it suitable for both online and offline training.
Considering both prediction accuracy and the computational requirements, we conclude that NNE2D is a promising approach for 2D-interval forecasts and is viable for practical applications.It can be used to predict other summary statistics for future intervals, not only percentiles, depending on the specific task and forecasting scenarios.
There are several avenues for future work that we will explore.Firstly, we plan to conduct a detailed sensitivity analysis investigating the impact of the relative size of the training, validation and testing sets and the effect of resampling of these sets when evaluating the performance.Secondly, we will examine if the daily periodical component of the PV power (more pronounced for some types of days, e.g., sunny) can be used to improve the results.Thirdly, we will explore alternative data normalization methods, e.g., based on the daily cycle, that consider the sunrise and sunset times and the solar height.Furthermore, we plan to apply our method to other energy time series, in particular wind energy, electricity demand and electricity price.

Figure 1 .
Figure 1.Typical patterns of solar power output under different weather conditions: for a sunny day (13 April 2013), cloudy day (15 April 2013) and rainy day (20 April 2013).

Figure 1 .
Figure 1.Typical patterns of solar power output under different weather conditions: for a sunny day (13 April 2013), cloudy day (15 April 2013) and rainy day (20 April 2013).
Let    and    are the  and  percentiles ( > ) for the PV power time series  for the -length future time interval [ + 1,  + ].

Figure 3
Figure 3 presents a block diagram of NNE2D.It uses an ensemble of NNs for computing the 2D-interval forecasts and includes three main steps: (1) variable selection, (2) training and selection of the NN ensemble, and (3) forecasting of new data, as described below.

Figure 3
Figure 3 presents a block diagram of NNE2D.It uses an ensemble of NNs for computing the 2D-interval forecasts and includes three main steps: (1) variable selection; (2) training and selection of the NN ensemble; and (3) forecasting of new data, as described below.

Figure 3
Figure 3 presents a block diagram of NNE2D.It uses an ensemble of NNs for computing the 2D-interval forecasts and includes three main steps: (1) variable selection, (2) training and selection of the NN ensemble, and (3) forecasting of new data, as described below.

Figure 3 .
Figure 3. Block diagram of the proposed NNE2D approach.

Figure 3 .
Figure 3. Block diagram of the proposed NNE2D approach.

Figure 4 .
Figure 4. MI score of lag variables for intervals with different length k: (a) k = 12 and (b) k = 24.

Figure 4 .
Figure 4. MI score of lag variables for intervals with different length k: (a) k = 12 and (b) k = 24.

m
NNs with V hidden neurons.The combination is done by taking the median of all m predictions.

Figure 5 .
Figure 5. Architecture of a single NN in NNE2D.

Figure 6 .
Figure 6.Ensemble construction and selection for NNE2D.

Figure 6 .
Figure 6.Ensemble construction and selection for NNE2D.

Figure 9 .
Figure 9. Regression plots of actual and predicted values for the intervals with length 30 min, and α = 90th and β = 10th percentiles.

Figure 9 .
Figure 9. Regression plots of actual and predicted values for the intervals with length 30 min, and α = 90th and β = 10th percentiles.

Figure 9 .
Figure 9. Regression plots of actual and predicted values for the intervals with length 30 min, and α = 90th and β = 10th percentiles.

Table 1 .
Selected variables to predict P α k and P β k of the k-length interval [t + 1, t + k].

Table 2 .
Summary of training, validation and testing data sets.

Table 3 .
Interval lengths and bounds for 2D-interval forecasts.

Table 5 .
Accuracy results for the methods used for comparison: SVR2D, B1 and B2.

Table 5 .
Accuracy results for the methods used for comparison: SVR2D, B1 and B2.