Developing and Comparing Different Strategies for Combining Probabilistic Photovoltaic Power Forecasts in an Ensemble Method

Accurate probabilistic forecasts of renewable generation are drivers for operational and management excellence in modern power systems and for the sustainable integration of green energy. The combination of forecasts provided by different individual models may allow increasing the accuracy of predictions; however, in contrast to point forecast combination, for which the simple weighted averaging is often a plausible solution, combining probabilistic forecasts is a much more challenging task. This paper aims at developing a new ensemble method for photovoltaic (PV) power forecasting, which combines the outcomes of three underlying probabilistic models (quantile k-nearest neighbors, quantile regression forests, and quantile regression) through a weighted quantile combination. Due to the challenges in combining probabilistic forecasts, the paper presents different combination strategies; the competing strategies are based on unconstrained, constrained, and regularized optimization problems for estimating the weights. The competing strategies are compared to individual forecasts and to several benchmarks, using the data published during the Global Energy Forecasting Competition 2014. Numerical experiments were run in MATLAB and R environments; the results suggest that unconstrained and Least Absolute Shrinkage and Selection Operator (LASSO)-regularized strategies exhibit the best performances for the datasets under study, outperforming the best competitors by 2.5 to 9% of the Pinball Score.


Introduction
Load demand and non-controllable generation are the main sources of uncertainty in modern electrical grids, and forecasting these quantities is of the greatest interest during the planning and operation stages. In particular, accurate load and generation predictions are mandatory in order to tackle and solve a large variety of power system tasks, such as market bidding, energy dispatch in smart grids and microgrids, replacement reserve scheduling, virtual power plant aggregation, and sizing of battery energy storage systems [1-7].
Relevant literature on load and generation forecasting is quite heterogeneous; this is highlighted by the comparative discussions in reviews and surveys [8,9], which clearly show that no method outperforms the others in every aspect. Major efforts have been devoted to point prediction, for which researchers and practitioners often identify Artificial Neural Networks (ANN) [10,11], K-Nearest Neighbors (KNN) [12], support vector regression [13], Random Forests (RF) [14], and multiple linear regression models [15] as the best solutions. Historically, research efforts have often been dedicated to load forecasting, since generation mainly consisted of dispatchable fossil-fueled and hydroelectric plants. Today, instead, the widespread penetration of renewable generation, and in particular of photovoltaic (PV) and wind systems, makes forecasting renewable generation essential to cope with the new power system tasks, addressing the uncertainty of the energy source. Moreover, due to the intrinsic randomness of the physical phenomenon, probabilistic PV power forecasting is better suited to the management and operation of electrical networks under uncertainty [16,17]; however, only a minor part of the existing literature has dealt with PV power forecasting under a probabilistic framework. Relevant existing approaches are based on Quantile K-Nearest Neighbors (QKNN) [18], Quantile Regression Forests (QRFs) [19], Quantile Regression (QR) [20,21], and Gradient Boosting Regression Trees [18]; these models also proved their effectiveness in recent energy forecasting competitions [16,22], since the forecasting systems developed by the highest-ranking teams were based on these nonparametric probabilistic models.
New trends in probabilistic PV power forecasting identify the probabilistic combination of individual forecasts as a suitable way to improve the accuracy of the results [9,20,23]. Probabilistic forecast combination is not as straightforward as it seems at first inspection. In contrast to combining point forecasts, for which simple weighted averaging is often a plausible solution, combining probabilistic forecasts is a much more challenging task; the combined probabilistic forecasts must indeed retain adequate properties in terms of reliability and sharpness [20,24], and the main features of a probabilistic forecast (e.g., the ascending order of predictive quantiles) must also be retained by the combined forecasts [23].
Relevant literature has addressed these aspects from different points of view [9]. Individual probabilistic forecasts can indeed be merged: (i) by a combination of the predictive cumulative distribution functions [20]; or (ii) by a combination of the predictive quantiles [23]. The first combination type has already been applied to PV power forecasting, whereas the second combination type has yet to be applied to PV power forecasting (it has been presented in [23] only for load forecasting). Nevertheless, within these two types of approach, several strategies and architectures can be developed to combine forecasts, so there is room for further investigation and improvement.
In this context, this paper aims to provide a further contribution on the probabilistic combination of PV power forecasts. The paper develops and compares different forecast combination strategies applied to three individual probabilistic models (QKNN [18], QRF [19], and QR [20,21]), which are selected among the state-of-the-art nonparametric solutions for probabilistic forecasting. The outcomes of these models are combined under a competitive ensemble framework, based on a weighted combination of predictive quantiles.
Estimating the combination weights is a challenging task; several estimation strategies and architectures are therefore analyzed in this paper, in order to check the effectiveness of the model combination from different perspectives, and to allow picking the best solution for combining forecasts. In particular, the competing strategies are based on unconstrained, constrained, and regularized optimization problems for estimating the weights used to combine the predictive quantiles.
To guarantee the reproducibility of the experiments, the data published in the framework of the Global Energy Forecasting Competition 2014 (GEFCOM2014) [16] are used in this paper. In order to validate the proposal, the results are compared to relevant probabilistic benchmarks in actual scenarios.
In summary, the main contributions of this paper are:
• the development of a new competitive ensemble method that combines the outcomes of three probabilistic models, selected among the ones which have proved consistency in probabilistic PV power forecasting;
• a critical analysis of different strategies and architectures to estimate the weights of the predictive quantile combination;
• a comparison of the results of the numerical experiments, based on the data published in the framework of the GEFCOM2014, with state-of-the-art probabilistic benchmarks.
The paper is organized as follows. Section 2 provides an overview of the competitive ensemble method; Section 3 briefly describes the underlying probabilistic models; Section 4 presents the combination architectures and strategies analyzed in this paper; Section 5 shows the benchmarks used for the comparison; Section 6 shows the results of the numerical experiments; and our conclusions are in Section 7.

Overview of the Proposed Competitive Ensemble Method for Forecasting PV Power
The forecasts of the underlying probabilistic models are combined in this paper in a competitive ensemble method, illustrated in Figure 1. Historical PV power and weather data, together with calendar qualitative variables, are the inputs of the procedure. These inputs are used by the underlying probabilistic models in order to build individual probabilistic forecasts of PV power, provided in terms of predictive quantiles. Finally, the predictive quantiles returned from the underlying models are fed as inputs to the ensemble model, in order to be combined. In the forecast combination step, calendar variables may or may not be used; this differentiates the parameter estimation, as will be discussed in Section 4. The outputs of the procedure are probabilistic PV power forecasts, given in terms of predictive quantiles.

Probabilistic Underlying Models
Three probabilistic underlying models are selected and used to build individual forecasts. They are based on the QKNN, QRF, and QR techniques; all of these underlying models provide probabilistic forecasts in terms of predictive quantiles.
A brief description of these models is provided hereinafter. For each model, we assume that the same training data are available at the forecast origin $t$: (i) $N$ historical values $P_{t-(N-1)}, P_{t-(N-2)}, \ldots, P_t$ of PV power; (ii) $N$ vectors $z_{t-(N-1)}, z_{t-(N-2)}, \ldots, z_t$ of $M$ predictors, corresponding to each of the $N$ historical values of PV power. In particular, the generic $j$th vector of predictors is $z_j = \{z^1_j, \ldots, z^M_j\}$, for $j = t-(N-1), t-(N-2), \ldots, t$.

Quantile K-Nearest Neighbors
K-Nearest Neighbors (KNN) models are widely used in regression problems, due to their versatility and ease of use. Extending KNN to the probabilistic framework, thus formulating the QKNN model, is quite straightforward.
QKNN models are based on similarity effects. Given the predictors $z_{t+k}$ (related to the forecast time horizon $t+k$, but known at the forecast origin $t$), the KNN model identifies, among the $N$ predictor vectors, a subset $Z_{t+k} = \{z^*_1, \ldots, z^*_K\}$ made of the $K$ predictor vectors that are closest to the predictors $z_{t+k}$. In this paper, the proximity relationship is mathematically expressed using the Euclidean metric, defined as:
$$d(z_{t+k}, z_j) = \sqrt{\sum_{m=1}^{M} \left(z^m_{t+k} - z^m_j\right)^2}. \quad (1)$$
The subset $P_{t+k} = \{p^*_1, \ldots, p^*_K\}$ of measured PV powers, corresponding to the subset $Z_{t+k}$, is then directly identified. The QKNN forecast $\hat{P}^{(QKNN)}_{t+k}(q_i)$ at level $q_i$ is obtained as the sample $q_i$-quantile of the subset $P_{t+k}$.
The hyper-parameter K (i.e., the number of neighbors) is selected in this paper in a cross-validation procedure.
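The QKNN procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the experiments: the function names, the toy data, and the linear-interpolation convention for the sample quantile are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two predictor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sample_quantile(values, q):
    """Empirical q-quantile with linear interpolation (one of several conventions)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def qknn_forecast(z_hist, p_hist, z_new, k, levels):
    """Quantiles of the k historical powers whose predictors are closest to z_new."""
    order = sorted(range(len(z_hist)), key=lambda j: euclidean(z_hist[j], z_new))
    neighbours = [p_hist[j] for j in order[:k]]
    return [sample_quantile(neighbours, q) for q in levels]

# Hypothetical toy data: a single normalized predictor and normalized PV power
z_hist = [[0.1], [0.2], [0.5], [0.55], [0.6], [0.9]]
p_hist = [0.05, 0.10, 0.40, 0.50, 0.45, 0.80]
forecast = qknn_forecast(z_hist, p_hist, [0.52], k=3, levels=[0.1, 0.5, 0.9])
```

Because every level is computed from the same neighbor subset, the returned quantiles are non-decreasing in the level by construction.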

Quantile Regression Forests
QRFs are groups of D decision trees, where individual trees are built by randomly selecting bagged subsets from the available pool of predictor variables.
Given the predictors $z_{t+k}$ (related to the forecast time horizon $t+k$, but known at the forecast origin $t$), one leaf of each tree is uniquely identified. In particular, for the generic $d$th tree, it is denoted as $L_d(z_{t+k})$. In QRF, all of the outcomes contained in the $D$ identified leaves contribute to the probabilistic forecast for the time horizon $t+k$.
The QRF predictive distribution is estimated as:
$$\hat{F}(p \mid z_{t+k}) = \sum_{n=t-(N-1)}^{t} w_n(z_{t+k}) \, 1\{P_n \le p\}, \quad (2)$$
where $1\{\cdot\} = 1$ if the condition in the brackets is true, $1\{\cdot\} = 0$ if the condition is false, and a weight coefficient $w_n(z_{t+k})$ is estimated for each of the $N$ historical vectors of predictors, as:
$$w_n(z_{t+k}) = \frac{1}{D} \sum_{d=1}^{D} \frac{1\{z_n \in R_{L_d(z_{t+k})}\}}{\sum_{j=t-(N-1)}^{t} 1\{z_j \in R_{L_d(z_{t+k})}\}}, \quad (3)$$
and $R_{L_d(z_{t+k})}$ is the rectangular subspace of $\mathbb{R}^M$ that corresponds to the leaf $L_d(z_{t+k})$.
The QRF forecast $\hat{P}^{(QRF)}_{t+k}(q_i)$ at level $q_i$ follows directly from (2); it is:
$$\hat{P}^{(QRF)}_{t+k}(q_i) = \inf\left\{p : \hat{F}(p \mid z_{t+k}) \ge q_i\right\}. \quad (4)$$
The hyper-parameter $D$ (i.e., the number of trees in the forest) is selected in this paper in a cross-validation procedure.
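To make the weighting idea concrete, the following sketch computes QRF-style weights from leaf memberships and reads a quantile off the weighted empirical distribution. It assumes that the trees have already been grown and that the leaf index of every training sample (and of the new predictor vector) is available; the data layout and all names are hypothetical.

```python
def qrf_weights(train_leaves, query_leaves):
    """train_leaves[d][n]: leaf index of training sample n in tree d;
    query_leaves[d]: leaf index reached by the new predictor vector in tree d."""
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, target in zip(train_leaves, query_leaves):
        members = [n for n, leaf in enumerate(leaves) if leaf == target]
        for n in members:
            # Each tree spreads a total weight of 1/D evenly over its selected leaf
            weights[n] += 1.0 / (len(members) * n_trees)
    return weights

def weighted_quantile(values, weights, q):
    """Smallest value whose weighted empirical CDF reaches level q."""
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= q - 1e-12:
            return value
    return max(values)

# Hypothetical toy forest: 2 trees, 4 training samples
train_leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]
query_leaves = [1, 1]
p_hist = [0.1, 0.3, 0.5, 0.7]
w = qrf_weights(train_leaves, query_leaves)
median = weighted_quantile(p_hist, w, 0.5)
```

The weights sum to one, so the weighted empirical distribution is a proper distribution function.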

Quantile Regression
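QR is a multiple linear regression model whose parameters are not estimated by a traditional ordinary least squares approach; instead, they are estimated by minimizing the Pinball Score (PS) [16,25] in the training period. The PS is a proper score [25], which simultaneously accounts for reliability and sharpness of the probabilistic forecasts; it is the most common index in evaluating probabilistic forecasts, and is therefore used in all of the comparative analyses in this paper.
The analytic formulation of the QR model is:
$$P_{t+k} = z_{t+k} \, \beta^{(q_i)} + \varepsilon_{t+k}, \quad (5)$$
where $\beta^{(q_i)}$ is the vector of parameters to be estimated, and $\varepsilon_{t+k}$ is the residual. Parameters are estimated from:
$$\hat{\beta}^{(q_i)} = \arg\min_{\beta} \sum_{j=t-(N-1)}^{t} PS\left(P_j, z_j \beta; q_i\right), \quad (6)$$
where $PS(P, \hat{P}; q) = q\,(P - \hat{P})$ if $P \ge \hat{P}$, and $PS(P, \hat{P}; q) = (1-q)\,(\hat{P} - P)$ otherwise. The unconstrained nonlinear programming problem in (6) can be recast as a constrained linear programming problem [26], which increases the computational efficiency. It is represented as:
$$\min_{\beta, \varepsilon^{+}, \varepsilon^{-}} \; q_i \, 1_{[N \times 1]}^{T} \varepsilon^{+} + (1 - q_i) \, 1_{[N \times 1]}^{T} \varepsilon^{-} \quad \text{s.t.} \quad P - Z\beta = \varepsilon^{+} - \varepsilon^{-}, \;\; \varepsilon^{+} \ge 0, \;\; \varepsilon^{-} \ge 0, \quad (7)$$
where $P$ is the $[N \times 1]$ vector of historical PV powers, $Z$ is the $[N \times M]$ matrix whose rows are the corresponding predictor vectors, $1_{[N \times 1]}$ is an $[N \times 1]$ vector of ones, and:
$$\varepsilon^{+} = \max(\varepsilon, 0), \qquad \varepsilon^{-} = \max(-\varepsilon, 0),$$
with the maximum taken element-wise on the residual vector $\varepsilon = P - Z\beta$. The QR forecast $\hat{P}^{(QR)}_{t+k}(q_i)$ at level $q_i$ is then obtained as:
$$\hat{P}^{(QR)}_{t+k}(q_i) = z_{t+k} \, \hat{\beta}^{(q_i)}.$$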
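Since the PS drives both the QR estimation and all of the comparisons in this paper, a minimal sketch of its computation may be useful. The function names are hypothetical; the piecewise definition is the standard pinball loss, averaged here over observations and quantile levels.

```python
def pinball(observed, forecast, q):
    """Pinball loss of one quantile forecast at level q (0 < q < 1)."""
    if observed >= forecast:
        return q * (observed - forecast)
    return (1.0 - q) * (forecast - observed)

def pinball_score(observations, quantile_forecasts, levels):
    """Average pinball loss over all observations and all quantile levels.
    quantile_forecasts[n][i] is the forecast for observation n at levels[i]."""
    total = 0.0
    for obs, row in zip(observations, quantile_forecasts):
        total += sum(pinball(obs, f, q) for f, q in zip(row, levels))
    return total / (len(observations) * len(levels))

# Under-forecasting a high quantile is penalized more than over-forecasting it
under = pinball(1.0, 0.5, 0.9)   # observation above the forecast
over = pinball(0.5, 1.0, 0.9)    # observation below the forecast
```

The asymmetry between `under` and `over` is what steers the minimizer towards the requested quantile rather than towards the mean.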

The Competitive Ensemble Model for Forecast Combination
The competitive ensemble model is based on the Quantile Weighted Sum (QWS), which has recently been applied to probabilistic load forecasting with interesting results [23]. Eight different strategies are proposed and compared in this paper:

• the Pure Quantile Weighted Sum (PQWS);
• the Hourly Quantile Weighted Sum (HQWS);
• the Pure Constrained Quantile Weighted Sum (PCQWS);
• the Hourly Constrained Quantile Weighted Sum (HCQWS);
• the Pure Quantile Weighted Sum with LASSO Regularization (PQWSLR);
• the Hourly Quantile Weighted Sum with LASSO Regularization (HQWSLR);
• the Pure Quantile Weighted Sum with Ridge Regularization (PQWSRR);
• the Hourly Quantile Weighted Sum with Ridge Regularization (HQWSRR).

In the "pure" approaches, weights are estimated without any differentiation in terms of daily periodicity, whereas in the "hourly" approaches weights are estimated using only same-hour observations; thus the weights are differentiated by the hour of the day to account for the daily periodicity of the PV power pattern. The last four approaches are extended in this paper starting from the Least Absolute Shrinkage and Selection Operator (LASSO) quantile regression [27] and from the Ridge quantile regression [28], respectively, which allow regularizing the weights by assigning a penalty linked to the magnitude of the weights.
In the PQWS and HQWS strategies, the weights are estimated without any constraint or regularization loss.Compared to the constrained or regularized strategies, the PQWS and HQWS strategies return the smallest in-sample PS, since the minimization problem is unconstrained.However, there is no assurance that these weights are the best picks for out-of-sample forecasts.This is a common issue for regression applied to forecasting, in which overfitting the training data has negative consequences when the model is used to forecast unknown data.
Therefore, in this paper, we compare the results of the unconstrained, non-regularized strategies to constrained and regularized strategies, in order to check their performances and to pick the best strategy to be used in practical applications.
The PCQWS and the HCQWS are the constrained strategies, in which the weights are forced to sum to unity. This ensures that the predictive quantiles of the combined forecasts do not deviate oddly from the average value of the three predictive quantiles of the individual predictors. The in-sample PS of the PCQWS (HCQWS) strategy cannot be smaller than the corresponding in-sample PS of the PQWS (HQWS) strategy, but the out-of-sample performances may be very different.
The PQWSLR, the HQWSLR, the PQWSRR, and the HQWSRR strategies were instead developed in order to estimate weights having a regularized magnitude (in absolute value). Indeed, regularization of the parameters is a well-known way to avoid overfitting: the objective function (in this case, the PS) is penalized by adding a loss term which directly depends on the magnitude of the parameters. In this paper, both the LASSO and the Ridge regularization are tested, in order to provide a comprehensive analysis. Note that, in these cases, the in-sample PSs of the PQWSLR/PQWSRR (HQWSLR/HQWSRR) strategies cannot be smaller than the corresponding in-sample PS of the PQWS (HQWS) strategy, but the out-of-sample performances may be very different.
All of the strategies developed in this paper are presented in the following subsections.

Pure Quantile Weighted Sum
PQWS combination returns predictive quantiles at a given level $q_i$ by summing the predictive quantiles (at the same level $q_i$) of the underlying models, multiplied by coefficients $\omega^{(q_i)}_1, \omega^{(q_i)}_2, \omega^{(q_i)}_3$ that are estimated in the training step. Starting from the PQWS approach, two strategies are separately analyzed in this paper: a combination of all three individual forecasts (PQWS3) and a combination of the two best individual forecasts (PQWS2). This differentiated analysis is run in order to check whether the addition of a third individual forecast, which is clearly worse than the other two, may add useful information when building the ensemble. In the following formulation, we will refer to the PQWS3 strategy, since its extension to the PQWS2 is trivial. The model is:
$$\hat{P}^{(PQWS3)}_{t+k}(q_i) = \omega^{(q_i)}_1 \hat{P}^{(QKNN)}_{t+k}(q_i) + \omega^{(q_i)}_2 \hat{P}^{(QRF)}_{t+k}(q_i) + \omega^{(q_i)}_3 \hat{P}^{(QR)}_{t+k}(q_i). \quad (11)$$
The weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_1, \hat{\omega}^{(q_i)}_2, \hat{\omega}^{(q_i)}_3\right\} = \arg\min_{\omega^{(q_i)}_1, \omega^{(q_i)}_2, \omega^{(q_i)}_3} \sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(PQWS3)}_l(q_i); q_i\right), \quad (12)$$
that is, by minimizing the PS in the training interval, which is made of $L$ observed points $P_l$ with the corresponding combined forecasts $\hat{P}^{(PQWS3)}_l(q_i)$. The hyper-parameter $L$ is the length of the dataset used to train the weights of the combination models; it could be optimized by means of a model selection procedure (e.g., cross-validation). However, no selection procedure was run in this paper to pick the optimal hyper-parameter $L$; our purpose was instead to provide an exhaustive comparative analysis of the variation of the forecast errors with respect to this hyper-parameter. Nevertheless, the results of the comparative analysis can be used by the forecaster to build subsequent out-of-sample forecasts, picking the optimal result.
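The weight estimation for one quantile level can be illustrated with a small, self-contained sketch. The solver below is a plain subgradient descent chosen only for brevity; it is an assumption for illustration, not the optimizer used in the experiments, and the toy forecasts and observations are hypothetical.

```python
def pinball(obs, fc, q):
    return q * (obs - fc) if obs >= fc else (1.0 - q) * (fc - obs)

def combine(weights, member_quantiles):
    """Weighted sum of the members' quantile forecasts at one level."""
    return sum(w * x for w, x in zip(weights, member_quantiles))

def mean_loss(weights, X, y, q):
    return sum(pinball(obs, combine(weights, row), q) for row, obs in zip(X, y)) / len(y)

def fit_weights(X, y, q, steps=500, lr=0.05):
    """Subgradient descent on the in-sample pinball loss; keeps the best iterate."""
    w = [1.0 / len(X[0])] * len(X[0])          # start from equal weights
    best, best_loss = list(w), mean_loss(w, X, y, q)
    for step in range(1, steps + 1):
        grad = [0.0] * len(w)
        for row, obs in zip(X, y):
            # Derivative of the pinball loss with respect to the combined forecast
            slope = -q if obs >= combine(w, row) else (1.0 - q)
            for m, x in enumerate(row):
                grad[m] += slope * x / len(y)
        w = [wm - lr / step ** 0.5 * g for wm, g in zip(w, grad)]
        loss = mean_loss(w, X, y, q)
        if loss < best_loss:
            best, best_loss = list(w), loss
    return best, best_loss

# Hypothetical median forecasts of three members (columns) at four training points
X = [[0.30, 0.42, 0.40], [0.10, 0.22, 0.18], [0.55, 0.70, 0.66], [0.20, 0.35, 0.30]]
y = [0.45, 0.20, 0.72, 0.33]
start_loss = mean_loss([1.0 / 3] * 3, X, y, 0.5)
w_hat, loss_hat = fit_weights(X, y, 0.5)
```

In a full pipeline this fit would be repeated for every quantile level, and the combined quantiles would then need a check (or a sort) to confirm that they remain in ascending order.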

Hourly Quantile Weighted Sum
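The daily seasonality of the PV power time series is taken into account in the HQWS approach. For the same reasons stated above, two strategies were developed from the HQWS approach and separately analyzed: a combination of all three individual forecasts (HQWS3) and a combination of the two best individual forecasts (HQWS2). We present the HQWS3 strategy, since the extension to the HQWS2 case is trivial. The model is:
$$\hat{P}^{(HQWS3)}_{t+k}(q_i) = \omega^{(q_i)}_{hod,1} \hat{P}^{(QKNN)}_{t+k}(q_i) + \omega^{(q_i)}_{hod,2} \hat{P}^{(QRF)}_{t+k}(q_i) + \omega^{(q_i)}_{hod,3} \hat{P}^{(QR)}_{t+k}(q_i), \quad (13)$$
where $hod \in \{1, \ldots, 24\}$ is the hour of the day of the forecast time horizon $t+k$. The weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_{1,1}, \hat{\omega}^{(q_i)}_{1,2}, \hat{\omega}^{(q_i)}_{1,3}, \ldots, \hat{\omega}^{(q_i)}_{24,1}, \hat{\omega}^{(q_i)}_{24,2}, \hat{\omega}^{(q_i)}_{24,3}\right\} = \arg\min \sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(HQWS3)}_l(q_i); q_i\right), \quad (14)$$
where each training point contributes only to the weights of its own hour of the day. The daily periodicity of the PV power pattern is therefore also accounted for in the forecast combination; Equation (14) is a new formulation proposed in this paper to account for it in PV power forecast combination. This new HQWS approach is indeed expected to improve the forecast combination, by differentiating the weights not only for different quantiles but also for different hours of the day.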
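The hourly differentiation amounts to partitioning the training points by hour of the day and keeping one weight vector per hour. A minimal sketch follows, with hypothetical names and toy values; the per-hour weights themselves would be estimated by whatever pinball-loss minimizer is used for the pure strategies.

```python
from collections import defaultdict

def split_by_hour(hours, X, y):
    """Group training rows by hour of day so that weights can be fitted per hour."""
    groups = defaultdict(lambda: ([], []))
    for hod, row, obs in zip(hours, X, y):
        groups[hod][0].append(row)
        groups[hod][1].append(obs)
    return dict(groups)

def hourly_combine(hourly_weights, hod, member_quantiles):
    """Apply the weight vector estimated for this hour of the day."""
    return sum(w * x for w, x in zip(hourly_weights[hod], member_quantiles))

# Hypothetical training points observed at hours 12 and 13 on two days
hours = [12, 13, 12, 13]
X = [[0.6, 0.7, 0.65], [0.5, 0.6, 0.55], [0.4, 0.5, 0.45], [0.3, 0.4, 0.35]]
y = [0.68, 0.57, 0.47, 0.36]
groups = split_by_hour(hours, X, y)

# Hypothetical per-hour weights, e.g. as returned by a per-hour fit
hourly_weights = {12: [0.0, 0.5, 0.5], 13: [0.2, 0.8, 0.0]}
combined = hourly_combine(hourly_weights, 13, [0.5, 0.6, 0.55])
```

Each hourly fit sees only a fraction of the training points, which is one plausible reason why the hourly strategies may benefit from constraints or regularization.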

Pure and Hourly Constrained Quantile Weighted Sum
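The PCQWS and the HCQWS approaches are based on a constrained optimization formulation, in which the sum of the weights is constrained to unity. For these approaches too, we differentiate between a combination of all three individual forecasts (PCQWS3 and HCQWS3 strategies) and a combination of the two best individual forecasts (PCQWS2 and HCQWS2 strategies).
For the PCQWS3 strategy, the model is analogous to Equation (11), but the weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_1, \hat{\omega}^{(q_i)}_2, \hat{\omega}^{(q_i)}_3\right\} = \arg\min \sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(PCQWS3)}_l(q_i); q_i\right) \quad \text{s.t.} \quad \sum_{m=1}^{3} \omega^{(q_i)}_m = 1.$$
For the HCQWS3 strategy, the model is analogous to Equation (13), but the weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_{1,1}, \ldots, \hat{\omega}^{(q_i)}_{24,2}, \hat{\omega}^{(q_i)}_{24,3}\right\} = \arg\min \sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(HCQWS3)}_l(q_i); q_i\right) \quad \text{s.t.} \quad \sum_{m=1}^{3} \omega^{(q_i)}_{h,m} = 1, \;\; h = 1, \ldots, 24.$$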
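One simple way to enforce the unit-sum constraint inside an iterative solver is to project the weights back onto the constraint after each update. The sketch below shows only that projection step; it assumes that the signs of the weights are left free, and the numbers are hypothetical.

```python
def project_to_unit_sum(weights):
    """Euclidean projection onto the hyperplane sum(w) = 1 (signs are left free)."""
    shift = (sum(weights) - 1.0) / len(weights)
    return [w - shift for w in weights]

def combine(weights, member_quantiles):
    return sum(w * x for w, x in zip(weights, member_quantiles))

raw = [0.7, 0.6, -0.1]                 # hypothetical unconstrained weights, sum 1.2
constrained = project_to_unit_sum(raw)

# If all three members agree on a quantile, unit-sum weights reproduce it exactly
agreed = combine(constrained, [0.4, 0.4, 0.4])
```

This illustrates why the constraint keeps the combined quantile close to the individual ones: with weights summing to one, the combination cannot rescale a value on which all members agree.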

Quantile Weighted Sum with LASSO Regularization
The PQWSLR and the HQWSLR approaches are based on the regularization of the weights through the LASSO [27].In contrast to the constrained approaches, in which the sum of the weights is assigned, the regularization of parameters in the PQWSLR and HQWSLR approaches is an output of the model itself (which indeed requires no pre-assignment from the forecaster).Due to the intrinsic capability of the LASSO in reducing the impact of uninformative predictors by assigning smaller (or even zero) weights to them [16], these two approaches were developed and tested only for the combination of all three of the individual forecasts, thus each of them straightforwardly identifies one combination strategy.
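For the PQWSLR strategy, the model is analogous to Equation (11), but the weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_1, \hat{\omega}^{(q_i)}_2, \hat{\omega}^{(q_i)}_3\right\} = \arg\min \left[\sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(PQWSLR)}_l(q_i); q_i\right) + \lambda_L \sum_{m=1}^{3} \left|\omega^{(q_i)}_m\right|\right].$$
For the HQWSLR strategy, the model is analogous to Equation (13), but the weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_{1,1}, \ldots, \hat{\omega}^{(q_i)}_{24,2}, \hat{\omega}^{(q_i)}_{24,3}\right\} = \arg\min \left[\sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(HQWSLR)}_l(q_i); q_i\right) + \lambda_L \sum_{h=1}^{24} \sum_{m=1}^{3} \left|\omega^{(q_i)}_{h,m}\right|\right].$$
The penalty coefficient $\lambda_L$ (which is an important hyper-parameter in LASSO regression) is selected in this paper by 5-fold cross-validation.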
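The LASSO-regularized objective can be written down directly, which makes the role of the penalty coefficient explicit. This is a sketch under the assumption that the data term is the sum of pinball losses over the training points; function names and toy values are hypothetical.

```python
def pinball(obs, fc, q):
    return q * (obs - fc) if obs >= fc else (1.0 - q) * (fc - obs)

def lasso_objective(weights, X, y, q, lam):
    """In-sample pinball loss plus an L1 penalty on the combination weights."""
    loss = 0.0
    for row, obs in zip(X, y):
        fc = sum(w * x for w, x in zip(weights, row))
        loss += pinball(obs, fc, q)
    return loss + lam * sum(abs(w) for w in weights)

# Hypothetical member forecasts (columns) and observations
X = [[0.2, 0.3, 0.25], [0.5, 0.6, 0.55]]
y = [0.28, 0.58]
weights = [0.2, 0.5, 0.3]
unpenalized = lasso_objective(weights, X, y, 0.5, 0.0)
penalized = lasso_objective(weights, X, y, 0.5, 0.1)
```

With the absolute-value penalty, setting a weight exactly to zero removes its whole contribution to the penalty, which is why the minimizer can discard an uninformative member altogether.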

Quantile Weighted Sum with RIDGE Regularization
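The PQWSRR and HQWSRR approaches are based on the Ridge regularization [28]. The models are very similar to the LASSO-based ones; in this case too, the intrinsic capability of reducing the impact of uninformative predictors by assigning smaller weights to them [28] led us to develop and test the PQWSRR and HQWSRR approaches only for the combination of all three individual forecasts, developing one strategy for each approach.
For the PQWSRR strategy, the model is analogous to Equation (11), but the weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_1, \hat{\omega}^{(q_i)}_2, \hat{\omega}^{(q_i)}_3\right\} = \arg\min \left[\sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(PQWSRR)}_l(q_i); q_i\right) + \lambda_R \sum_{m=1}^{3} \left(\omega^{(q_i)}_m\right)^2\right].$$
For the HQWSRR strategy, the model is analogous to Equation (13), but the weights are estimated from:
$$\left\{\hat{\omega}^{(q_i)}_{1,1}, \ldots, \hat{\omega}^{(q_i)}_{24,2}, \hat{\omega}^{(q_i)}_{24,3}\right\} = \arg\min \left[\sum_{l=1}^{L} PS\left(P_l, \hat{P}^{(HQWSRR)}_l(q_i); q_i\right) + \lambda_R \sum_{h=1}^{24} \sum_{m=1}^{3} \left(\omega^{(q_i)}_{h,m}\right)^2\right].$$
The penalty coefficient $\lambda_R$ (which is an important hyper-parameter in Ridge regression) is selected in this paper by 5-fold cross-validation.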
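The 5-fold selection of a penalty coefficient can be sketched generically. The fold layout (contiguous blocks), the candidate grid, and the `fit`/`score` callbacks are all assumptions made for illustration; the source does not specify how the folds were formed.

```python
def k_fold_indices(n_samples, k=5):
    """Split 0..n_samples-1 into k contiguous folds of near-equal size."""
    bounds = [round(i * n_samples / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def select_penalty(candidates, n_samples, fit, score, k=5):
    """Return the candidate with the lowest average validation score.
    fit(train_idx, lam) -> model; score(model, valid_idx) -> float (lower is better)."""
    folds = k_fold_indices(n_samples, k)
    best_lam, best_avg = None, float("inf")
    for lam in candidates:
        total = 0.0
        for valid in folds:
            held_out = set(valid)
            train = [i for i in range(n_samples) if i not in held_out]
            total += score(fit(train, lam), valid)
        if total / k < best_avg:
            best_lam, best_avg = lam, total / k
    return best_lam

# Dummy callbacks standing in for a regularized fit and its validation pinball score
folds = k_fold_indices(10)
chosen = select_penalty([0.01, 0.1, 1.0], 10,
                        fit=lambda train, lam: lam,
                        score=lambda model, valid: (model - 0.1) ** 2)
```

In practice `fit` would estimate the regularized weights on the training folds and `score` would return the PS on the held-out fold.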

Benchmarks
The ensemble combination of probabilistic individual forecasts was mainly assessed in terms of relative improvement with respect to individual predictions. However, four relevant benchmarks were also added for comparison. They are briefly recalled in the following subsections.

Naïve Benchmark
A Naïve Benchmark (NB) was provided by the GEFCOM2014 organizers [16]. It consists of point forecasts which are repeated for the 99 predictive quantiles. This benchmark was added in this paper in order to provide a direct comparison with the outcomes of the GEFCOM2014.
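As a small illustration of how a point forecast becomes a (degenerate) probabilistic forecast in this benchmark, the sketch below repeats one value over the 99 quantile levels 0.01, ..., 0.99. The function name and the input value are hypothetical; how the point forecast itself is produced is not covered here.

```python
# Quantile levels 0.01, 0.02, ..., 0.99
levels = [i / 100 for i in range(1, 100)]

def naive_quantiles(point_forecast):
    """Repeat one point forecast for every quantile level (zero-width distribution)."""
    return [point_forecast] * len(levels)

nb = naive_quantiles(0.37)   # hypothetical normalized PV power
```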

Quantile Artificial Neural Network
An ANN-based probabilistic benchmark (QANN) is the second benchmark, which was added to provide a comparison with an artificial intelligence technique. The QANN consists of a feedforward neural network, which was trained on 70% of the available training data by minimizing the PS using a particle-swarm optimization algorithm. The hyper-parameter optimization was performed on the remaining 30% of the available training data, reserved for validation. A dedicated neural network was trained for each predictive quantile level, in order to improve the performances. The QANN was implemented with the neural network toolbox in MATLAB.

Gradient Boosting Regression Trees
A Gradient-Boosting Regression Tree (GBRT) benchmark was added due to the strong performances shown by the winning methods during the GEFCOM2014. In this case too, a dedicated model was trained for each quantile level. The GBRT was developed using the gbm package in R [29].

Bayesian Method
A Bayesian (BAY) benchmark was adapted from the methods presented in [20,30], in order to suit the forecasting scheme of the GEFCOM2014. In particular, the underlying deterministic model selected to forecast the expected values of the posterior predictive distributions consists of an average of a GBRT model and of an RF model; exogenous time series approaches indeed performed quite poorly, due to the monthly forecast horizons. The BAY benchmark is a hybrid parametric model, and it was specifically added in the comparative analysis in order to also provide a parametric reference for the results.

Numerical Application
The strategies for combining individual probabilistic forecasts are quantitatively assessed in this Section, using actual PV power data provided in the context of an energy forecasting competition [16].First, we present the data used for the numerical experiments and the accuracy of the results of individual forecasts; later we assess the accuracy of the forecast combination strategies.The PS values are used to quantitatively estimate the forecast performances [16,23].

Characteristics of the Data
The PV power data refer to three zones, which are geographically correlated; each time series was collected in a time interval ranging from April 2012 to June 2014. For reproducibility, we follow the same division kept by the organizers of the competition: the first year of data (April 2012-March 2013) was used only for training the underlying models; each of the remaining 15 months (April 2013-June 2014) constitutes a forecasting task. In order to improve the performances of the underlying models and of the forecast combination, and to maintain consistency between the outcomes of different forecast approaches, we selected a 1-year constant-length window for training the underlying models at each task; the window shifts forward with each new task.
The forecast combination is trained upon different numbers of tasks (i.e., using a different hyper-parameter L).Also in this case, once the hyper-parameter L is iteratively assigned, the time window used for training the combination weights has a constant length, and it shifts towards the most recent task.We reserve the last 5 tasks (February 2014-June 2014) to test the out-of-sample performances.
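The constant-length, shifting window for the combination weights can be sketched as follows, numbering the monthly tasks from 1 to 15 so that tasks 11-15 are the test tasks. The function name and the numbering convention are assumptions for illustration.

```python
def combination_training_tasks(target_task, n_train_tasks):
    """Tasks whose individual forecasts train the weights used for target_task."""
    first = target_task - n_train_tasks
    if first < 1:
        raise ValueError("not enough earlier tasks for this window length")
    return list(range(first, target_task))

# Window of 7 tasks: it shifts by one task for each of the five test tasks
windows = {task: combination_training_tasks(task, 7) for task in range(11, 16)}

# The longest window considered (10 tasks) still fits before the first test task
longest = combination_training_tasks(11, 10)
```

With 15 tasks and 5 reserved for testing, 10 is the largest window length for which the first test task has a full training window.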
Table 1 shows the main statistical properties (mean, median, and variance) of the three PV datasets considered, as a whole. Note that all of the data provided by the competition organizers are normalized. More details can be found in [16].

Assessment of the Accuracy of Individual Forecasts
We investigated the accuracy of the individual forecasts in all of the 15 tasks. The results during the last 5 tasks were also used as benchmarks, to compare the performances of the forecast combination approaches in the test step. Figures 2-4 show the PSs obtained using the QKNN, the QRF, and the QR in the 15 considered tasks, for zones 1, 2, and 3, respectively. The benchmark PS values of the QANN, of the GBRT, of the BAY, and of the NB are also shown as a reference. Figures 2-4 clearly highlight the superior performances of the QRF, QR, GBRT, and BAY models with respect to the other models. For zone 1, QRF and QR perform very similarly, whereas for zones 2 and 3 the QR on average outperforms the QRF. The GBRT benchmark exhibits performances that are, on average, slightly worse than the QRF and QR; however, it outperforms the QANN and the QKNN for all of the zones considered. The BAY benchmark is, on average, slightly worse than the QRF, QR, and GBRT, whereas it outperforms QKNN, QANN, and NB.
Table 2 shows the PS values of the individual forecasts and of the NB forecasts, averaged over tasks 11-15 (i.e., the tasks reserved for comparing the out-of-sample combination results). The PS values in Table 2 confirm the considerations made on the basis of the graphical inspection of Figures 2-4.
PQWS2, PQWS3, HQWS2, HQWS3, PCQWS2, PCQWS3, HCQWS2, HCQWS3, PQWSLR, HQWSLR, PQWSRR, and HQWSRR forecasts are analyzed in this subsection. Different values of the hyper-parameter L (i.e., the length of the dataset used to train the weights of the combination models) are considered separately; in particular, they cover the 1, 2, ..., 10 most recent tasks.
Figures 5-7 show, for zones 1, 2, and 3, the PS values of the forecasts versus the number of tasks considered to form the individual forecast dataset.
These figures clearly highlight that the PQWS3 outperforms the other unconstrained, non-regularized strategies for all of the tasks considered. For zone 1, the HCQWS3 outperforms the other constrained strategies, whereas the PCQWS3 performs better than the other constrained strategies for zone 3. Note, however, that the constrained strategies, compared to the unconstrained and regularized strategies, have quite similar results for zones 1 and 2, whereas the constrained strategies are definitely less accurate than the unconstrained and the regularized strategies for zone 3.
The smallest error score among all of the options considered is obtained for zones 1 and 2 by using the HQWSLR with 7 tasks in the individual forecast dataset, whereas the best forecasts among all of the options considered for zone 3 are obtained through the PQWS3 with 6 tasks in the individual forecast dataset.
The trends of the PS of the combined forecasts are quite similar, as the performances significantly increase by using more than four tasks to form the individual forecast dataset; this improvement is at maximum around 6-8 tasks, and it slightly decreases with more tasks.
Altogether, the PQWS approaches outperform the HQWS ones; thus, a more general model works better than a model with too much differentiation in the unconstrained, non-regularized estimation. Things change when the weight estimation is subject to constraints or to regularization; the hourly differentiation improves the performances of the forecasting ensemble for zones 1 and 2.
We now analyze in detail the best competitors for the three zones: the HQWSLR for zones 1 and 2, and the PQWS3 for zone 3. In particular, for both strategies we consider only the best-case scenario and the worst-case scenario, in terms of the number of tasks in the individual forecast dataset. For the PQWS3, the optimal number of tasks selected to train upon the individual forecast datasets was 6, 8, and 6, for zones 1, 2, and 3, respectively; the worst performances of the PQWS3 were instead obtained with 2, 1, and 2 tasks for zones 1, 2, and 3, respectively. For the HQWSLR, the optimal number of tasks selected to train upon the individual forecast datasets was 7 for all of the zones; the worst performances of the HQWSLR were instead obtained with 2, 1, and 1 tasks for zones 1, 2, and 3, respectively.
The comprehensive Table 3 compares the corresponding PS values of these best-and worst-case scenarios to the ones obtained for the individual forecasts and for the benchmark (see Table 2).It is evident from these results that the forecast combination, either through the PQWS3 or the HQWSLR, always improves the skill of the forecasts.
We quantitatively assessed the results of the PQWS3 and of the HQWSLR by comparing them to the most competitive benchmarks for each zone, which are the QRF for the zone 1, and the QR for the zones 2 and 3.In the best-case scenario, the PS obtained through the PQWS3 is about 9%, 3.5%, and 6.5% smaller than the corresponding PS of the most competitive benchmark for zones 1, 2, and 3, respectively; in the worst-case scenario, the PS obtained through the PQWS3 is about 6.5%, 2%, and 5% smaller the corresponding PS of the most competitive benchmark for zones 1, 2, and 3, respectively.The improvement of the HQWSLR towards the most competitive benchmark for each zone is instead about 9%, 4.5%, and 6.5% in the best-case scenario, and about 7.5%, 2.5%, and 4.5% in the worst-case scenario, for zones 1, 2, and 3 respectively.These figures clearly highlight that the PQWS3 outperforms the unconstrained, non-regularized strategies for all of the tasks considered.For zone 1, the HCQWS3 outperforms the other constrained strategies, whereas PCQWS3 performs better than the other constrained strategy for zone 3. Note, however, that the constrained strategies, compared to the unconstrained and regularized strategies, have quite similar results for zones 1 and 2, whereas the constrained strategies are definitely less accurate than the unconstrained and the regularized strategies for zone 3.
The smallest error score among all of the options considered is obtained for zones 1 and 2 by using the HQWSLR with 7 tasks in the individual forecast dataset, whereas the best forecasts for zone 3 are obtained through the PQWS3 with 6 tasks in the individual forecast dataset.
The trends of the PS of the combined forecasts are quite similar across strategies: performance improves significantly when more than four tasks are used to form the individual forecast dataset, the improvement peaks at around 6-8 tasks, and it slightly decreases with more tasks.
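The sweep over the number of tasks described above can be expressed compactly. The sketch below is only an illustration under stated assumptions: `score_for_window` is a hypothetical callable that trains the combination weights on the `n` most recent tasks and returns the resulting PS, and the function name `sweep_training_window` does not come from the MATLAB or R code used for the experiments.

```python
def sweep_training_window(score_for_window, candidates=range(1, 11)):
    """Evaluate the PS for each candidate number of training tasks.

    score_for_window: hypothetical callable mapping a number of tasks n
        to the PS obtained when the weights are trained on those n tasks.
    candidates: numbers of most recent tasks to try (1, 2, ..., 10 here).

    Returns (best_n, worst_n, scores), where scores maps n to its PS.
    A lower PS is better.
    """
    scores = {n: score_for_window(n) for n in candidates}
    best_n = min(scores, key=scores.get)
    worst_n = max(scores, key=scores.get)
    return best_n, worst_n, scores
```

Returning the worst case together with the best case mirrors the best-case and worst-case comparison reported for each strategy.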
Altogether, the PQWS approaches outperform the HQWS ones; thus, in the unconstrained, non-regularized estimation, a more general model works better than a model with too much differentiation. Things change when the weight estimation is subject to constraints or to regularization: the hourly differentiation improves the performance of the forecasting ensemble for zones 1 and 2.
We now analyze in detail the best competitors for the three zones: the HQWSLR for zones 1 and 2, and the PQWS3 for zone 3. For both strategies we consider only the best-case scenario and the worst-case scenario, in terms of the number of tasks in the individual forecast dataset. For the PQWS3, the optimal number of tasks in the individual forecast dataset used for training was 6, 8, and 6 for zones 1, 2, and 3, respectively; the worst performances of the PQWS3 were instead obtained with 2, 1, and 2 tasks for zones 1, 2, and 3, respectively. For the HQWSLR, the optimal number of tasks in the individual forecast dataset used for training was 7 for all of the zones; the worst performances of the HQWSLR were instead obtained with 2, 1, and 1 tasks for zones 1, 2, and 3, respectively.
Table 3 compares the corresponding PS values of these best- and worst-case scenarios to the ones obtained for the individual forecasts and for the benchmark (see Table 2). It is evident from these results that the forecast combination, either through the PQWS3 or the HQWSLR, always improves the skill of the forecasts.
We quantitatively assessed the results of the PQWS3 and of the HQWSLR by comparing them to the most competitive benchmark for each zone, which is the QRF for zone 1 and the QR for zones 2 and 3. In the best-case scenario, the PS obtained through the PQWS3 is about 9%, 3.5%, and 6.5% smaller than the corresponding PS of the most competitive benchmark for zones 1, 2, and 3, respectively; in the worst-case scenario, it is about 6.5%, 2%, and 5% smaller than the corresponding PS of the most competitive benchmark for zones 1, 2, and 3, respectively. The improvement of the HQWSLR over the most competitive benchmark for each zone is instead about 9%, 4.5%, and 6.5% in the best-case scenario, and about 7.5%, 2.5%, and 4.5% in the worst-case scenario, for zones 1, 2, and 3, respectively.
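The relative improvements quoted above follow from simple arithmetic on the Pinball Score. The sketch below is illustrative only. It assumes the standard pinball-loss definition, averaged over all observations and quantile levels, and it assumes the 99 quantile levels 0.01, ..., 0.99 of GEFCom2014. The names `pinball_loss`, `pinball_score`, and `relative_improvement` are hypothetical and are not taken from the code used for the experiments.

```python
def pinball_loss(y, q, alpha):
    """Pinball loss of one quantile forecast q at level alpha for observation y."""
    if y >= q:
        return alpha * (y - q)
    return (1.0 - alpha) * (q - y)


def pinball_score(observations, quantile_forecasts, levels):
    """Average pinball loss over all observations and all quantile levels.

    observations: list of observed values.
    quantile_forecasts: list of lists; quantile_forecasts[i][j] is the
        forecast for observation i at quantile level levels[j].
    """
    total = 0.0
    count = 0
    for y, row in zip(observations, quantile_forecasts):
        for q, alpha in zip(row, levels):
            total += pinball_loss(y, q, alpha)
            count += 1
    return total / count


def relative_improvement(ps_combined, ps_benchmark):
    """Percentage by which the combined PS is smaller than the benchmark PS."""
    return 100.0 * (ps_benchmark - ps_combined) / ps_benchmark


# Assumed quantile levels: 0.01, 0.02, ..., 0.99.
LEVELS = [k / 100.0 for k in range(1, 100)]
```

Under this definition, a combined PS equal to 0.91 times the benchmark PS corresponds to an improvement of about 9%.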

Conclusions
This paper discusses several strategies that have been developed to combine individual probabilistic PV power forecasts, aimed at building combined forecasts which are more accurate than individual predictions. Several types of combination strategies and architectures were developed in a competitive ensemble framework; all of them are based on the weighted quantile combination. The proposal was validated through numerical experiments based on PV power data published during the GEFCOM2014; several benchmarks are also presented, in order to compare the results.
The comparison of different forecast strategies for three different generation zones suggests that:
• The weighted quantile combination was effective in improving the accuracy of forecasts; it is able to outperform the accuracy of individual probabilistic forecasts, which is the main aim of competitive ensemble methods.
• The forecast combination improved the skill of the forecasts in all of the scenarios considered, with a reduction in terms of PS of up to 9%.
• On average, the best results were obtained using the HQWSLR combination strategy for zones 1 and 2, and the PQWS3 combination strategy for zone 3; the optimal length of the dataset used to train the weights of the combination models always ranges between 6 and 8 tasks.
• Adding the forecasts of an individual model which performs worse than the other individual models appears to provide useful diversity in the ensemble approach; this appears to be valid both for unconstrained, non-regularized strategies and for constrained strategies.
• Adding too much dispersion to the forecast combination by estimating weights for each hour of the day does not improve the quality of the results for unconstrained, non-regularized regression; conversely, constraints and/or regularization make it possible to benefit from this hourly differentiation.
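To make the distinction drawn in the last point concrete, the following sketch contrasts a single weight vector shared by all hours with hour-specific weights. It is a simplified illustration under stated assumptions, not the implementation used in the experiments: the weights are taken as given (their estimation through unconstrained, constrained, or regularized optimization is not shown), any intercept term is omitted, and the names `combine_shared` and `combine_hourly` are hypothetical.

```python
def combine_shared(individual_quantiles, weights):
    """Weighted combination of one quantile from several individual models.

    individual_quantiles: forecasts of the same quantile level, one per model.
    weights: one weight per model, shared by all hours of the day.
    """
    return sum(w * q for w, q in zip(weights, individual_quantiles))


def combine_hourly(individual_quantiles, hourly_weights, hour):
    """Same combination, but with a separate weight vector per hour of the day.

    hourly_weights: dict mapping an hour of the day to a list of per-model
        weights. Selecting hourly_weights[hour] is equivalent to multiplying
        every weight vector by an hour-of-day indicator that equals 1 only
        for the given hour and 0 otherwise.
    """
    return combine_shared(individual_quantiles, hourly_weights[hour])


# Hypothetical numbers for three individual models (illustration only).
quantiles = [0.40, 0.50, 0.45]
shared_weights = [0.2, 0.5, 0.3]
hourly_weights = {12: [0.1, 0.6, 0.3], 13: [0.3, 0.4, 0.3]}

print(combine_shared(quantiles, shared_weights))
print(combine_hourly(quantiles, hourly_weights, 12))
```

The hourly variant has one weight vector per hour, so it needs many more parameters to be estimated from the same data. This is consistent with the observation that it benefits from constraints or regularization.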

Figure 1. Overview of the proposed competitive ensemble method for forecasting photovoltaic (PV) power.
$\mathrm{hod}^{(h)}_{t+k} = 1$ if the forecast horizon $t+k$ is the $h$th hour of the day, and $\mathrm{hod}^{(h)}_{t+k} = 0$ otherwise. The weights are estimated from:

Figure 2. Pinball Score values of individual forecasts for zone 1.

Figure 3. Pinball Score values of individual forecasts for zone 2.

Figure 4. Pinball Score values of individual forecasts for zone 3.

Table 1. Statistical properties of the PV data considered.

Table 3. Pinball Score values averaged over the tasks 11-15. Bold values highlight the best results for each zone.