Improving Wind Power Forecasts: Combination through Multivariate Dimension Reduction Techniques

: Wind energy and wind power forecast errors have a direct impact on operational decision problems involved in the integration of this form of energy into the electricity system. As the relationship between wind and the generated power is highly nonlinear and time-varying, and given the increasing number of available forecasting techniques, it is possible to use alternative models to obtain more than one prediction for the same hour and forecast horizon. To increase forecast accuracy, it is possible to combine the different predictions to obtain a better one or to dynamically select the best one in each time period. Hybrid alternatives based on combining a few selected forecasts can be considered when the number of models is large. One of the most popular ways to combine forecasts is to estimate the coefﬁcients of each prediction model based on its past forecast errors. As an alternative, we propose using multivariate reduction techniques and Markov chain models to combine forecasts. The combination is thus not directly based on the forecast errors. We show that the proposed combination strategies based on dimension reduction techniques provide competitive forecasting results in terms of the Mean Square Error.


Introduction
Wind power, measured by its installed capacity, has soared in recent years driven by technological development. This trend will continue as many countries have ambitious objectives in wind power. As wind power is intermittent by nature and is a non-dispatchable form of energy, wind power forecasts are needed for large-scale integration of wind turbines into the electrical grid. There have been several attempts to produce prediction tools to forecast wind energy. For recent reviews of the literature about wind power forecasting see, for instance, [1][2][3][4]. For a general overview of energy forecasting (load, electricity prices, wind and solar energy), see Hong et al. [5].
As the share of wind energy in the energy mix increases, wind power and wind power forecast errors (that is, the difference between the predicted value and the real production) can adversely impact the operation of the power system [6]. Thus, the operational and economic impact of wind power forecast errors has been analyzed in recent years. For instance, wind power forecast error modeling has a significant impact on wind integration planning studies [7]. Forecast errors of wind and load predictions are also an important factor to consider when determining spinning reserve requirements [8]. There is also abundant research on the impact of wind power forecast in the unit commitment and economic dispatch problem [9].
Even though the benefits of improving wind power forecasts are clear, only a few works deal with forecast combination techniques applied to wind energy. As the relationship between wind and the generated wind power is highly non-linear and time-varying, and given the variety of models available, it is possible to use alternative models based where y t+h|t = (y 1,t+h|t , · · · , y N,t+h|t ) is the N dimensional vector of the h-period-ahead forecasts at time t and β t (h) = (β 1t (h), . . . , β Nt (h)) is the vector of weighting factors. This vector is calculated for any time step t and forecast horizon h. When necessary, an intercept can be added to Equation (1) to correct a possible bias in the final prediction.
One of the most common ways to estimate the weighting factors is based on the evaluation of past errors of each of the original competing forecasts (see [15,18,19]). Sanchéz proposes a two-step adaptive forecast combination procedure to compute alternative combinations, based on a recursive estimation of the mean squared prediction errors [15]. Nielsen et al. use a mean square criterion, and the weights depend on the underlying mean and covariance of the individual forecast errors. The combination can be sub-optimal as mean and covariance have to be estimated. Moreover, this method occasionally produces combined forecasts outside the range of power production that is feasible [18]. Thordarson et al. reformulate the classical regression model for combining forecasts in order to impose restrictions on the combination weights. The weights can be a non-parametric function of particular meteorological variables. The authors maintain that it is not possible to generalize the results because the meteorological variables used may depend on local climatology or on the type of meteorological model used to generate the forecasts [19]. Recently [20], three different forecast models have been combined applying a constrained least squares regression method (which minimizes the sum of squared errors). The weighting factors are constant (i.e., they are estimated only once), so the combination does not have a good performance at all times t. Forecast combination techniques are also used to solve a different problem, the so-called forecast reconciliation problem (that the sum of the individual wind power forecasts of different wind farms should be equal to the forecast made for the whole set) [21].
At any time t, we can calculateê i,t|t−h = p t − y i,t|t−h with i = 1, . . . , N, which are the forecast errors computed as the difference between the wind energy generated at time t and its estimation made at t − h by forecaster i. As h increases, the relation between what happens at t and what we estimated at t − h becomes weaker. Moreover, as h increases, the relation between the prediction of wind energy for t + h and the information we have at t, that is, with the forecast errorê t|t−h is even weaker. Because of this, we do not compute the weights β t (h) in Equation (1) based on the forecast errors, but rather on capturing the directions that better summarize the information contained in the set of forecasts for each forecast horizon h. The rationale behind multivariate dimension reduction techniques is that it summarizes the information of the initial set of N forecasts to produce a single prediction. There are two groups of dimension-reduction techniques: those that use the information of the variable to forecast when reducing the dimension (supervised learning) and those that do not use it (unsupervised learning). Within the first category, we consider partial least squares (PLS). Another alternative is sliced inverse regression, but it does not perform well for forecast combinations [22]. Within unsupervised learning techniques, we consider principal components (PC) for combining forecasts as in [22]. Later on, PC was applied in the context of interval predictions for electricity price and load [23]. Dynamic factor models decompose the information in a data set (in our case, the different forecasts) into a common and an idiosyncratic part. The common factors would be the part of the forecast shared by all the predictors, while the idiosyncratic part is specific to each predictor and characterizes the disagreement between them. In principle, using the common factors as the linear combinations of the forecasts can give good results as PC for forecast combination [22]. However, PC is preferred as it is a much simpler technique. Other methods, like, for example, Sparse PC, are used only when the number of forecasts is big enough and some of them are discarded, imposing some regularization when reducing the dimension. We do not contemplate the use of independent component analysis (ICA) or non-linear PC as very simple combination schemes (as the simple average of the forecasts) are so far very difficult to beat. This work is based on Poncela et al. [22], considering a different problem and several forecast horizons instead of only one-step-ahead forecasts. Finally, we have also applied another forecast combination strategy assuming a Markov chain to estimate the probabilities of each model forecast to be the best in each time step and forecast horizon [24]. A hybrid approach combining a Bayesian model averaging with ensemble learning was applied recently to the wind-forecasting problem [25], estimating constant weighting factors for each forecast model. In our case, the weighting coefficients of each prediction model are updated at every time step t based on a forgetting factor to highlight the latest real data.
The structure of the paper is as follows: Section 2 introduces the initial set of forecasting models, Section 3 describes the forecast combination procedures we applied in this work; Section 4 describes the data, Section 5 shows the results and, finally, the conclusions are provided in Section 6.

Initial Set of Competing Forecasters
The initial set of forecasts is as follows: NR model: this is a very simple autoregressive model that produces very good results at short forecast horizons, and it is useful in case the input from the meteorological model does not arrive to feed the other models. It is given by where a h is defined as the correlation coefficient between p t+h and p t , and p t is the average of the wind energy. The remaining forecasts come from the time-varying models written in state-space form and optimized in [26]. The state-space model is made up of two equations. The observation equation is defined as: where y t = p t+h is the wind energy to be forecast at time t, h ∈ N + with h = 1, . . . , 36 is the forecast horizon; A t is the observation matrix, x t = [x 0t , x 1t , · · · x mt ] is the state vector; and e t is the error term with 0 mean and associated variance R.
The state equation is defined as: where Φ is the transition matrix and ξ t is the error term with 0 mean and associated variance Q. The state vector at time 0 is denoted by We employ a set of models of increasing complexity. The observation matrix A t contains the inputs and depends on each specific model. In what follows, we include a brief description of the alternative models: • M1 model: the observation matrix contains only information of wind power generation in previous periods up to time t: The term p t+h−c attempts to capture the daily cycle. If h < 24 − k, then c = 24 and p t+h−c is the wind power generated 24 hours before the hour the model is predicting. If h > 24 − k, then c = h + k. For Sotavento wind farm, k = 3.

•
M2, M3 and M4 models: these models incorporate the available meteorological wind speed forecast, trying to reproduce the different zones of the wind power curve (relationship between wind and generated energy) as follows: • M5, M6 and M7 models: these models consider the available meteorological wind speed and wind direction forecast as follows: For each model M1, M2, . . . , M7 the size of A (Mi) t is different, as it is the size of the vectors x t , and w t and the matrices Q, R and Φ. R is the variance-covariance matrix of the errors of the observation equation, and Q and Φ characterizes the behavior of the time-varying parameters of the models collected in the state vectors x t . Diagonal matrices reflect that the parameters do not exhibit any linear relation. A full matrix Φ indicates that the time-varying parameters depend on their previous states as well as those of the remaining parameters, and a full matrix Q reflects that the parameters are related among them although the correlation enters into the model through the noise covariance matrix. With optimized parameters for each wind farm, we adapt the models to each site. The fitting (learning) of the models is done by estimating the unknown hyperparameters of the model, α = (x 0 , P 0 , Q, R, Φ) by maximum likelihood through the Expectation-Maximization algorithm.
Note that we estimate a different model for each forecast horizon h. There are two possibilities to produce h-period-ahead forecasts: direct or iterated forecasts. Direct forecasts are computed using one model for each forecast horizon h, while iterated forecasts are computed from a single model estimated for one-period-ahead forecasts and then iterated forward. There is no consensus about which approach should be preferred, and choosing direct versus iterated approaches is an empirical matter [27]. Theoretically, if we knew the true model, the iterated procedure should produce more efficient forecasts, but direct forecasts are more robust to model misspecification. In the context of forecast combination, we are facing a more realistic situation where the true model is not known and, therefore, there is a set of models to generate forecasts. For wind power, the authors of [26] found that direct forecasts work better for all the wind farms tested.

Multivariate Dimension Reduction Techniques
The rationale behind multivariate dimension reduction techniques is to summarize the information of the initial set of N forecasts to produce a single prediction. The final prediction is built on a two-step procedure. In the first step, we build r linear combinations (factors) of the initial set of forecasts as follows: where w jt (h) is the vector of weighting factors and y t+h|t = p nr t+h|t ,p M1 t+h|t , · · · ,p M7 t+h|t is the vector of h-period-ahead forecasts defined in Section 2.
In the second step, we obtain the final prediction as: p c,t+h|t =β 0t +β 1t f 1,t+h|t + · · · +β rt f r,t+h|t (13) whereβ t = β 0t , . . .β rt are the Ordinary Least Squares (OLS) coefficients calculated with the regression where p t is the wind energy at time step t,ŷ c,t+h|t is a real ex-ante forecast as the coefficients β jt are estimated based on data up to t. Note that we calculate two different weighting vectors: the first one, w t (h), to compute f jt (linear combinations of the initial set of forecasts), and the second vector, β t (h), to combine the factors and produce the final forecast. As the input set (the competing forecasts) are predictions of the same variable, theycould present a high correlation. In this sense, the multivariate reduction dimension techniques extract, in a feasible way, the common information they contain. We consider two different multivariate dimension reduction techniques to calculate the r factors ( f 1t , . . . , f rt ): partial least squares (PLS) and principal components (PC). The difference between the two techniques is based on the information set used to calculate the factors and the method employed to compute the weighting vector w t (h). PLS considers the information of the variable to predict p t , whereas PC does not take into account this information.
This means that, in the first case (PLS),ŵ jt (h) =ŵ j p h , y h|0 , p h+1 , y h+1|1 , . . . , p t , y t|t−h and in the second case (PC),ŵ jt (h) =ŵ j y h|0 , y h+1|1 , . . . , y t|t−h . The principal components find linear combinations (factors) that explain the variance among forecasts, var(Y), being Y the matrix that collects all the forecasts while partial least squares look for linear combinations in the direction of cov(Y, p) being p the vector of observations. The PC factors summarize the variability among the initial set of forecasts, while the PLS factors are linear combinations of the initial forecasts that are useful for predicting the wind energy. In what follows, we briefly summarize each technique.

Principal Components
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert an initial set of correlated variables into a new set of uncorrelated variables called principal components. The number of principal components retained should be less than or equal to the number of original variables. The principal components have a decreasing variance value and are orthogonal among them.
Let Y 1 , . . . , Y N be the initial set of variables and Σ Y its covariance matrix. The ordered principal components are the successive linear combinations where w i is the eigenvector with modulus 1 associated with the i-th eigenvalue of Σ Y . In our problem, Y contains the alternative forecasts for wind energy, y t+h|h = (p nr t+h|t ,p M1 t+h|t , · · · ,p M7 t+h|t ) . In the second step of the procedure, we compute the final predictionŷ c,t+h|t with the first r principal components or factors, f t = ( f 1,t , . . . , f r,t ) and the regression coefficients obtained with Equation (14).

Partial Least Squares
A PLS model tries to find the multidimensional direction in the input variable space (Y) that explains the maximum multidimensional variance direction in the output variable space p [28].
The computation is a sequential process where the correlation of the variable to predict (up to time step t for obtaining true ex-ante predictions) and the initial set of forecasts are used to compute the factors. Let y t|t−h = y 1,t|t−h , . . . , y N,t|t−h be the vector of h-periodahead forecasts of the observed variable p t . The first factor f 1t , is obtained projecting the covariance between the variable to predict and its forecasts, in the direction of the forecasts, where α is a proportionality constant that depends on the selected normalization. Let , i = 1, . . . , N the residuals of the N regressions of the variables y i,t|t−h over f 1t . To compute the second factor, we use this information, orthogonal to the first factor, and we build the following new linear combination, The following component or factor uses the information orthogonal to the previous factors. The procedure is repeated until all factors have been calculated.

Dynamic Model Averaging
This method was designed to deal with real-time forecasting when there is uncertainty about which will be the best forecast at time t [24]. It combines the initial set of forecasts with a Markov Chain to select the best forecast. The Markov Chain is estimated recursively with a two-step process for each time step t. Let M 1 , · · · , M N be the individual Nmodels, being L t = k if the real process is driven by model M k at t. The shift from one prediction model to another is determined by a Markov chain with transition matrix T(h) = (T kl ) with T kl (h) = P[ L t+h = l|L t = k], being h the forecast horizon. Letting π t|t,l = P[ L t = l|P t ] with P t = (p 1 , · · · , p t ), the prediction equation of the Markov chain probabilities is To avoid the need to explicitly estimate the transition matrix T, the proposed algorithm is a recursive procedure where the estimation of the probabilities is based on a forgetting factor. Equation (17) is calculated as α is the forgetting factor, whose value is slightly less than one, and c is a small value introduced to avoid numerical instabilities. The updating equation of the Markov chain is: with w t+h,l = π t+h|t,l f l ( y t+h |P t ) (20) being f l ( p t+h |P t ) ∼ N(A t x t , R t ) the normal distribution density function with information up to time t, obtained from Equations (3) and (4). The final forecast combination is, The process is repeated with each new observation, with π 0|0,l = 1/N for l = 1, · · · , N.
We have applied the same forgetting factor value for all forecast horizons, equal to 0.999.

Data
We used data from the Sotavento wind farm to apply the proposed methodology. Sotavento is located in Galicia, Spain [29] and presents a complex orography, which makes wind speed prediction and hence wind power forecast difficult. Sotavento is an experimental wind farm: instead of having identical wind turbines, it consists of 24 wind turbines using 5 different technologies (asynchronous two speeds generator and fixed blade pitch, asynchronous two speeds generator and variable blade pitch, doubly fed induction single-speed generator with variable blade pitch, double fed induction singlespeed generator with fixed blade pitch and synchronous generator with variable speed and variable blade pitch), with the objective to be a "showcase" wind farm. Its nominal power is 17.56 MW, and the estimated annual production is 38,500 MWh/year. We have historical measurements of wind power generation and meteorological wind speed and direction forecasts from an numerical weather prediction (NWP) model. The meteorological predictions are daily forecasts of the Advanced Regional Prediction System (ARPS) model at 10 km resolution on its 00Z execution, providing a forecast for the following 72 h. In Sotavento, historical wind power data are obtained every 10 min and then averaged each hour. The data set extends from 17/02/2004 1:00 until 28/04/2005 8:00 h, containing 10471 values. Data are initially divided into two sets. The first year of data, which amount to 8760 data points, were used to estimate and optimize the models. With the remaining data, (1711 points), the first 100 were used for the initial estimation of the covariance matrices of the initial set of forecasters and the remaining 1611 measurements to check the performance of the different forecast combination schemes and to validate the results.
As is usual in the field, power data were divided by the installed capacity and wind speed data were divided by the maximum speed. This produces forecast results that are independent of the wind farm size, and thus we can compare the forecasting performance among different analyses.

Results and Discussion
This section shows the results of the combination strategies tested and compares them using several measures of forecast accuracy. As pointed out in [30], the forecasts can be ranked depending on the measure of forecast accuracy chosen. We compare our results according to the following criteria: bias, root mean squared error (RMSE), mean absolute error (MAE) and quantile score (QP) for some quantiles (see [30] for a review of the most popular evaluation metrics for wind power forecasts). To compare the different alternatives for forecast combination, we use the Diebold and Mariano tests for forecast accuracy [31]. In the calculation of the bias, some positive errors are canceled out by negative errors. In terms of the RMSE, all errors are taking into account. Moreover, larger errors have a proportionally greater impact on the RMSE as they are squared. In the electricity system, large wind power forecast errors have a large impact on the system operation and could cause, at certain moments, concerns about the security of supply. As an example, during the cold spell in January 2017, Elia, the transmission system operator in Belgium, had to emit an alert through the European Awareness System and declare that emergency exchanges towards France and the Netherlands were unavailable. This was caused by large errors in their forecast of wind and solar power production (see [32] for more details). Large forecast errors can lead to congestions and inefficient management of power flows. Moreover, forecast errors have to be compensated to keep the system balanced at any time. Typically, imbalances are solved by conventional generators until more storage and demand response will be deployed. It is well known that constant variations of the set point of conventional power plants lower their efficiency, resulting in an increase in the emission factors per MWh and increased maintenance costs due to wear and tear. It is also well known that the maintenance costs of hydro turbines increase when subject to changes in the set points used to balance the electrical grid. In summary, from the network operator perspective and from the environmental point of view, large forecast errors have a larger impact than smaller errors, which can be easily sorted out by netting imbalances. Because of this, we will use the RMSE as the main indicator to compare the results.
For an optimal combination, we do not use an aprioristic number of components or factors. Instead, we carried out the analysis with 1 to 4 components and, a posteriori, we estimated the optimum number in terms of the chosen metric. We kept the number of factors low (from 1 to 4) for several reasons. First, the set of forecasters is not big. Second, each additional factor provides less information. Third, the uncertainty associated with estimating a larger number of components for the weighting vectors can cancel out the positive effect of introducing a new factor.
All numerical results are provided as Supplementary Material in an Excel file. The set of initial predictors comprises the NR model defined in Equation (2) as well as the multivariate models M1 to M7.

Benchmark Combinations
Simple benchmark combination techniques were used to compare the results obtained with the proposed schemes. The benchmark models are very simple procedures that, at the same time, present very good practical results. In fact, one of them, the average of forecasts, is hard to beat, a finding that constitutes the "forecast combination puzzle" in the forecasting literature.
The benchmark techniques are: (i) the mean of the initial set of forecasters, (ii) the median and (iii) the ordinary least square (OLS) regression with and without independent term, y c (t + h|t) = α 0t + X× β t and y c (t + h|t) = X× β t , respectively.

Letŷ (k)
t|t−h be the point forecast for wind energy at time t computed at t − h and y t the observations, with t = 1, 2, . . . , T; k denotes that we have different forecasts (the initial set, and the results of the different combination strategies). The bias, defined as measures the ability of a procedure to capture the average level of the target process. The biases for the initial set of forecasts and the combination strategies, are shown in Figure 1. Notice that power data have been divided by the installed capacity of the wind farm, so all values are between 0 and 1. As can be observed in Figure 1, there is no evidence of bias problems in the forecast results. The Supplementary Materials contains the numerical results. (ii) benchmark combination strategies: media, median, ordinary least squares with intercept (denoted as ols) and without intercept (denoted as ols2) represented in green color; (iii) combination based on principal components with one, two, three and four factors, denoted as 1pc, 2pc, 3pc and 4pc, respectively, represented in red color; (iv) combination based on partial least squares with one, two, three and four factors, denoted as 1pls, 2pls, 3pls and 4pls, respectively, represented in blue color; (v) combination based on a Markov Chain denoted as DMA and represented in black color.

Root Mean Squared Error
The root mean squared error is defined as Larger errors have a proportionally greater impact on the RMSE as they are squared. This is an important indicator when the main objective is to manage the power grid in real time, i.e., from the network operator perspective. Figure 2 shows the results of the different combination strategies and the initial set of forecasts in terms of RMSE. It can be seen that the different combination strategies outperform, in terms of RMSE, the results of the initial forecasts. Although the benchmark combinations produce very good results, the combinations based on multivariate reduction techniques (principal components and partial least squares) show a better performance. The Supplementary Material File contains the numerical results. (ii) benchmark combination strategies: media, median, ordinary least squares with intercept (denoted as ols) and without intercept (denoted as ols2) represented in green color; (iii) combination based on principal components with one, two, three and four factors, denoted as 1pc, 2pc, 3pc and 4pc, respectively, represented in red color; (iv) combination based on partial least squares with one, two, three and four factors, denoted as 1pls, 2pls, 3pls and 4pls, respectively, represented in blue color; (v) combination based on a Markov Chain denoted as DMA and represented in black color.

Mean Absolute Error
The mean absolute error (MAE) is computed as Figure 3 represents the results in terms of MAE. All the combination strategies present better results than the initial set of forecasts. As the forecast horizon increases, the proposed combination strategies (based on principal components, partial least squares and dynamic moving averaging based on a Markow chain) beat the benchmark strategies. and M7 models) represented in grey color; NR has not been included in the figure due to its large errors; (ii) benchmark combination strategies: media, median, ordinary least squares with intercept (denoted as ols) and without intercept (denoted as ols2) represented in green color; (iii) combination based on principal components with one, two, three and four factors, denoted as 1pc, 2pc, 3pc and 4pc, respectively, represented in red color; (iv) combination based on partial least squares with one, two, three and four factors, denoted as 1pls, 2pls, 3pls and 4pls, respectively, represented in blue color; (v) combination based on a Markov Chain denoted as DMA and represented in black color.

Quantile Scores
Quantile scores are useful when the positive and negative errors have distinct consequences. A different weight is assigned depending on whether the error is positive or negative.
with 0 < p < 1 being the weighting factor for positive errors.  and M7 models) represented in grey color; NR has not been included in the figure due to its large error; (ii) benchmark combination strategies: media, median, ordinary least squares with intercept (denoted as ols) and without intercept (denoted as ols2) represented in green color; (iii) combination based on principal components with one, two, three and four factors, denoted as 1pc, 2pc, 3pc and 4pc, respectively, represented in red color; (iv) combination based on partial least squares with one, two, three and four factors, denoted as 1pls, 2pls, 3pls and 4pls, respectively, and represented in blue color; (v) combination based on a Markov Chain denoted as DMA and represented in black color. and M7 models) represented in grey color; NR has not been included in the figure due to its large bias); (ii) benchmark combination strategies: media, median, ordinary least squares with intercept (denoted as ols) and without intercept (denoted as ols2) represented in green color; (iii) combination based on principal components with one, two, three and four factors, denoted as 1pc, 2pc, 3pc and 4pc, respectively, and represented in red color; (iv) combination based on partial least squares with one, two, three and four factors, denoted as 1pls, 2pls, 3pls and 4pls, respectively, and represented in blue color; (v) combination based on a Markov Chain denoted as DMA and represented in black color.

Tests of Forecast Accuracy
To complete the exposition of results, we have performed the Diebold and Mariano test for forecast accuracy for the mean squared error (MSE). As we have mentioned before, this forecast accuracy measure is of primary interest for network operators.
Let e (1) t+h|t = p t+h −ŷ (1) t+h|t and e (2) t+h|t = p t+h −ŷ (2) t|t−h be the h-period-ahead forecast errors from two different models and L the loss function such that L p t+h ,ŷ t+h|t ). The loss differential made at each period t is given by The Diebold-Mariano test statistic is computed as d t andLRV d are used to estimate of variance of √ Td as follows: where the number of lags m used to compute the long-run variance is large enough to take into account the autocorrelation structure of the errors for each forecast horizon. We summarize the results of the multivariate dimension techniques (PLS and PC) versus OLS with intercept, since this is the most challenging alternative in terms of the MSE according to the results shown in Figure 2. We computed the results for two different sizes of the training sample used to estimate the initial weights of the forecast comparison, To = 100 (see Appendix A) and To = 1000. In the first case, we leave more data points to increase the significance of the results. For the sake of brevity, we present here a summary of the results for the first case.
Let e (1) t+h|t be h-step-ahead forecast errors from the combination of forecasts performed with OLS and e (2) t+h|t be the h-step-ahead forecast errors from the different forecast combination techniques. Positive values of the test statistic indicate that OLS performs worse than the alternative forecast model. Only for h = 5 does OLS yield better results than the best multivariate alternatives. For the remaining forecast horizons, there is always a multivariate dimension technique that outperforms OLS. As our aim is to check if dimension reduction techniques outperform OLS for forecast combination, we compute the p-values for the one side test. Statistical significance in favor of the multivariate techniques is found for medium and longer horizons although, for instance, for h = 1, PLS with 2 components yields a p-value of 0.045. Full results are given in Appendix A and in the Supplementary Material File.

Conclusions
Wind energy is increasing and gaining importance in electricity generation around the world. Its integration and reliability are essential to ensure the continuity of the supply of electricity. Precise wind energy forecasts are key to efficiently manage the grid. Currently, there are many forecast models available to grid operators or wind farm managers. Due to this plethora of forecast techniques, we have applied different forecast combination strategies based on dimension reduction techniques and Markov chain models, to improve the performance of the final prediction.
One of the most commonly used strategies to combine forecasts is to compute the weighting factors of each individual predictor as a function of its forecast errors: that is, for each forecast horizon h, model i, at time period t, the weighting factor calculated to build the forecastŷ t+h|t is a function of e (i) ( t|t − h), e (i) ( t − 1|t − h − 1), · · · , e (i) ( h|0) . As h increases, the relationship between what was estimated at time t − h and what will happen at t + h could be weak, so this strategy may not be adequate for large forecast horizons, since the dynamics of the system may have changed (for example, the system could be in another meteorological regime, and the weighting factor estimated for t + h in t according to the forecast error done at t − h for the forecast at t could be inadequate).
The novelty of this work is that the weights of the initial forecasters are estimated adaptively and are not based on the forecast errors in previous time steps t. Multivariate reduction techniques search for the directions that best summarize the information content of the initial forecasters. In the case of the dynamic model averaging techniques, the combination strategy is based on a Markov chain to estimate the probability of each initial forecast being the best predictor for each time step t and forecast horizon h.
We have analyzed the results of the different combination strategies and compared their performance considering different approaches: bias, RMSE, MAE and quantile scores.
In terms of bias, all forecasts present good results, and it can be considered that all of them are unbiased (bias is less than 0.1% in all cases).
RMSE is important from the network operator perspective. Combination strategies based on multivariate dimension reduction techniques show better results than the initial set of forecasters and the benchmark combination strategy. Partial least squares offers better results than principal components.
To mimic the wind farm owner perspective, we have used MAE and quantile scores, which are among the most common indicators. MAE results show that the proposed combination strategies outperform the initial set of forecasts and most of the benchmark combination strategies in these cases too. Finally, the quantile score results depend on the weight assigned to positive and negative errors.
The combination strategy implemented is a two-step process. First, the input space is reduced, and second, the factors obtained in the first step are combined in a linear regression to obtain the final forecast. In the case of partial least squares, the factors maximize the covariance between the input space data (in our case the set of initial forecasts) and the variable to be predicted, while principal components maximizes the variance of the initial forecasters without taking into account the variable to be predicted. It is shown that the strategy that employs a more complete set of information presents better results. Concerning the multivariate dimension reduction techniques, we obtain better results increasing the number of factors from one or two factors to three or four factors as the prediction horizon increases, although it should be noted that the difference between three and four factors is minimal. This is because the uncertainty increases with the prediction horizon, and increasing the number of factors adds more information to the prediction albeit at the expense of more estimation uncertainty. When the latter is too large in relation to the new information the additional factor conveys, it does not compensate to add more factors to the combination.
In a nutshell, we have shown the added value of forecast combinations as a simple tool to lower forecast errors. The results obtained using simple combination strategies were promising. We analyzed several combination strategies based on multivariate reduction techniques and Markov chain and have shown that these strategies outperform the initial set of forecasts and the benchmark combination strategies.

Conflicts of Interest:
The authors declare no conflict of interest. The information and views set out in this publication are those of the authors and do not necessarily reflect the official opinion of the European Union. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein.