PCA Forecast Averaging—Predicting Day-Ahead and Intraday Electricity Prices

Abstract: Recently, developments in combining point forecasts of electricity prices obtained with calibration windows of different lengths have provided an extremely efficient and simple tool for improving predictive accuracy. However, the proposed methods are strongly dependent on expert knowledge and may not be directly transferred from one model or market to another. Hence, we consider a novel extension and propose to use principal component analysis (PCA) to automate the procedure of averaging over a rich pool of predictions. We apply PCA to a panel of over 650 point forecasts obtained with calibration windows of different lengths. The robustness of the approach is evaluated with three different forecasting tasks, i.e., forecasting day-ahead prices, forecasting intraday ID3 prices one day in advance, and finally very short-term forecasting of ID3 prices (i.e., six hours before delivery). The empirical results are compared using the Mean Absolute Error (MAE) measure and the Giacomini and White test for conditional predictive ability (CPA). The results indicate that PCA averaging not only yields significantly more accurate forecasts than individual predictions but also outperforms other forecast averaging schemes.


Introduction
In recent years, we have observed a dynamic transformation of energy markets, which encompasses changes in the generation structure and the creation of new trading opportunities. Since the establishment of competitive power exchanges, a growing share of electricity has been traded in day-ahead markets, where offers are placed before noon on the day preceding the delivery. To give traders the opportunity to balance deviations from positions contracted in the day-ahead market (due to the highly unpredictable generation from renewable sources), the spot markets have been complemented by intraday and balancing markets. Operating in such a complex environment becomes challenging for many market participants, as it requires taking various operational decisions; for example, generators need to decide how much electricity to offer on the day-ahead market (see [1]) or how to structure their intraday trade [2]. Therefore, accurate prediction of electricity prices becomes an important issue for utility managers.
The literature is rich in publications focusing on the modelling and forecasting of spot prices (see [3,4] for a comprehensive review). At the same time, there are few articles dedicated to intraday markets [2,5,6]. Most of them focus on very short-term forecasts (a few hours ahead), as in [7]. These types of models could not be directly used by utilities when making operational decisions. Factor models have also been applied to panels of electricity prices observed for different hours and locations [30][31][32]. The results indicate that a joint exploration of the whole panel leads to more accurate short- and mid-term forecasts.
The PCA approach could also be used to combine forecasts obtained from different models and/or model specifications. Although the literature recognizes the potential of PCA in forecast averaging [33,34], there are few articles where the method is successfully applied. In [14,33,35], static factors are employed to extract information from a panel of predictions coming from different models/experts, in order to obtain point forecasts of chosen macroeconomic variables. PCA was also adopted by [36], who used the method for the construction of prediction intervals for electricity spot prices. In all the above applications, factors are estimated with relatively small and diversified panels.
In this paper, an alternative setup is explored, in which the panel of predictions is homogeneous and consists of a large number of forecasts based on the same model estimated with different calibration windows, as in [18][19][20]. This setup poses new challenges. First, the panel of forecasts is not balanced, because it consists of a relatively small number of predictions calculated with short calibration windows. Second, forecasts based on long windows are almost identical, as the growing sample size gives more stable parameter estimates. What is more, different days are characterized by different patterns of the relationship between the panel forecasts and the actual prices. In order to address some of the above problems, we normalize the predictions across the window-size dimension. Next, we use both the time-dependent moments of the panel variables and the factor estimates to calculate the price forecast. It is shown that PCA combined with data standardization could be a promising alternative to other weighting schemes. Moreover, the method does not require an ad hoc selection of the calibration window length. In this research, the number of factors used for forecast averaging is either fixed and chosen ad hoc or is dynamically selected with the BIC information criterion. The results indicate that, in the case of slightly misspecified models, the proposed PCA-based procedure significantly outperforms both the best performing ex-post selected calibration window and the weighted averaged windows (WAW) approach [19]. In particular, the errors obtained when forecasting the German day-ahead prices are almost 4% lower in terms of MAE in comparison to the optimal single calibration window and significantly better than any other considered averaging scheme. Finally, the aggregated outcomes show that PCA(1), using only one factor, is the best among the different PCA specifications.
The averaging scheme utilizing information criteria to choose the number of factors is only slightly worse than PCA(1) and gives more robust results. Therefore, we recommend using information criteria, such as BIC, for the automated selection of the number of factors.
The remainder of the paper is structured as follows. In Section 2, we present the datasets illustrating the German electricity market. Section 3 describes the experiment design, introduces variance stabilizing transformation (VST) and defines models used for forecasting of day-ahead and intraday prices. Next, in Section 4, we discuss forecast averaging schemes and introduce PCA forecast combination approach. The performance of the methods is evaluated in Section 5. Finally, the conclusions of the research are presented in Section 6.

Datasets
In order to test the proposed methodology, we utilize a number of datasets from the German market, each spanning from 1 January 2015 to 15 August 2019. We consider two different price time series: the day-ahead hourly electricity prices (top panel in Figure 1) and the corresponding time series linked to the intraday market, the ID3 index hourly prices (bottom panel in Figure 1). According to the official rules of EPEX SPOT [37], the ID3 index is calculated as the volume-weighted average price of all trades within 3 h before the delivery of the product (up to 30 min before delivery). Apart from the price series, we use data on several types of exogenous variables: the day-ahead consumption prognosis (top panel in Figure 2) as well as the day-ahead wind and solar generation forecasts (middle and bottom panels in Figure 2, respectively). The wind generation forecast aggregates the offshore and onshore generation forecasts.
In a recent paper [38], the authors argue that wind and photovoltaic generation forecasting errors increase the system imbalance in Germany and directly influence electricity prices. Hence, we additionally use two other fundamental factors that impact electricity prices: natural gas spot prices (top panel in Figure 3) and the spot price of European carbon emission allowances, more precisely EUA (Emission Unit Allowance; see bottom panel in Figure 3). Note that, in contrast to electricity prices and generation forecasts, EUA and natural gas spot prices are quoted at a daily (not hourly) resolution. Values missing due to the time change in March were replaced by the arithmetic mean of the neighbouring observations, while the 'doubled' values corresponding to the reversion to standard time in October were replaced by the arithmetic mean of the two observations.
Descriptive statistics of the day-ahead and intraday price series are shown in Table 1. Although the mean prices in both markets are very similar, the ID3 series exhibits greater variability and a wider range of values. Day-ahead prices are negatively skewed (which is related to the more frequent occurrence of large negative prices), whereas the opposite can be observed for intraday prices. Finally, both electricity price series are leptokurtic, which confirms the occurrence of heavy tails caused by both positive and negative spikes. This feature has been widely discussed in the literature [3] and causes many difficulties in predicting future levels of electricity prices.

Calibration Windows
Following the majority of the forecasting literature, we consider a so-called rolling window scheme. Similarly to [18,19], instead of arbitrarily choosing a fixed calibration window length, we consider a set of 673 different window lengths, ranging from 56 (ca. two months) to 728 days (ca. two years); the obtained forecasts are later averaged (see Section 4). The first 728 (= 2 × 364) days are used for the initial model calibration.
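The rolling-window setup above can be sketched in a few lines of Python; the function names are ours, not from the authors' code:

```python
# Sketch of the rolling-window scheme: for each forecasted day, the
# model is calibrated on the tau most recent days, for every tau in
# the set of considered window lengths.

def window_lengths(shortest=56, longest=728):
    """All calibration window lengths considered in the paper."""
    return list(range(shortest, longest + 1))

def calibration_slice(history_len, tau):
    """Index range (start, end) of the tau most recent days in a
    history of `history_len` days; the forecast targets the next day."""
    assert tau <= history_len, "window longer than available history"
    return history_len - tau, history_len

lengths = window_lengths()
start, end = calibration_slice(728, 56)  # shortest window on day 729
```

With 728 days of initial history, even the shortest 56-day window is available from the first forecasted day onward, which is why the initial calibration period equals the longest window.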

Variance Stabilizing Transformation
Since electricity prices exhibit strong seasonality as well as spiky behaviour, we follow the recommendation of [39] and apply a so-called variance stabilizing transformation (VST) to all datasets (to the time series of prices as well as to the exogenous variables). We apply the N-PIT transformation, which is based on the so-called probability integral transform. The transformed price X_{d,h} for day d and hour h is given by:

X_{d,h} = N^{-1}( \hat{F}_{P_{d,h}}(P_{d,h}) ),

where P_{d,h} is the real observation for day d and hour h, \hat{F}_{P_{d,h}}(·) is the empirical cumulative distribution function of P_{d,h} in the calibration sample, and N^{-1} is the inverse of the standard normal distribution function. We calibrate the models to the transformed time series and then apply the inverse transformation to the computed forecasts in order to obtain the price predictions:

\hat{P}_{d,h} = \hat{F}^{-1}_{P_{d,h}}( N(\hat{X}_{d,h}) ).
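As an illustration, here is a minimal Python sketch of the N-PIT transform and its inverse, using a rank-based empirical CDF; the exact interpolation used in [39] may differ:

```python
from bisect import bisect_right
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def npit(price, calibration):
    """N-PIT: empirical CDF of the calibration sample, followed by
    the inverse standard-normal CDF."""
    srt = sorted(calibration)
    n = len(srt)
    # rank-based ECDF, kept strictly inside (0, 1)
    u = min(max(bisect_right(srt, price) / (n + 1), 1e-9), 1 - 1e-9)
    return _N.inv_cdf(u)

def npit_inverse(x, calibration):
    """Map a forecast on the transformed scale back to a price via
    the empirical quantile function of the calibration sample."""
    srt = sorted(calibration)
    n = len(srt)
    u = _N.cdf(x)
    idx = min(max(round(u * (n + 1)) - 1, 0), n - 1)  # invert the rank
    return srt[idx]
```

The transform is monotone, so forecast rankings are preserved, and round-tripping a calibration-sample price recovers it exactly under this rank convention.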

Models
In this study, we consider both day-ahead and intraday (ID3) prices from the German electricity market. The latter are usually forecasted during the delivery day and modeled with the use of a broader (more recent) pool of information compared to day-ahead price forecasting. Most researchers [40][41][42] focus on very short forecasting horizons (from four to three hours before the delivery). Such a modeling setup, while allowing for higher accuracy of the predictions, does not leave enough time for utilizing the forecasts and adjusting trading strategies to market conditions. To tackle this issue, we decided to extend the forecasting horizon to 6 h, as this would enable market participants to exploit future price movements and optimize their trades [43].
Intraday prices can also be predicted in a day-ahead manner, i.e., before submitting bids in the day-ahead market. This approach is particularly important when market participants need to decide where to sell or buy energy (day-ahead vs. intraday market). In such a case, they need to predict day-ahead revenues from different trading strategies, as in [1]. Such an approach can also be beneficial in terms of the decision-making process and risk management.
Let us focus first on the day-ahead spot prices, DA_{d,h}. To compute their point forecasts, autoregressive models with exogenous variables (ARX), estimated via the least-squares method, are utilized. This type of model has been used extensively in the electricity price forecasting (EPF) literature [39,44,45]. The classical setup is expanded to include five exogenous variables: TSO forecasts of the total load (L_{d,h}), wind (W_{d,h}) and solar (S_{d,h}) generation, as well as the spot prices of carbon emission allowances (EUA_d) and natural gas (G_d). The final model, denoted by DA, is described by the following formula:

DA_{d,h} = β_{h,1} DA_{d−1,h} + β_{h,2} DA_{d−2,h} + β_{h,3} DA_{d−7,h} + β_{h,4} DA_{d−1,min} + β_{h,5} DA_{d−1,max} + β_{h,6} DA_{d−1,24} + β_{h,7} L_{d,h} + β_{h,8} W_{d,h} + β_{h,9} S_{d,h} + β_{h,10} EUA_d + β_{h,11} G_d + Σ_{i=1}^{7} β_{h,11+i} D_i + ε_{d,h},

where DA_{d−1,h}, DA_{d−2,h} and DA_{d−7,h} are the lagged day-ahead prices from the previous day, two days before and a week before. DA_{d−1,min} and DA_{d−1,max} refer to the minimum and the maximum price from day d − 1, and DA_{d−1,24} is the last known price from the previous day. Finally, D_1, . . . , D_7 are weekday dummies accounting for the weekly seasonality. Note that the solar generation forecasts, S_{d,h}, are included in the model only for hours 9-17, due to the obvious lack of generation during the night and early morning hours.

The second task is to predict the day-ahead intraday price, namely the value of the ID3 index for day d and hour h. We conduct the forecasting on the day preceding the delivery, as in the DA case. This implies that the intraday and spot prices are modelled in the same manner: all 24 prices for day d are forecasted at the same time, using the same pool of information. The model, denoted by IDA (Intraday Day-Ahead), has a structure similar to the DA model and extends the model proposed by [1]. It assumes that the data generating process of intraday prices could be described by the following equation:

ID3_{d,h} = β_{h,1} ID3*_{d−1,h} + β_{h,2} ID3_{d−2,h} + β_{h,3} ID3_{d−7,h} + β_{h,4} L_{d,h} + β_{h,5} W_{d,h} + β_{h,6} S_{d,h} + β_{h,7} EUA_d + β_{h,8} G_d + Σ_{i=1}^{7} β_{h,8+i} D_i + ε_{d,h},

where ID3*_{d−1,h}, ID3_{d−2,h} and ID3_{d−7,h} are the lagged intraday prices. Due to the transaction timeline (see Figure 4), the predictions are performed at 10:00, when some of the intraday prices ID3_{d−1,h} are not yet known.
Therefore, a new variable, ID3*_{d−1,h}, is constructed:

ID3*_{d−1,h} = ID3_{d−1,h}, if the index is already known at the moment of forecasting, and ID3*_{d−1,h} = ID^{partial}_{d−1,h} otherwise,

where ID^{partial}_{d−1,h} is the volume-weighted average price of all transactions for a given product that have been made up to the moment of forecasting. In case there were no transactions, ID^{partial}_{d−1,h} is replaced by the corresponding day-ahead price.
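The construction of ID3* amounts to a simple fallback rule, which can be sketched as follows (a hypothetical helper; the names are ours):

```python
# Sketch of the ID3* construction at the 10:00 forecasting time:
# use the final ID3 index where it is already known, the partial
# volume-weighted price where trading is still open, and the
# day-ahead price where no trades have occurred yet.

def id3_star(id3_final, id_partial, da_price):
    """id3_final: ID3 index if already published, else None.
    id_partial: VWAP of the trades made so far, else None (no trades).
    da_price: the corresponding day-ahead price (always known)."""
    if id3_final is not None:
        return id3_final
    if id_partial is not None:
        return id_partial
    return da_price  # fallback when there were no transactions
```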
Finally, we build a model for a very short-term forecasting horizon on the intraday market (6 h before the delivery), which we denote by ID. It is based on the results presented in [40,46] and assumes that the ID3 price for day d and hour h is given by:

ID3_{d,h} = β_{h,1} ID3_{d,h−6} + β_{h,2} ID3_{d−1,h} + β_{h,3} ID^{partial}_{d,h} + β_{h,4} DA_{d,h} + β_{h,5} DA_{d,h−6} + β_{h,6} L_{d,h} + β_{h,7} W_{d,h} + β_{h,8} S_{d,h} + β_{h,9} EUA_d + β_{h,10} G_d + Σ_{i=1}^{7} β_{h,10+i} D_i + ε_{d,h},

where ID3_{d,h−6} refers to the ID3 price six hours before delivery and ID3_{d−1,h} is the price for hour h on the previous day. ID^{partial}_{d,h} is the volume-weighted average price of all transactions for a given product that have been made up to 6 h before the delivery (the moment of forecasting). The next two variables link the intraday and day-ahead markets: DA_{d,h} refers to the already known day-ahead price for day d and hour h, while DA_{d,h−6} gives the newest information about the price level difference between these two markets. The rest of the predictors are the same as in the DA and IDA models.
Note that, for better readability, we write ID3_{d,h+i} to denote the product with delivery i hours after (or before, for i < 0) the product (d, h), instead of using the formally correct notation.

Forecast Averaging
The literature shows that the accuracy of forecasts depends on the length of the calibration window used for the estimation of the model parameters. As shown in [18,19], this relationship could be non-monotonic, and hence the selection of the optimal calibration window length becomes a complex task. On the other hand, the diversity of outcomes provides a strong motivation for using forecast averaging techniques, which could improve the forecasting performance of predictive models. Moreover, combining predictions could help to solve the issue of optimal calibration window selection and reduce the model-specification risk.
Somewhat surprisingly, the concept of averaging forecasts across calibration windows of different lengths is relatively new in the field of electricity price forecasting. The recent articles [18,19] were the first to tackle this overlooked problem in a systematic way.

Weighted Averaged Windows
The simple arithmetic average of the selected predictions is one of the most popular forecast combination approaches. This method has proved successful in a number of different studies across the econometric and forecasting literature. In the presented setup, the averaged window (AW) averaging scheme assumes equal weights for all forecasts estimated with calibration windows of lengths τ ∈ T.
Findings of [18,19] demonstrate that reducing the set of window lengths used for forecast averaging, T, could improve the method's performance. The authors indicate that the average of predictions obtained with three short and three long calibration windows, in most cases, outperforms both the single 'optimal' window and the average across all window lengths. The solution is also very efficient in terms of computational cost: it requires calibrating the model to only six different sample lengths.

[19] extended the idea of simple averaging and proposed an averaging scheme called weighted averaged windows (WAW). The weights are computed using the inverse of the Mean Absolute Error (MAE) calculated over the averaging window of length D_ave (in [19], D_ave = 182):

w^{(τ)}_d = (MAE^{(τ)}_d)^{−1} / Σ_{τ′∈T} (MAE^{(τ′)}_d)^{−1},

where w^{(τ)}_d is the weight assigned on day d to the forecast based on a calibration window of length τ and MAE^{(τ)}_d is the mean absolute error of that window's forecasts, |ε_{d,h}| = |P_{d,h} − \hat{P}_{d,h}|, over the averaging window. Using this approach, the past performance of each window is taken into consideration and bigger weights are assigned to forecasts obtained from windows that performed well in the past. Despite the computational efficiency and satisfying performance of this method, the choice of calibration window lengths has to be made in an ad hoc manner, and an inappropriate choice may have a significant impact on the forecasting performance.
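The inverse-MAE weighting can be sketched in Python as follows; the names are ours, and the paper's implementation may organise the averaging window differently:

```python
# Sketch of the WAW scheme: each calibration window's weight is
# proportional to the inverse of its MAE over the averaging window.

def waw_weights(abs_errors_by_tau):
    """abs_errors_by_tau: dict mapping a window length tau to the list
    of absolute forecast errors over the averaging window. Returns
    normalized inverse-MAE weights."""
    inv_mae = {tau: len(errs) / sum(errs)  # = 1 / MAE(tau)
               for tau, errs in abs_errors_by_tau.items()}
    total = sum(inv_mae.values())
    return {tau: v / total for tau, v in inv_mae.items()}

def waw_forecast(forecasts_by_tau, weights):
    """Weighted average of the individual forecasts for one hour."""
    return sum(weights[tau] * f for tau, f in forecasts_by_tau.items())
```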

BMA
Bayesian analysis offers an alternative to classical forecast combination methods. As stated by [47], the weights based on Bayesian model averaging (BMA) can be approximated by

w^{(τ)}_d = exp(−BIC(τ)/2) / Σ_{τ′∈T} exp(−BIC(τ′)/2),

where w^{(τ)}_d is the weight corresponding to the window of length τ and BIC(τ) is the Bayesian Information Criterion. It should be noticed that, since the models are estimated with different calibration windows, BIC is computed here over the forecast averaging window, not over the estimation sample:

BIC(τ) = 24D_ave ln( RMSE(τ)² ) + K ln(24D_ave),

where K is the number of parameters and RMSE(τ) is the root mean squared error of the forecasts based on the window of length τ, computed over the averaging window. Since the penalty component in BIC(τ) does not depend on the window length, τ, the weights can be further simplified to

w^{(τ)}_d = RMSE(τ)^{−24D_ave} / Σ_{τ′∈T} RMSE(τ′)^{−24D_ave}.

It can be noticed that this expression is similar to the definition of the WAW weights. The differences are two-fold: first, BMA weights are based on RMSE rather than MAE and, second, they are raised to the power 24D_ave, which represents the length of the forecast averaging window and shrinks the weights of less accurate forecasts towards zero.
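The simplified weights can be computed as follows. Because RMSE is raised to the power 24·D_ave (4368 for D_ave = 182), a direct evaluation underflows, so this sketch (names are ours) works in the log domain:

```python
import math

# Sketch of the simplified BMA weights: w(tau) proportional to
# RMSE(tau) ** (-24 * D_ave), computed via logarithms because the
# large exponent would otherwise underflow ordinary floats to zero.

def bma_weights(rmse_by_tau, d_ave=182):
    n_obs = 24 * d_ave  # length of the forecast averaging window
    log_w = {tau: -n_obs * math.log(r) for tau, r in rmse_by_tau.items()}
    m = max(log_w.values())  # stabilise before exponentiating
    w = {tau: math.exp(lw - m) for tau, lw in log_w.items()}
    total = sum(w.values())
    return {tau: v / total for tau, v in w.items()}
```

The large exponent makes the scheme behave almost like a winner-takes-all selection: even a 2% difference in RMSE drives the weight of the worse window towards zero.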

PCA Averaging
The majority of forecast combination approaches discussed in the literature either use a small number of predictions or emphasise the need for prediction selection. Alternatively, one could utilize the information included in a big panel of forecasts by using principal component analysis (PCA). The idea was proposed by [14], who applied static factors to combine forecasts coming from different models. Similarly, [36] proposed factor quantile regression averaging (FQRA) to construct prediction intervals using a panel of point forecasts. In both articles, PCA averaging is applied to relatively small and diversified panels of forecasts based on 27-66 individual models.
In the presented setup, the panel of forecasts consists of 673 individual predictions acquired with different calibration windows. Since the growth of the window size, τ, leads to more stable parameter estimates, the forecasts obtained with long windows, for example, τ = 721 and τ = 728, are almost identical. This strong correlation, which is close to collinearity, impedes the classical, regression-based methods of forecast averaging. To avoid such problems, we propose a novel, fully automated method for averaging forecasts, based on PCA.
In order to utilize all information from the averaging window, the data and predictions are treated as time series, with the time index, t = 24(d − 1) + h, representing consecutive hours. Similarly to the WAW approach, we use the information from the D_ave previous days (in our setup D_ave = 182). Additionally, the data is extended by the 24 forecasts of hourly prices from day d. Therefore, the final averaging window consists of 24D_ave + 24 observations. Let us denote by \hat{P}_{t,τ} the prediction of the variable P_t based on a calibration window of length τ. The data set {\hat{P}_{t,τ}} could be interpreted as a panel, with the first dimension representing time and the second dimension describing the size of the calibration window. The averaging algorithm consists of the following steps:

1. For each time period t in the averaging window, estimate the mean (μ_t) and standard deviation (σ_t) of the individual forecasts across different τ.
2. Standardize the predictions and the predicted variable.
3. Estimate the first k = 1, ..., K principal components, PC_{t,k}, of the panel {\hat{P}_{t,τ}}, using the method described by [14,22]. Notice that the factors have dimension (24D_ave + 24) × 1, as they include the information of the price forecasts on day d. We denote the model using K principal components by PCA(K).
4. Run a regression of the standardized predicted variable on the principal components, using observations from the averaging window without the last 24 observations.
5. Compute the prediction of the normalized dependent variable on day d at hour h and transform it into its original units.

The role of standardization should be emphasised here. The mean, which changes between days, could be interpreted as the first common factor affecting the panel of forecasts. In particular, it represents the forecasts based on long calibration windows. The predictions for large τ are, by construction, very similar to each other and have the largest input to the mean. On the other hand, the impact of long windows on the demeaned panel is balanced by the larger (in absolute terms) and more variable deviations from the mean for short calibration windows.
The standard deviation represents the forecast uncertainty and increases when short and long windows give different predictions. If the original data were used to estimate the principal components, the days with the highest risk would have the largest input to the panel variance and hence would strongly impact the factor estimates. Thanks to standardization, all days are equally represented by the common factors and the outcomes are stable, even when outliers are included in the sample used for forecast averaging. In future work, one could additionally use the information carried by the standard deviation and include the variance in a model for probabilistic forecasting, for example to construct prediction intervals with quantile regression or its generalizations [48,49].

It should be noticed that the described algorithm is conditioned on the number of factors, K, used in the regression. To make the choice of K data-driven, we use the Bayesian information criterion (BIC):

BIC(K) = ln(\hat{σ}²_K) + K ln(24D_ave) / (24D_ave),

where \hat{σ}²_K is the estimated residual variance from the regression with K components. For each day d, the optimal \hat{K} is chosen, which minimizes the corresponding BIC.
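The five steps can be sketched with NumPy as follows. This is a minimal reading of the algorithm, assuming SVD-based principal components and an ordinary least-squares regression on the factors; the names and implementation details are ours, not the authors':

```python
import numpy as np

# Minimal sketch of PCA(K) averaging. `panel` holds the individual
# forecasts (rows: 24*D_ave + 24 hours, columns: window lengths);
# `target` holds the realised (transformed) prices for the first
# 24*D_ave hours. Returns forecasts for the final 24 hours.

def pca_average(panel, target, K):
    n, _ = panel.shape
    # 1. per-hour mean and std of the forecasts across window lengths
    mu = panel.mean(axis=1, keepdims=True)
    sigma = panel.std(axis=1, keepdims=True) + 1e-12
    # 2. standardise the panel row by row
    Z = (panel - mu) / sigma
    # 3. first K principal components via SVD of the standardised panel
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    F = U[:, :K] * S[:K]  # (n, K) factor estimates
    # 4. regress the standardised target on the factors, excluding
    #    the last 24 rows (the day being forecasted)
    m = n - 24
    y = (target - mu[:m, 0]) / sigma[:m, 0]
    X = np.column_stack([np.ones(m), F[:m]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # 5. predict the last 24 hours and undo the standardisation
    Xd = np.column_stack([np.ones(24), F[m:]])
    return mu[m:, 0] + sigma[m:, 0] * (Xd @ beta)
```

Note how the per-hour mean μ_t re-enters in step 5: as discussed above, it acts as the implicit first "factor", while the regression on the principal components only corrects the standardized deviations around it.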

Forecast Evaluation
We use the Mean Absolute Error (MAE) for the full out-of-sample test period of D = 778 days (i.e., 26 June 2017 to 15 August 2019, see Figure 1) as the main evaluation criterion. Two measures are considered:

MAE^{(i)}_d = (1/24) Σ_{h=1}^{24} |ε^{(i)}_{d,h}|   and   MAE^{(i)} = (1/D) Σ_{d=1}^{D} MAE^{(i)}_d,

where ε^{(i)}_{d,h} is the prediction error at day d and hour h based on the averaging method i (or i = τ for models without averaging). The first measure, MAE^{(i)}_d, describes the forecast accuracy for a given day, d, and is later used for the statistical comparison of individual approaches. MAE^{(i)} describes the overall performance of method i. Recall that the MAE is the most commonly used measure for evaluating forecast accuracy. In the case of electricity markets, it reflects the average deviation of the revenue from selling 1 MWh from its expected level.

Given the number of results, it is hard to properly rank the models' accuracy. To solve this issue, following [39,44], we introduce the mean percentage deviation from the best (m.p.d.f.b.) benchmark, inspired by the m.d.f.b. measure used in [50,51] for comparing models. For each of the models j = DA, IDA, ID, the benchmark is the best performing (ex-post) calibration window length, and the m.p.d.f.b. of method i is the percentage deviation of its MAE from this benchmark, averaged across the three models:

m.p.d.f.b.^{(i)} = (1/3) Σ_j ( MAE^{(i)}_j − MAE^{(best)}_j ) / MAE^{(best)}_j × 100%.

The obtained MAE values can be used to provide a ranking of models. Unfortunately, they do not allow one to draw statistically significant conclusions on the outperformance of the forecasts of one model by those of another. Therefore, the conditional predictive ability (CPA) test of Giacomini and White [52] is used to compare competing outcomes. Note that the CPA test could be viewed as a generalization of the popular Diebold and Mariano [53] test of unconditional predictive ability. Here, the test statistic is computed using the vector of daily loss differentials,

Δ_{X,Y,d} = MAE^{(X)}_d − MAE^{(Y)}_d,

where MAE^{(i)}_d is the mean absolute error of the forecast obtained with model i on day d.
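The evaluation measures translate directly into code; this is a sketch with hypothetical names:

```python
# Sketch of the evaluation measures: daily MAE, overall MAE, and the
# mean percentage deviation from the best benchmark (m.p.d.f.b.).

def daily_mae(errors_24):
    """MAE_d: mean absolute error over the 24 hours of one day."""
    return sum(abs(e) for e in errors_24) / 24

def overall_mae(errors_by_day):
    """MAE over the full out-of-sample period (D days x 24 hours)."""
    return sum(daily_mae(day) for day in errors_by_day) / len(errors_by_day)

def mpdfb(mae_by_model, best_by_model):
    """Percentage deviation of a method's MAE from the best ex-post
    benchmark, averaged over the forecasting tasks j = DA, IDA, ID."""
    devs = [100 * (mae_by_model[j] - best_by_model[j]) / best_by_model[j]
            for j in mae_by_model]
    return sum(devs) / len(devs)
```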
For each pair of forecasting approaches, X and Y, and each model, we compute the p-value of the CPA test with the null H_0: φ = 0 in the regression [52]:

Δ_{X,Y,d} = φ′ X_{d−1} + ε_d,

where X_{d−1} contains elements from the information set on day d − 1, i.e., a constant and Δ_{X,Y,d−1}.
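A simplified sketch of the CPA regression above, with the information set {constant, lagged loss differential} and the n·R² form of the test statistic compared against the 5% critical value of χ²(2); this is our reading of [52], not the authors' exact implementation:

```python
import numpy as np

# CPA sketch: regress the daily loss differential on a constant and
# its own lag, then use the n * R^2 statistic (uncentered R^2), which
# is asymptotically chi-squared with 2 degrees of freedom under H0.

def cpa_statistic(mae_x, mae_y):
    d = np.asarray(mae_x, float) - np.asarray(mae_y, float)
    y = d[1:]
    if not y.any():  # identical accuracy: nothing to test
        return 0.0, False
    X = np.column_stack([np.ones(len(y)), d[:-1]])  # constant + lag
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2u = 1.0 - (resid @ resid) / (y @ y)  # uncentered R^2
    stat = len(y) * r2u
    return stat, stat > 5.991  # 5% critical value of chi2(2)
```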

Point Forecast Results
As mentioned earlier, in this paper we consider training samples of lengths ranging from 56 to 728 days. Since the same model calibrated to samples of different lengths produces differing forecasts, this gives us 673 different 'sub-models' for each of the models. The forecasting performance is evaluated separately for each calibration window, and the results are shown in Figure 5, in which each dot represents the MAE of forecasts obtained by calibrating the model to a sample of a certain length. Interestingly, the curves for the DA and IDA models are not monotonic: contrary to what one may expect, the forecasting error does not strictly fall with the increase of the calibration sample length. This behavior of the MAE may suggest that the models are slightly misspecified due to, for example, assumed linearity, time-invariant parameters or omitted variables. In such a case, the parameter estimates are inconsistent and do not converge to their true values. On the other hand, the curve for the ID model is descending. The forecasting accuracy of this model increases with the length of the calibration window, leaving little room for improvement by averaging techniques.
Another conclusion that can be drawn is that none of the calibration window lengths would be the 'optimal' choice for all models: the best-performing calibration window for the DA model is 95 days, whereas for IDA the best forecasting performance is achieved when calibrating the model on a 438-day sample, and for ID the longest calibration samples perform best. This diversified pattern of behavior shows that there is a need for a more robust way of selecting the length of calibration windows.

Table 2 presents the MAE and m.p.d.f.b. results for forecasts obtained with the shortest (56-day), the one-year (364-day) and the longest (728-day) calibration windows and compares them against a benchmark: the best (ex-post) calibration window length. Next, the outcomes of different averaging techniques are reported, starting with AW/WAW and BMA for all τ ∈ {56, 57, ..., 728}, AW/WAW and BMA for six selected window sizes τ ∈ {56, 84, 112, 714, 721, 728}, as in [18][19][20], and PCA averaging with 1 to 4 factors. Finally, the outcomes of the PCA(BIC) scheme are presented, in which the number of components is selected using the BIC information criterion. The results are displayed in absolute terms (MAE) and relative to the benchmark (computed as a percentage difference, %chng).

Averaging Results
The presented measures are computed with data ranging from 26 June 2017 to 15 August 2019, i.e., a 778-day-long out-of-sample period. The three considered models, i.e., DA, IDA and ID, are evaluated separately and their outcomes are shown in consecutive columns. Finally, the average performance of the analyzed forecasting schemes is described by the m.p.d.f.b. The results lead to several important conclusions:

•
In the case of the DA and IDA models, the averaged forecasts are more accurate than any of the individual predictions, including those based on the best ex-post calibration window length. The gains reach up to 3.841% and 2.097% for DA and IDA, respectively. At the same time, none of the combined predictions provides results better than the benchmark for the ID model. This confirms our expectation that averaging across different calibration window lengths may not improve forecast accuracy in the case of well-specified models, whose parameters can be consistently estimated.

•
When the two similar averaging schemes, AW and WAW, are compared, the results indicate that the weighted extension, WAW, originally proposed by [19], outperforms the simple arithmetic mean. The superiority of WAW holds for all models and both ranges of window lengths, T.

•
The outcomes for the AW/WAW averaging schemes show that the pre-selection of six window lengths improves the forecast accuracy only in the case of the misspecified models: DA and IDA. At the same time, the results for ID suggest that, for well-specified models, the ad hoc reduction of the τ dimension substantially increases the MAE.
•

The BMA scheme performs much worse in terms of MAE for the DA and IDA models than the non-Bayesian approaches. The situation changes for the ID model, where BMA is the most accurate among the forecast averaging schemes but still worse than the best individual model.

•
The PCA forecast averaging approaches lead to more accurate predictions of DA and IDA than any other combining schemes. They reduce the MAE, relative to the benchmark, by 3.841% and 2.097%, respectively. For the ID model, all averaging schemes perform worse than the 2-year calibration window. Still, PCA exhibits the smallest forecast error among the presented methods.

•
The PCA-based methods perform similarly, regardless of the number of factors used for forecast averaging. One could observe small differences between markets, which indicate that PCA(4) is on average the most efficient.

•

The BIC information criterion is shown to be helpful in selecting the number of components. Although it could not beat the best PCA specification for the individual models (DA, IDA and ID), it works very well on average.
Note that in the presented setup, forecast averaging weights are estimated using the predictions from the D_ave = 182 previous days. Although it is out of the scope of this paper, we conducted a limited study to analyze how the reduction of D_ave to 60 days affects the results. It turned out that the choice of D_ave has a minor impact on the outcomes and does not alter the major conclusions. It seems that the relative accuracy of PCA increases for longer forecast averaging windows, as more information improves the estimation of the principal components.
Finally, the results are evaluated with the Giacomini-White test [52] for the norm of order one. The outcomes are presented in Figure 6, in which a non-black square indicates that the forecasts of a model on the X-axis are statistically more accurate than the forecasts of a model on the Y-axis. The results confirm the previous findings and show that the PCA schemes outperform other methods when the day-ahead forecasts are considered. This outcome is supported by two observations:

•

PCA(4) and PCA(1) are significantly the most accurate for the DA and IDA models, respectively.

•

PCA(1) is not statistically worse than any of the predictions apart from some other PCA specifications.
When the ID model is considered, the outcomes show that the approaches based on the longest and the best calibration window lengths provide forecasts of the same accuracy, which outperform almost all other prediction methods. Moreover, for this market,

•

forecasts obtained with PCA(3)

Finally, it could be noticed that PCA(BIC) is rarely outperformed by other PCA specifications. This result confirms that BIC is useful in determining the optimal number of components used for averaging and hence could be an attractive alternative to the ad hoc choice of K.

Figure 6. Results of the conditional predictive ability (CPA) test [52] for the forecasts of all considered models. We use a heat map to indicate the range of the p-values: the closer they are to zero (→ dark green), the more significant is the difference between the forecasts of a model on the X-axis (better) and the forecasts of a model on the Y-axis (worse).

Conclusions
In this paper, we model and predict hourly electricity prices on the German market. We consider three forecasting setups: a day-ahead forecast of spot prices, a day-ahead forecast of intraday prices and a short-term, 6 h ahead prediction of the ID3 index. The analyzed problems reflect the decision process of market participants and could help in optimizing the selling/buying strategy, as in [1,2].
We propose a novel approach for calculating predictions of electricity prices, which utilizes forecasts based on models calibrated to windows of different lengths. We extend the idea introduced in [18,19], which focuses on an ad hoc selection of the best set of calibration windows. In this study, we propose a principal component analysis (PCA) method for forecast averaging, which enables the automatic aggregation of the information included in a large panel of predictions. The results indicate that the PCA averaging scheme can, on average, reduce the MAE measure of forecast accuracy relative to the best ex-post calibration window length. It also outperforms other forecast averaging approaches, such as AW, WAW and BMA.
Furthermore, we show that the DA, IDA and ID models have different characteristics, which correlate with the forecast horizon. The performance of day-ahead forecasts of spot and intraday prices does not improve with the growth of the calibration window length, whereas the short-term predictions of ID3 get more accurate for the longest estimation windows. This difference impacts the potential gains from forecast averaging. For the ID model, none of the proposed methods could outperform the forecasts based on the longest calibration window. At the same time, the averaging, and in particular the PCA forecast combination, results in a significant decrease of the MAE for the DA and IDA models. The forecast accuracy improves, relative to the benchmark, by almost 4% for the PCA(4) scheme and the DA model. In the case of the IDA model, the error reduction reaches 2% for the PCA(1) approach.
Finally, the results indicate that the two PCA forecast averaging methods, with one or four components, provide the most accurate predictions. PCA(1) is statistically not worse than any of the AW or WAW forecast combination schemes and outperforms all approaches for IDA. At the same time, PCA(4) has on average the lowest MAE, as measured by the m.p.d.f.b. (mean percentage deviation from the best). The PCA method with the number of components selected by the BIC information criterion provides forecasts which are only slightly worse than those obtained with PCA(4) and PCA(1). Hence, it could be viewed as an interesting alternative to the ad hoc selection of the number of components. We believe that these results encourage further research on PCA forecast averaging, which could be extended to interval and probabilistic forecasting and applied to other commodity markets.