Predictive densities for day-ahead electricity prices using time-adaptive quantile regression

: A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model is compared to that of four benchmark approaches and the well-known the generalist autoregressive conditional heteroskedasticity (GARCH) model over a three-year evaluation period. While all benchmarks are outperformed in terms of forecasting skill overall, the superiority of the semi-parametric model over the GARCH model lies in the former’s ability to generate reliable quantile estimates


Introduction
Probabilistic forecasts of the day-ahead wholesale price in the form of predictive densities are required to solve many of the challenges faced by participants in today's electricity markets.For instance, optimal bidding strategies for wind power producers, like the ones described in [1,2], require such forecasts for scenario generation.Similarly, scenarios for day-ahead prices are used in the pricing of hydropower in [3,4] and for the optimal bidding of other types of generation units in [5].This aside, foreseeable developments in the electricity sector, such as continuing growth of renewable energy technologies and the emergence of flexible demand, are likely to further boost the demand for such forecasts.
The aim of this paper is to present a methodology for the density forecasting of day-ahead electricity prices that permits the generation of operational scenarios.First and foremost, such a model should produce reliable density forecasts of the untransformed prices.For the purpose of scenario generation, it is important that the forecasts properly describe the full density of the prices (instead of a set of quantiles only, as is sometimes the case for applications in risk management concerned with a single variable).Once a proper description of the density is obtained, scenarios respecting the prices' correlation structure are easily obtainable using a scenario generation framework similar to that presented in [6].
The unique features of electricity as a commodity explain the distinct characteristics of its price.This is mostly since supply and demand must be matched constantly and instantaneously.In addition, since electricity cannot be stored directly in an efficient and cost-effective manner, production must take place synchronous with consumption.Together, this makes arbitrage over time and space close to impossible [7,8].Electricity consumption, however, is highly inelastic in the short-term and exhibits multiple strong seasonal patterns [9].The supply function, on the other hand, is discontinuous, convex and steeply increasing in the high production end [10,11].Altogether, this results in electricity prices that in most markets exhibit most or all of the following features: strong multiple periodicities, intra-and inter-day correlation, non-stationarity, positive skewness, high kurtosis, mean reverting spikes and general excessive volatility [11][12][13].
Density forecasting of day-ahead electricity prices has received increased attention over recent years.The most popular approach in the existing literature on the matter is to describe conditional price densities by some variant of the generalist autoregressive conditional heteroskedasticity model (GARCH), uni-or multi-variate ( [14][15][16] and the references therein), often with the aid of a jump-diffusion model [17,18].Disappointingly, though, rigorous measures of the forecasting skill for such conditional density models are, in most cases, not reported.Alternative approaches are quite rare in the literature.As a first example, [11] presented a novel approach for Bayesian density forecasting based on a vector autoregressive model with skew-t noise, while assessing the overall fit of the predictive densities obtained.All inference is, however, performed on logarithmic transformed data, and no scheme for conversion to the original scale is discussed.This is while such a scheme is paramount for a proper transformation of the forecast for a non-linear and non-stationary process, like electricity prices.It is therefore hard to pinpoint how their model would perform on the real scale and compared to other types of models.Besides, [19,20] present interesting methods for modeling the first four moments of conditional price densities with alternative probability density functions.More recently, novel nonparametric approaches were proposed based on quantile regression averaging, therefore allowing one to issue probabilistic forecasts in a forecast combination framework, e.g., [21,22].
In the broadest sense, there are two different methodological lines to model conditional densities: adopting parametric assumptions and non/semi-parametric modeling.The parametric approaches have the appeal that the full density of the modeled process is characterized by very few parameters.This, in turn, makes a single model sufficient for obtaining a description of the whole density.This very same feature is, however, also the main drawback of parametric approaches, since it constrains density shape.In contrast, non-or semi-parametric approaches do not suffer from this shape inflexibility, since assumptions about the distributional shape are either none or conditioned to a certain domain.These approaches, however, involve either severely increased model complexity, or a substantial growth in the number of models to be estimated, or even both.
Using data from the Western Danish price area for Nord Pool's Elspot, models for conditional price densities from both categories will be derived, analyzed and compared in the following.As a parametric approach and as a form of baseline approach, modeling the standard deviation of a Gaussian distribution is attempted through a series of formulations of the well-known ARCH/GARCH models [23,24] and similar regression models.Alternatively, a non-parametric quantile regression (QR) [25][26][27] model is derived for quantiles ranging from 0.05 to 0.95.In order to overcome the shortcomings of QR regarding the modeling of distribution tails, it is coupled with a model for the tails based on an exponential assumption.All models have time-varying parameters and describe the conditional densities of electricity prices given an already existing point forecast, here generated with the model described in [28].Generally, any well-tuned model for point forecasting of electricity prices could be applied.These point forecasts enter the models as an explanatory variable, potentially supplemented by load forecasts.Therefore, a particularity of our approach is that it builds on point forecasts that may already be routinely provided to forecast users.These forecasts are to be "dressed" with conditional distributions of potential forecast errors, then allowing one to inform of forecast uncertainty in a methodologically sound and pragmatic manner.
In addition to comparing the models with each other, their performance is compared to that of four benchmark approaches.These are the (i) empirical marginal density of electricity prices, often referred to as climatology forecasts in meteorology; (ii) a Gaussian unconditional density; (iii) a conditional Gaussian density whose mean and standard deviation are estimated by exponential smoothing; and (iv) a kernel density estimation model.The continuous ranked probability score (CRPS) [29,30] and the related continuous ranked probability skill score (CRPSS) are used for a comparison of forecasting skill.In terms of these measures, both models are shown to outperform the benchmarks.It is also demonstrated based on reliability diagrams that even though the CRPSS of these two models types is quite compatible, it is only the QR-exponential model that produces reliable density forecasts.
The remainder of this paper is structured as follows: Section 2 describes the functions of Nord Pool and the data, then developing into an empirical analysis in order to highlight the main data features to be modeled subsequently.The models and related parameter estimation are described in Section 3, while further analysis and genuine comparison of their forecasting skill are the focus of Section 4. Finally, concluding remarks are given in Section 5.

Nord Pool's Elspot
Nord Pool's Elspot is a day-ahead market for the physical delivery of electricity, which covers 5 different countries.By default, the day-ahead price should be the same in the entire market region unless prevented by transmission bottlenecks.Gate-closure is every day at noon for exchange during the upcoming day.The prices are set with hourly resolution as the intersection between the aggregated supply and demand curves.At first, curves for the whole region are constructed, and their intersection constitutes the system price-the price at which trades are settled if transmission capacity is sufficient.If, however, the resulting production and consumption schedules prompt congestion in the transmission network, two or more area prices are calculated.The area prices are found in the same manner as the system price.The supply and demand curves, however, only comprise the bids within the area where transmission capacity is sufficient along with full utilization of congested lines.Further information can be found in [31].
The empirical work of this paper is based on data from the Western Danish price area (DK-1).The data covers a period of almost exactly three years from 21 December 2008, through December 2011.The data set comprises hourly forecast and observed day-ahead prices for the area, forecast system load and the area's forecast wind power production.The point forecasts for the prices are found by the model described in [28], while the load forecasts were obtained from Nord Pool's website [32].Finally, the wind power forecasts originate from a statistically-based wind power prediction software (see [33]).

Data Analysis
In order to demonstrate the previously described characteristics of the prices and to establish the necessary properties of a model for their density, a brief empirical data analysis is presented.Let π (S) t denote the day-ahead electricity price for hour t.The top plot in Figure 1 shows a time series plot of π (S) t for the whole data period.From the plot, some of the features mentioned in the previous section are apparent.The prices' mean and volatility varies constantly throughout the series, and price spikes are quite frequent.From the bottom two plots in Figure 1, which show time series plots of π (S) t for March, 2009, and January, 2010, the daily and weekly seasonal cycles of the prices are apparent, as well as some of the other features previously mentioned.
Figure 2 shows the residual series: where π (S) t is the forecast price generated by the model in [28], issued before 11:00 on the day before delivery.The horizontal lines on plot represent the α/2 and the 1 − α/2 empirical quantiles of the residuals.
The variation of the mean and volatility is far less severe for ε t series than for π (S) t .Furthermore, the magnitude of the spikes in the residual series is severely reduced compared to the unfiltered prices.The reduction of these features makes ε t better suited for parametric modeling than π (S) t and will yield more stable parameters over time for any type of model.
Another appeal of modeling the density of ε t is that it does not exhibit nearly as much diurnal variation as that of π (S) t .Furthermore, the dependence of the mean on exogenous variables vanishes.This can be seen from the plots in Figures 3 and 4.
Figure 3 shows the mean (µ), standard deviation (σ), skewness (γ) and Pearson's kurtosis (κ) of ε t and π (S) t conditional to different explanatory variables.Three variables are considered here and a fourth one derived from the others.These are the price forecast π (S) t , the forecast system load L t , the forecast wind power production W t and the forecast wind power penetration Wt Lt .In order to ease the comparison between ε t and π (S) t , the empirical mean of π (S) t has been subtracted from the corresponding conditional means.The plots in Figure 3 reveal that the impact of the explanatory variables on the mean becomes negligible for ε t , while it is quite strong for π (S) t .Apart from the lower standard deviation and somewhat elevated kurtosis, the relationship patterns between ε t and π (S) t appear consistent for higher order moments.An explanation for the higher kurtosis of ε t is to be found in excessive price spikes, which, due to the lowered σ, yield an increase of κ.The strongest relation seems to be with the expected price, and since the other variables have already been taken into account in the derivation of π (S) t , it seems reasonable to start out with a density model depending on expected price (i.e., given by point forecasts) and subsequently add other variables and examine whether the predictive skill is improved or not.In addition to the analysis based on moments presented here, dependencies with other potential explanatory variables were also verified visually.The probabilistic calibration assessment presented later on will permit one to verify the suitability of distributional assumptions.
Figure 4 shows hourly values of µ, σ, γ and κ for the two series.A comparison of the two lines in each plot reveals that by subtracting π (S) t , much of the diurnal variation in residual densities is mitigated.The daily seasonality that remains is consistent with the relationship between π (S) t and the density of ε t .Hence, a model relying on π (S) t to explain conditional densities of π (S) t through ε t will not require an explicit daily seasonality term, while the direct modeling of densities of π (S) t would.Overall, in a parametric probabilistic forecasting framework, the observed behavior of γ and κ should favor distributions with higher flexibility than Gaussian ones, e.g., generalized Gaussian and Student's t distributions.Here, instead of dealing with more complex probability distributions, our proposal below is to concentrate on a semi-parametric approach that gives more flexibility in describing conditional distributions of forecast errors.For simplicity, our parametric benchmark is based on a Gaussian assumption.It should be upgraded if one was truly interested in benchmarking parametric vs. nonparametric approaches to probabilistic electricity price forecasting.

Benchmarks
Four relatively simple benchmark models are constructed, which the more elaborate models will be evaluated against.These models include: I The empirical unconditional distribution of electricity prices (climatology, defined by a set of quantiles); II A time-invariant Gaussian density, i.e., a density defined by the empirical mean and standard deviation of electricity prices; III A simple exponential smoothing for the mean and standard deviation of the prices, yielding time-varying Gaussian densities; IV A kernel density model estimated using all past data, the last week worth of data, the last 500 and the last 1000 observations available; For Model I, the predictions are obviously constant.The sample quantile Q π (S) (τ ) with nominal level τ is found as: where t .In other words, Q π (S) (τ ) is the value for which: 1 where N is the number of observations in the data set.
The prediction intervals obtained from Model II are also constant.They are taken from a Gaussian distribution with standard deviation: σ π (S) : and with mean: Model III relies on a two-stage exponential smoothing of the mean (µ π (S) ,t ) and the standard deviation (σ π (S) ,t ) of π (S) t , i.e.: In the above, the optimal value for α is found as: where µ π (S) ,(d−1),t and σ 2 π (S) ,(d−1),t are, for a given time t, the forecasts based on exponential smoothing issued the previous day at noon.These forecasts are readily given by the last available smoothing estimates µ π (S) ,t and σ 2 π (S) ,t at that time.They are therefore necessarily a function of the smoothing parameter α, as described in Equation (6).
Finally, Model IV is a kernel density estimation of the probability density function (pdf) of ε t : where K(•) is a Gaussian kernel: N est is the number of observations used for the estimation at each time and h is the bandwidth, chosen according to the method described in [34].Four lengths of estimation periods are tried, all previous data available, data for the last week (N est = 168) and the last 500 and 1000 available observations.Afterwards, the cdf is obtained by numerical integration of the estimated pdf.Using N est = 168 observations for estimation at each time yielded the best performance and was used as representative benchmark in the following.Just like for Model III, the density estimated just before noon on the day before realization is used to produce the forecast for time t.

Gaussian Models
The parametric assumption formulated here is that the residuals from the electricity price point forecasts can be seen as Gaussian.The relationship between spot prices and the point forecast from [28] can be written as: where z t is a sequence of independent and identically distributed (i.i.d.) standard Gaussian variables and σ ε,t is the standard deviation of ε t from Equation ( 1).An autoregressive model is used in the second step for the expected price.Thus, assuming a non-zero E[ε t ] and, consequently, adopting an AR-GARCH model for ε t is superfluous.This eventually yields: All of the models for Gaussian density forecasts derived in this section involve modeling of ε 2 t , which is plotted in the top panel of Figure 5. Relative to the scale of the plot, ε 2 t seems to have a level close to zero, though with frequent spikes.Autocorrelation and partial autocorrelation functions (ACF and PACF, respectively) for ε 2 t are shown in the bottom panel of Figure 5.The ACF and the PACF suggest that a low order ARCH or GARCH model is sufficient.A GARCH model of order (q, p) (see, e.g., [24]), describing σ ε,t , is formulated as: where: For p = 0, the model in Equation ( 12) becomes an ARCH(q) model, and M exogenous variables can be introduced in the model by adding an extra term, such that: where x t = [x 1,t , . . ., x M,t ] T is a vector of length M , including the regressor values at time t.
Based the preceding data analysis and the ACF and PACF for ε 2 t in Figure 5, 10 different model formulations are chosen for analysis.Experiments with the models revealed that only the predicted price and load contributed to improving forecasting skill.Thus, the formulations are given here explicitly in terms of these variables: The parameters, α, β, γ and θ, are estimated recursively (as described in [28] and [35]) with the forgetting factor λ chosen as: The regression parameters, α, β and γ, are estimated separately for each lead-time.This translates to estimating hourly-specific parameters.A prediction formula can be written generally for models (I)-(X) as: The alternative approach of estimating a single set of parameters for one-step ahead prediction and to generate prediction for further lead times iteratively was found to perform worse than this direct prediction approach.

Time Adaptive Quantile Regression
The QR model is based on the time adaptive QR-framework described in [27].The QR model is used to describe the conditional densities of ε t for nominal levels between 5% and the 95%.Afterwards, this model is supplemented with a description of the distributions' tails.The readers interested in topics ranging from fundamental properties to the latest developments in the field of quantile regression are referred to [25].
The most basic form of a QR model [26] for ε t , given a nominal level τ , is: where F −1 εt,t (τ ) is the inverse cdf of ε t , x t is a vector of explanatory variables and e t is a noise term, centered at zero and with finite variance.Finally, β(τ ) is a vector of model parameters, estimated by solving a linear program: The non-linear relationship between ε t and the explanatory variables is accounted for by representing x t by B-splines.Furthermore, the estimation dataset is updated as new observations become available, while the oldest observations are discarded.Then, a new estimate of the regression parameters is obtained by solving Equation (18).Estimating the parameters on relatively recent values only has the drawback that deviations from the average behavior are quickly discarded.In order to ensure a broader coverage of the explanatory variables' previous values in the estimation set, its updating is done on a number of partitions or bins.Each bin contains observations falling within a pre-specified range, and every time a new observation becomes available, it replaces the oldest observation in the corresponding bin.Meanwhile, other bins containing data for different values of the variable remain intact.Afterwards, the data bins are combined into a single estimation set, which then contains a broader range of observations than it otherwise would.The updating procedure is described more thoroughly in [27].
In light of the analysis presented earlier on, we start out with a QR-model of the form: where b i are natural cubic B-spline basis functions with no intercept term.Furthermore, K is the number of knots used for the spline fitting, and e t is an error term, centered and with finite variance.The remaining candidates for explanatory variables, L t , W t and Wt Lt , are subsequently added to the model one by one.Both a spline representation is considered, as well as their inclusion as a linear term.The model that led to the highest forecasting skill is of the form: That is, in addition to the spline representation of the predicted price, the predicted load is also included in the model as a linear term.The variables linked to wind power generation proved not to enhance the model's forecasting skill nor did representing L t by splines.Since the optimal memory of the model is likely to vary for different values of τ , a best model setup is found for each individual value of τ .This calls for a skill score that can rate a single quantile forecast and not the entire cdf.Following the suggestions of ( [36] and the references therein), such a score is defined from Equation (18) as: and is used to evaluate a model's forecasting skill for a given nominal quantile τ and, subsequently, to decide on the meta parameters of the model in Equation (20).
Pareto and Weibull, could also be considered if deemed more appropriate.The probabilistic calibration results shown in a further part of the paper will actually show that, for our test case and dataset, this exponential assumption is acceptable.For an exponentially distributed variable x, the pdf f (x; λ) and the cdf F (x; λ) are defined as: where λ is the rate-parameter defined as one over the mean of x.The estimate of λ at time t, λ t , is obtained by maximum likelihood estimation using all data available until time t.It is easily shown that such an estimate of the rate-parameter is found as: where N t is the number of observations available at time t.

Evaluation of Forecasting Skill
For a comparison of the forecast quality, the average continuous ranked probability score (CRPS) [29] and the related continuous ranked probability skill score (CRPSS) are used.For a single observation, x o , and a corresponding predicted cdf, F (x), the CRPS is defined as: where: is the well-known Heaviside function.Subsequently, CRPS is found as: The CRPSS for a certain model is defined as: where CRPS model is the average CRPS for the model and CRPS ref is that for a chosen reference model.The reference model is commonly chosen as the empirical quantiles, i.e., benchmark Model II, which is also the case here.In order to allow for a fair comparison of the performance of all of the derived models, two sets of CRPS values are reported.One is estimated using a set of quantiles, τ 1 , ranging from zero to one in steps of 0.05.For the other one, τ 2 , the set of quantiles used includes additional quantiles in the tails of the distribution.The sets τ 1 and τ 2 are defined as: τ 1 ∈ {0, 0.05, 0.10, . . ., 0.95, 1} τ 2 ∈ {τ 1 , 0.005, 0.01, 0.025, 0.975, 0.99, 0.995} In both cases, the empirical minimum and maximum for the whole data set are used for the zero and one quantiles, respectively.

Estimation Results
For all models, parameter estimation is done using data from 21 December 2008, until 31 December 2009.The remaining two years are used as an independent test set.These data sets will be referred to as the training set and the test set, respectively.Part of the training data (the last 20%) is used for cross-validation, i.e., to avoid overfitting of the models that would affect their generalization ability when being used to predict based on new data over the evaluation set.Scores are then reported in the following for the training and test data, separately.Implemented in R and run on a Lenovo X230 laptop with an Intel Core i5-3320M (2.60 GHz) processor, estimation of all quantiles models and tails based on this one year of data takes less than one hour.Then, issuing predictions is extremely cheap, since one is only feeding the models with new data and not having to perform any optimization.
Table 1 lists CRPS values for the best performing models along with the corresponding parameter estimates for the benchmark and the parametric models.The model parameters for the QR-E model are collated in Table 2.It is to be noted that, to be completely rigorous, one would associate statistical significance test results with the various verification metrics, in order to asses whether the differences in score values may be seen as significant or not.This was not done here, since, in our experience, when using such long evaluation sets (two years), such differences in CRPS values may be deemed of practical magnitude.Among the benchmarks, the kernel density estimation model outperforms the other ones.This was the case for all values of N est tried, although N est = 168 was the one for which the difference was largest.The exponential smoothing model was also found to outperform the time-invariant benchmarks, although it performs poorer than the kernel model.
The best performing parametric models are the GARCH-X model with π (S) t as the explanatory variable (III) and the GARCH-X and ARCH-X models with both π (S) t and L t as exogenous variables (V and VI, respectively).Although the ARCH-X model outperforms the other two, the difference in CRPS is small.Furthermore, the forgetting factor λ is suspiciously low for the ARCH-X model, as it can not rely on the moving average term to accommodate the short-term dynamics of σ εt .Thus, the forgetting factor of the two GARCH models is of a more desirable magnitude in terms of parameter stability.The τ -specific bin sizes, N B , for the QR-E model are listed in Table 2 along with the placement of the spline knots and the bin edges.According to expectations, the largest amount of data is used to estimate quantiles close to the tails.Then, on both sides of the median, there are quantiles for which the model has a relatively short memory, indicating more rapidly varying dynamics.Closer to the median, memory increases again, since point forecasts have already accommodated most dynamic changes there.All in all, the merits of the model's flexibility, compared to the parametric models, is illustrated by the longer memory of the QR-E model, thus allowing for more stable regression parameters.
The rate parameters for z t and z (u) t are, like for the other models, updated daily using all available data at the estimation time.The corresponding CRPS values are listed in Table 3, along with that of climatology forecasts for the tails.Table 3 shows that a slightly better description of the lower tail is obtained with the exponential model, if compared to climatology.Disappointingly, however, the opposite is the case for the upper tail.It should be remembered that this climatology benchmark does not yield genuine forecasts, since it is based on all available data on electricity prices.Indeed, if issuing a forecast at a given time t, it relies on information for future times t + k, k > 0. In contrast, the forecasts issued based on the exponential model are genuine and only rely on information up to time t.

Model Comparison
Bar-plots of the CRPSSs for the test period are presented in Figure 6, using τ ∈ τ 1 and τ ∈ τ 2 as earlier defined.Figure 6.Bar-plot of the continuous ranked probability skill score (CRPSS) for the models listed in Table 1 for the test period, for nominal levels τ ∈ τ 1 (a) and τ ∈ τ 2 (b).
Overall, the more elaborate models seem to outperform the benchmarks by a significant margin.The difference between the Gaussian and the QR-E models may appear negligible, in comparison.
For any type of risk management or scenario generation, the reliability of the density forecasts is essential.Most risk management models, such as value-at-risk (VaR) or conditional value-at-risk (CVaR), aim at minimizing losses at or beyond a certain quantile of the loss probability distribution.Thus, an unreliable density estimate will severely affect all risk assessment.Similarly for scenario generation that often is used for generating inputs to models, such as VaR and CVaR, an unreliable density forecasts will yield scenarios that do not cover the full spectrum of possible outcomes.Moreover, since uncertainty generally tends to be underestimated in density forecasts ( [37]), unreliable density forecast will generally prompt risk to be underestimated.For this reason, the reliability of the probabilistic forecasts generated by the various models is examined.
A thorough discussion of the concept of reliability is given in [38,39], so only a brief introduction will be given here.Ideally, a series of N forecasts (with N large) for the quantile with a nominal level τ of the density of ε t , Q εt,t (τ ), should satisfy: The reliability level is usually assessed in terms of a probabilistic bias, i.e., τ − τ .For a finite data set, a deviation of τ from the nominal level τ is natural, due to sampling effects.Both [38,39] present estimation procedures for the range of likely deviations.These intervals are commonly referred to as consistency intervals, and correspondingly, they are illustrated as consistency bars on reliability diagrams.
In [39], it was suggested that the autocorrelation of the probability integral transform (PIT) [40] should be accounted for in the estimation of consistency intervals for reliability diagrams.Using the predictive densities obtained by the models described here, the PIT of ε t is found as: and should be uniformly distributed between zero and one.The PITs for the four models are plotted in Figure 7.The plots reveal that even though the PITs seems to be close to uniform on a time-invariant plot, like a histogram, for the whole period, this is not the case in the short-term, as the PIT looks to have a drifting mean for all models.Figure 8 shows the rank ACFs for the QR-E model.The non-negligible autocorrelation in the PIT needs to be accounted for when the of the forecasts is evaluated.The ACFs for the other models look similar.Therefore, the methodology presented in [39] for constructing consistency intervals for the reliability diagrams is adopted.Reliability diagrams, plotted on probability paper (see [38]), are shown for the four models in Figure 9.
For the QR-E model, the reliability hypothesis can not be rejected at the 10% significance level for all quantiles.For the ARCH-X model, however, the hypothesis is rejected for nine out of the 25 quantiles at the 5% significance level.Furthermore, going to the 1% significance level only reduces this number to eight.For the two GARCH models, the results are even worse.Thus, one must conclude that even though the two models score similarly in terms of CRPSS, the Gaussian models fail to produce reliable probabilistic forecasts.The reliability diagrams are a clear testimony of the shortcomings of the Gaussian assumption.The assumed shape of the density causes center quantiles to be generally overestimated, while the tail quantiles are underestimated.This indicates that ε t has, in fact, a sharper density with higher kurtosis than a standard normal distribution, even conditionally.If aiming to produce parametric probabilistic forecasts of day-ahead prices that would be better calibrated, better assumptions on the shape of predictive densities should be formulated.This was not done here, since we are aiming to further analyze the quality and properties of the semi-parametric approach based on the QR-E model.

Properties of the QR-E Model
Based on the previous results, we aim here at going into a further level of detail in analyzing the characteristics of the forecasts issued based on the QR-E model, e.g., based on various price levels or on the time of the day.
Firstly, Table 4 lists the CRPSS for the QR-E forecasts and the CRPS for both that model and the climatology one within each estimation bin for π (S) t of the training and the test set.That is, the quantile forecasts have been segmented by the first and the third sample quartiles and the CRPSS estimated for each bin.As one would expect from any advanced forecasting approach, the QR-E model consistently outperforms the climatology benchmark regardless of the expected price (CRPSS > 0).Besides, the superiority of the QR-E model is greater in situations where the prices are expected to be high or low.The fact that the difference in CRPS between the intermediate bin and the bins outside the inter-quartile range is at least 70% larger for the benchmark model than for the QR-E model is a testimony of this.These results are not surprising, since a static reference model is generally expected to give the best result under ordinary circumstances at the cost of less frequent events.Thus, greater improvements should be expected for π (S) t above and below average by using a more elaborate model.The varying performance of the QR-E model in the outer bins is partly due to their varying update frequency between the two periods.During the training period, the average spot price is lower than during the test period.The segmentation, on the other hand, is done according to the empirical quantiles of the whole data set.Thus, the bin containing the high expected prices is updated less frequently than the low price bin during the training period.On the contrary, the low price bin represents expected prices further from the mean during the test period than during the training period.This prompts the question of whether time-varying bin edges would be appropriate.No effort for obtaining such a model was made, though.
Figure 10 depicts the CRPSS and the CRPS for each hour of the day, which in this case, is also for each lead-time.Again, the instability of the CRPSS seems to be mainly driven by the varying skill of the climatology quantiles.In the context of [28], it is interesting to see the peak (poorest performance) of the climatology model in the early evening hours.These were observed in the forecast quality of the model presented in [28], and since the CRPS is closely related to the mean absolute error (MAE) for point forecasts, the appearance of these is natural for the climatology model.Interestingly, however, this peak is by far less severe in the curves representing the QR-E model.
It was shown in [28] that this poor performance during these hours was due to more volatile prices.Thus, it is pleasant to see that the combination of the model from [28] and the QR-E model is able to capture this increased volatility to some extent.
Two examples of the density forecasts from the QR-E model are shown in Figure 11.They illustrate well how the density of π (S) t varies with the expected price.

Concluding Remarks and Discussion
A model able to generate reliable density forecasts for the day-ahead electricity prices at Nord Pool's Elspot has been presented.Time-adaptive quantile regression is used to describe the density between the 5% and the 95% quantiles.The distribution tails are then approximated with an exponential function.The proposed methodology is shown to consistently outperform four different benchmark approaches for every hour of the day and regardless of the level of the prices.Compared to the parametric Gaussian models also presented in this paper, the QR-E model has the advantage that it produces reliable quantile forecasts.
The density model is built as an independent extension of the point forecasts presented in [28], which simplifies the modeling process considerably.The expected price could, however, be obtained by any other well-tuned model serving the same purpose.Unlike the majority of procedures in the existing literature, the modeling is done on untransformed data, allowing for directly deriving density forecasts on the actual scale of the prices.Furthermore, as the results presented in this paper are derived by mimicking practical data availability, the model can be applied as presented for online density forecasting of the prices.
The type of forecasts here presented are valuable for various applications.For one, the model's predictions can be used directly for determining the optimal structure of block bids ( [31], see for the definition) in markets where such bids are accepted.Secondly, the model can be used for generating the scenarios necessary to solve the decision problems described in [1][2][3][4][5].In the scenario generation process, the auto-correlation of the PIT needs to be accounted for, e.g., by applying the scenario generation framework from [6].The sequential clearing of many liberalized electricity markets, e.g., the Scandinavian and the Spanish ones, makes density forecasts of the day-ahead prices vital for any generation of market scenarios, even though the day-ahead prices themselves are not explicitly required in the scenarios.This is since other market variables are either strongly influenced or even directly dependent upon the day-ahead prices.Accounting for the whole range of potential outcomes on the day-ahead market is vital for obtaining a set of scenarios spanning the whole density of possible realizations on other markets.
Our probabilistic forecasting methodology could be improved in many different ways.First of all, from Figure 7, a relatively slow drift in the realized PIT can be detected.For a time-varying process like electricity prices, such effects will never be fully prevented; although, it would be interesting to see whether an auto-regressive or a moving average term could mitigate this behavior of the PIT.This could either be done in the quantile autoregression framework discussed in Koenker [25] (Section 8.3, pp.260-265) or as an exponential smoothing model like the one described in Taylor [41] (and the references therein).Secondly, an alternative definition of the bin edges, e.g., in terms of sample quantiles over a given period, could be interesting to pursue.Thirdly, even though the parametric models yielded unreliable forecasts, there are several extensions to the models that might successfully mitigate that problem.For instance, some of a mixture model might help in this regard.The mixture model could be either a discrete one, e.g., a Hidden Markov model, or a continuous latent model could be applied.Another interesting possibility would be a double stochastic (G)ARCH-X model [35].This would serve the purpose of accommodating the shifts between periods of excessive volatility and periods of relative tranquility better than the exponential smoothing does.Furthermore, other parametric distributions could be considered (e.g., generalized Gaussian and Student's t).For such models, one should bear in mind that some form of adaptivity in the parameter estimation would be of core importance.
Finally, given a larger data set, revisiting the exponential tail models would be worthwhile.For one, even though the center part of the density of ε t does not exhibit any apparent diurnal variation, it seems plausible that the tail quantiles would do so due to the higher demand during the day.For the data set at hand, however, hourly estimation dilutes the data set so severely that the estimation of a seasonal model for the tail would be meaningless.A larger data set would also allow for investigation of whether a limited memory is appropriate for estimation of the tail.

Figure 7 .
Figure 7.The probability integral transform (PIT) of ε t for the ARCH-X model (a) and the QR-E model (b).

Figure 8 .
Figure 8.The rank ACF (a) and PACF (b) for the PIT of the forecasts from the QR-E model.

Figure 9 .
Figure 9. Reliability diagrams on probability paper for the GARCH-X models (a,b), the ARCH-X model (c) and the QR-E model (d).

Figure 10 .
Figure 10.Hourly CRPSS for the QR-E model (a) and the hourly CRPS values for the QR-E model and the climatology one (b).

Table 2 .
The best bin sizes (N B ) found for each value of τ .

Table 3 .
The CRPS for the tail models, estimated for τ ∈ τ 1 .

Table 4 .
The CRPS for the predictions from the QR-E model and the climatology quantiles, segmented according to π