Forecasting Volatility and Tail Risk in Electricity Markets

This paper investigates the benefits of jointly using several realized measures to predict daily price volatility, Value-at-Risk, and Expected Shortfall in the Australian electricity markets of New South Wales, Queensland, and Victoria. We propose Realized GARCH-type models with multiple measurement equations based on robust estimators to account for market microstructure noise and jumps in electricity price series. Model specifications that combine information from multiple realized measures improve the in-sample fit. The out-of-sample analysis shows that using the jump-robust medRV estimator significantly increases the accuracy of volatility forecasts, while in forecasting Value-at-Risk and Expected Shortfall at different risk levels, the standard GARCH(1,1) also performs remarkably well.


Introduction
The analysis of the volatility of electricity spot prices is crucial for traders, portfolio managers, policy makers, and other market participants. Research on electricity price dynamics has revealed several distinctive features, typically not observed in financial assets, that stem from the nonstorability of electricity. Because electricity cannot be stored, and because supply and demand are inelastic, electricity prices are much more volatile than other commodity prices. In particular, the main stylized facts show that daily and intradaily electricity spot prices are usually characterized by seasonality, mean reversion, high volatility persistence, frequent price jumps and short-lived spikes, an inverse leverage effect (volatility reacts more to positive shocks than to negative ones), stationarity in both the price level and squared prices, and negative prices, which mainly arise because electricity cannot be freely disposed of and generators face nontrivial start-up costs (Bierbrauer et al. 2007; Byström 2005; Chan et al. 2008; Escribano et al. 2011; Fanone et al. 2013; Frömmel et al. 2014; Higgs and Worthington 2008; Knittel and Roberts 2005).
The GARCH framework is widely used to model and forecast the volatility of electricity prices (e.g., Bowden and Payne 2008; Escribano et al. 2011; Garcia et al. 2005; Hickey et al. 2012; Knittel and Roberts 2005; Liu and Shi 2013, among others). Standard GARCH models rely on daily data to estimate the latent conditional variance, using current and past daily squared returns to form expectations of future volatility. However, a single daily return offers only a "weak" signal about the current level of volatility, so this class of models adapts slowly to rapid changes in the volatility level.
Since electricity cannot be stored directly, production and consumption must be continuously balanced to absorb supply and demand shocks (Bierbrauer et al. 2007). Meanwhile, the liquidity of electricity markets has grown rapidly. The increased availability of high-frequency information has led to new econometric methods for modeling and forecasting volatility, which show that high-frequency data are far more informative about the price process not only at the intraday level but also at the daily level. The daily Realized Volatility (RV), defined as the sum of squared intradaily log-price changes, provides an unbiased and highly efficient estimator of return volatility in the absence of market microstructure noise (Andersen et al. 2001, 2003; Barndorff-Nielsen and Shephard 2002).
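As a minimal illustration of the RV definition, the Python sketch below sums the squared intraday log-returns for one day of half-hourly prices. The function name `realized_variance` and the toy prices are hypothetical. Whether the first return of the day is computed from the previous day's last price is a design choice the text does not specify, so the sketch simply uses the prices it is given.

```python
import math

def realized_variance(prices):
    """Daily realized variance: sum of squared intraday log-returns.

    `prices` holds one day of strictly positive half-hourly spot prices
    (log-returns are undefined for zero or negative prices).
    """
    if any(p <= 0 for p in prices):
        raise ValueError("log-returns require strictly positive prices")
    log_p = [math.log(p) for p in prices]
    # consecutive log-price differences are the intraday returns
    returns = [b - a for a, b in zip(log_p, log_p[1:])]
    return sum(r * r for r in returns)

# hypothetical prices in AUD/MWh, for illustration only
print(realized_variance([50.0, 55.0, 52.0, 60.0]))
```

With 48 half-hourly prices per day, the sum runs over 47 within-day returns, or 48 if the previous day's last price is prepended.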
Volatility modeling using intradaily prices has received considerable attention in financial markets and, increasingly, in electricity markets. In this setting, most of the existing literature relies on HAR-type models fitted directly to time series of realized measures. Chan et al. (2008) used the HAR model of Corsi (2009) and the HAR-CJ model of Andersen et al. (2007) to estimate volatility and identify jumps in the electricity price process in five regions of the Australian market. Their results show only a modest improvement in volatility forecasts when total variation is separated into continuous and jump components, with no strong evidence that the HAR-type specifications outperform the EGARCH model. Haugom et al. (2011) added exogenous variables to the HAR and HAR-CJ models to assess day-ahead predictions in the Nord Pool forward market and found that these variables improve forecasts. Haugom and Ullrich (2012) extended the HAR approach by including forward realized volatility as a predictor to improve spot price volatility forecasts for several US electricity markets. Focusing instead on the conditional variance of returns, Frömmel et al. (2014) applied Realized GARCH models (Hansen et al. 2012; Hansen and Huang 2016) with a single measurement equation to forecast volatility on the Electricity Power Exchange market, using both the RV and the intraday range as realized measures. Their empirical results suggest that the RGARCH specifications outperform the EGARCH model in forecasting accuracy, especially when the intraday range is used, which they attribute (among other potential determinants) to the greater robustness of range-based measures to microstructure noise.
The aim of this paper is to assess the benefits of jointly using different realized measures in fitting and forecasting electricity price volatility. Unlike Frömmel et al. (2014), we use Realized GARCH models that combine information through multiple measurement equations and multiple realized estimators. In particular, to deal with market microstructure frictions and extreme jumps in electricity price series, in addition to the RV we use the robust estimators Realized Kernel (RK) (Barndorff-Nielsen et al. 2008) and medRV (MRV) (Andersen et al. 2012). Moreover, since accurate volatility modeling is crucial for risk management (Byström 2005; Chan and Gray 2006), we also assess the ability of the models to predict Value-at-Risk (VaR) and Expected Shortfall (ES) at different risk levels.
Our empirical analysis of spot prices sampled at 30-min intervals in the regional Australian power markets of New South Wales, Queensland, and Victoria yields three main findings. First, the Realized Exponential GARCH (REGARCH) specifications that combine multiple realized measures improve the in-sample fit over the standard GARCH(1,1) and the simple Realized GARCH (RGARCH) models. Second, using the jump-robust MRV as the realized measure in the RGARCH model significantly improves volatility forecasts, minimizing the QLIKE (Patton 2011), Mean Squared Error, and Mean Absolute Error loss functions. Finally, when VaR and ES forecasts at the 1%, 2.5%, and 5% risk levels are evaluated via the Quantile Loss function of González-Rivera et al. (2004) and the class of strictly consistent joint loss functions for VaR and ES, respectively, the GARCH(1,1) and some REGARCH specifications clearly outperform the simple RGARCH based on a single measurement equation. The Model Confidence Set (MCS) of Hansen et al. (2011) is used to assess the significance of differences in predictive performance across models.
The remainder of the paper is structured as follows. Section 2 reviews the Realized GARCH (Hansen et al. 2012) and the Realized Exponential GARCH (Hansen and Huang 2016). Section 3 presents the estimation procedure. Section 4 describes the data. Sections 5 and 6 present the in-sample and out-of-sample results, respectively. Finally, Section 7 concludes with a summary of the results and some directions for future research.

Model Specifications
The GARCH model is by far the most widely used specification for fitting and forecasting financial volatility. Let $r_t$ be the daily log-return at time $t$ and $I_{t-1}$ the information available up to time $t-1$. The GARCH(1,1) model then takes the form

$$r_t = \sqrt{h_t}\, z_t,$$
$$h_t = \omega + \gamma r_{t-1}^2 + \beta h_{t-1},$$

where $h_t = \operatorname{var}(r_t \mid I_{t-1})$ and $z_t \overset{iid}{\sim} (0, 1)$. Positivity of the conditional variance requires $\gamma$ and $\beta$ to be non-negative constants and $\omega$ to be a strictly positive constant, while a necessary condition for the weak stationarity of the GARCH(1,1) is $\gamma + \beta < 1$.
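The GARCH(1,1) variance recursion can be sketched in a few lines of Python. This is an illustrative filter, not the estimation code used in the paper. The name `garch11_variance` is hypothetical, and the start-up value `h0` must be supplied by the user (a common choice is the sample variance of returns).

```python
def garch11_variance(returns, omega, gamma, beta, h0):
    """Run h_t = omega + gamma * r_{t-1}**2 + beta * h_{t-1} through a return series.

    h0 is the start-up variance for the first day.
    Returns [h_1, ..., h_{T+1}]; the last entry is the one-step-ahead forecast.
    """
    # positivity constraints from the text; gamma + beta < 1 is not enforced,
    # because it is needed for weak stationarity, not for h_t > 0
    if omega <= 0 or gamma < 0 or beta < 0:
        raise ValueError("require omega > 0, gamma >= 0, beta >= 0")
    h = [h0]
    for r in returns:
        h.append(omega + gamma * r * r + beta * h[-1])
    return h

# toy parameters and returns, for illustration only
print(garch11_variance([0.0, 0.0], omega=0.1, gamma=0.1, beta=0.8, h0=1.0))
```

Only the last squared return enters each update. This is why a single large daily move, a "weak" and noisy signal, drives the whole revision of $h_t$.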
Many studies have documented that realized volatility measures based on intradaily returns can greatly improve the accuracy of volatility forecasts. Unlike the standard GARCH approach, the Realized GARCH (RGARCH) of Hansen et al. (2012) uses both low-frequency (daily returns) and high-frequency (realized volatility measures) information to model the dynamics of daily volatility. In its log-linear specification, the RGARCH is given by

$$r_t = \sqrt{h_t}\, z_t,$$
$$\log h_t = \omega + \beta \log h_{t-1} + \gamma \log x_{t-1},$$
$$\log x_t = \xi + \varphi \log h_t + \tau(z_t) + u_t, \tag{5}$$

where $x_t$ is a realized volatility measure and $\tau(z_t) = \tau_1 z_t + \tau_2 (z_t^2 - 1)$ is the leverage function, with $z_t \overset{iid}{\sim} (0, 1)$ and $u_t \overset{iid}{\sim} (0, \sigma_u^2)$ mutually independent. The RGARCH therefore jointly models returns and realized volatility, replacing squared returns with a more informative volatility estimator to capture the conditional variance dynamics. The model is completed by the measurement equation (5), which links the (ex-post) realized measure to the (ex-ante) latent conditional variance. Substituting the measurement equation into the GARCH equation yields an AR(1) representation for the log-conditional variance,

$$\log h_t = (\omega + \gamma \xi) + (\beta + \varphi \gamma) \log h_{t-1} + \gamma w_{t-1},$$

where $w_t = \tau(z_t) + u_t$ and $E(w_t) = 0$, with the restriction $|\beta + \varphi\gamma| < 1$ ensuring stationarity of the process (Hansen et al. 2012; Li et al. 2019).

Hansen and Huang (2016) proposed the Realized Exponential GARCH (REGARCH) model, which allows for multiple realized measures of volatility and includes an explicit leverage term in the GARCH equation, providing further flexibility in modeling the dependence between returns and volatility.
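The REGARCH equations are not written out above. The sketch below assumes the standard Hansen and Huang (2016) form with a zero conditional mean and $K$ realized measures:

$$\log h_t = \omega + \beta\log h_{t-1} + \tau(z_{t-1}) + \gamma^\top u_{t-1}, \qquad \log x_{k,t} = \xi_k + \varphi_k\log h_t + \delta_k(z_t) + u_{k,t},$$

with $\delta_k(z) = \delta_{k,1} z + \delta_{k,2}(z^2 - 1)$. Under that assumption, the Python function below (the name `regarch_filter` is hypothetical) runs the recursion that underlies ML estimation. Given the parameters, it recovers $z_t$, the measurement errors $u_t$, and $\log h_t$, including the one-step-ahead forecast. With $K = 1$, the RGARCH in AR(1) form is the special case in which $\delta(z) = \tau(z)$, the GARCH equation carries $\gamma\tau(z_{t-1})$, and the intercept and persistence are $\omega + \gamma\xi$ and $\beta + \varphi\gamma$.

```python
import numpy as np

def regarch_filter(r, log_x, omega, beta, tau, gamma, xi, phi, delta, log_h0):
    """Filter a Realized EGARCH with K measurement equations (sketch).

    r       : (T,) daily returns, assumed zero-mean: r_t = sqrt(h_t) * z_t
    log_x   : (T, K) or (T,) logs of the realized measures
    tau     : (tau1, tau2), leverage coefficients in the GARCH equation
    gamma   : (K,) loadings on the lagged measurement errors u_{t-1}
    xi, phi : (K,) intercepts and slopes of the measurement equations
    delta   : (K, 2) measurement-equation leverage (delta_k1, delta_k2)
    Returns log_h (T+1,), z (T,), u (T, K); log_h[T] is the one-step forecast.
    """
    r = np.asarray(r, dtype=float)
    log_x = np.asarray(log_x, dtype=float).reshape(len(r), -1)
    gamma, xi, phi = (np.atleast_1d(np.asarray(a, dtype=float)) for a in (gamma, xi, phi))
    delta = np.asarray(delta, dtype=float).reshape(-1, 2)
    T, K = log_x.shape
    log_h = np.empty(T + 1)
    z = np.empty(T)
    u = np.empty((T, K))
    log_h[0] = log_h0
    for t in range(T):
        # standardized return implied by the current variance
        z[t] = r[t] / np.exp(0.5 * log_h[t])
        # measurement equations: u_t = log x_t - xi - phi log h_t - delta(z_t)
        lev_x = delta[:, 0] * z[t] + delta[:, 1] * (z[t] ** 2 - 1.0)
        u[t] = log_x[t] - xi - phi * log_h[t] - lev_x
        # GARCH equation: next day's log variance
        lev_h = tau[0] * z[t] + tau[1] * (z[t] ** 2 - 1.0)
        log_h[t + 1] = omega + beta * log_h[t] + lev_h + gamma @ u[t]
    return log_h, z, u
```

In an ML routine, the outputs of this filter feed the return and measurement-error log-likelihoods, and the parameters are updated by a numerical optimizer.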

Estimation and Inference
The GARCH and R(E)GARCH models are estimated by maximum likelihood (ML). In particular, we assume standardized Student-t innovations, $z_t \overset{iid}{\sim} t(0, 1, \nu)$, to capture the fat tails (excess kurtosis) observed in return series.
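For the REGARCH specifications, which have $K$ measurement equations, we follow Hansen and Huang (2016) and assume $u_t \overset{iid}{\sim} N_K(0, \Sigma)$, where $N_K(0, \Sigma)$ denotes a $K$-variate Normal distribution with mean 0 and variance-covariance matrix $\Sigma$. The log-likelihood function for this class of models is then $\ell(r, x; \theta) = \ell(r; \theta) + \ell(x \mid r; \theta)$, with

$$\ell(r; \theta) = \sum_{t=1}^{T} \left[ \log\Gamma\!\left(\tfrac{\nu+1}{2}\right) - \log\Gamma\!\left(\tfrac{\nu}{2}\right) - \tfrac{1}{2}\log\big(\pi(\nu-2)\big) - \tfrac{1}{2}\log h_t - \tfrac{\nu+1}{2}\log\!\left(1 + \frac{z_t^2}{\nu-2}\right) \right], \tag{10}$$

$$\ell(x \mid r; \theta) = -\frac{1}{2}\sum_{t=1}^{T} \left[ K\log(2\pi) + \log|\Sigma| + u_t^\top \Sigma^{-1} u_t \right]. \tag{11}$$

The log-likelihood captures the contribution of the realized measures through (11) and the contribution of returns through (10).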
For the RGARCH, which has a single realized measure, Equation (11) reduces to the univariate case $K = 1$ with $u_t \overset{iid}{\sim} N(0, \sigma_u^2)$, that is, $-\frac{1}{2}\sum_{t=1}^{T}\left[\log(2\pi) + \log\sigma_u^2 + u_t^2/\sigma_u^2\right]$. Finally, because standard GARCH models have no measurement equation, their parameters are estimated from the partial log-likelihood of returns in Equation (10) alone.
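The two partial log-likelihoods can be sketched in Python as follows, assuming the standardized Student-t density for $z_t$ (unit variance, $\nu > 2$) and Gaussian measurement errors. The function names are hypothetical. `log_h` should hold $\log h_1, \dots, \log h_T$ aligned with the returns, and the numerical optimizer that maximizes the sum over the parameters is not shown.

```python
import math
import numpy as np

def loglik_returns(r, log_h, nu):
    """Partial log-likelihood l(r) with standardized Student-t innovations (nu > 2)."""
    r = np.asarray(r, dtype=float)
    log_h = np.asarray(log_h, dtype=float)
    z2 = r ** 2 / np.exp(log_h)  # squared standardized returns z_t^2
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    # the -0.5 * log h_t term is the Jacobian of r_t = sqrt(h_t) z_t
    terms = const - 0.5 * log_h - 0.5 * (nu + 1) * np.log1p(z2 / (nu - 2))
    return float(terms.sum())

def loglik_measures(u, sigma):
    """Partial log-likelihood l(x | r) for u_t ~ iid N_K(0, Sigma); K = 1 is the RGARCH case."""
    u = np.asarray(u, dtype=float)
    u = u.reshape(len(u), -1)
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    T, K = u.shape
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # quadratic forms u_t' Sigma^{-1} u_t, one per day
    quad = np.einsum("ti,ti->t", u, np.linalg.solve(sigma, u.T).T)
    return float(-0.5 * (T * K * math.log(2 * math.pi) + T * logdet + quad.sum()))
```

For a plain GARCH, only `loglik_returns` is maximized. The same function also gives the partial log-likelihood $\ell(r)$ used to compare fits across model classes.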
It is worth noting that the GARCH and R(E)GARCH specifications are not directly comparable in terms of the maximized global log-likelihood. However, since the contribution to the log-likelihood value of returns is the same for both classes of models, the partial log-likelihood of the returns in (10) enables us to compare the empirical fit of the conventional GARCH with that of R(E)GARCH-type models.

The Data
The dataset consists of half-hourly spot prices from the Australian electricity market. The Australian Energy Market Operator (AEMO) manages the National Electricity Market (NEM), which interconnects Queensland (QLD), New South Wales (NSW), the Australian Capital Territory (ACT), South Australia (SA), Victoria (VIC), and Tasmania (TAS), as well as the separate Wholesale Electricity Market (WEM) in Western Australia. The empirical analysis focuses on 30-min spot prices for the NSW, QLD, and VIC regions between 1 January 2012 and 31 December 2019, with prices in Australian dollars per megawatt hour (AUD/MWh). Continuously recorded half-hourly spot prices are publicly available at https://www.aemo.com.au (accessed on 12 May 2021).
The summary statistics in Table 1 highlight some stylized facts of price and return dynamics. Unlike financial asset prices, electricity prices can be negative when supply temporarily exceeds demand, so the minimum half-hourly price is below zero in all three markets. Moreover, the maximum price is more than 200 times the average price, implying high variability and a high degree of skewness and kurtosis. These features are evident in Figure 1, which shows the time series of 30-min spot prices in the three markets. Each market exhibits negative prices and several spikes far larger than its average price, underlining a further characteristic of electricity prices: the presence of extreme jumps.
As negative prices are rare, we excluded days with nonpositive prices when computing log-returns (Haugom and Ullrich 2012; Qu et al. 2018). To account for intraday seasonal patterns in the raw data, following Haugom and Ullrich (2012), the half-hourly returns are demeaned using median returns, $\mu^*_{t,j} = \tilde{r}_{mn,dy,hr}$, where $\tilde{r}_{mn,dy,hr}$ denotes the median return for the month $mn$, the day of the week $dy$, and the half-hour period $hr$ to which day $t$ and period $j$ belong. The demeaned intraday returns $r^*_{t,j} = r_{t,j} - \mu^*_{t,j}$ are then used to compute the realized volatility measures, while the daily close-to-close log-returns $r_t$ serve as daily price changes. Although the mean and median of $r_t$ are close to zero, electricity returns are highly volatile, with standard deviations between 0.216 and 0.314 across markets. In addition, the negative skewness and excess kurtosis clearly reveal the non-Gaussian nature of the distribution of $r_t$.
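The median-based seasonal adjustment can be sketched with NumPy as below. The helper name `median_demean` is hypothetical, and the sketch assumes that $mn$ is the calendar month of the year (1 to 12), $dy$ the day of the week, and $hr$ the half-hour slot (0 to 47). The text does not spell out how the calendar cells are defined, so other groupings are possible.

```python
import numpy as np

def median_demean(returns, month, weekday, slot):
    """Subtract the median return of each (month, weekday, half-hour slot) cell.

    All inputs are 1-D arrays of equal length, one entry per half-hourly return.
    """
    returns = np.asarray(returns, dtype=float)
    keys = np.stack([np.asarray(month), np.asarray(weekday), np.asarray(slot)], axis=1)
    # cell_id[i] labels the calendar cell of observation i
    _, cell_id = np.unique(keys, axis=0, return_inverse=True)
    cell_id = cell_id.ravel()
    medians = np.array([np.median(returns[cell_id == c])
                        for c in range(cell_id.max() + 1)])
    # r*_{t,j} = r_{t,j} - median of its cell
    return returns - medians[cell_id]
```

The median is used instead of the mean so that the frequent price spikes do not distort the seasonal profile that is removed.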
Finally, the descriptive statistics show that the 30-min RV is, as expected, positively skewed with strong excess kurtosis, owing to the many peaks and troughs in the series. This is evident in Figure 2, which plots the RV for the three electricity markets on a common y-axis scale to facilitate comparison. The QLD market exhibits several periods of high volatility and the most extreme RV levels, whereas NSW and VIC show more moderate and less frequent spikes.
Because price volatility follows different dynamics under different market conditions, in addition to the Realized Volatility (RV) (Andersen et al. 2003) we use two robust estimators to account for market microstructure noise and jumps: the noise-robust Realized Kernel (RK) (Barndorff-Nielsen et al. 2008) and the jump-robust medRV (MRV) (Andersen et al. 2012), all computed at a 30-min frequency.
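The three estimators can be sketched as follows for one day of demeaned intraday returns. The medRV uses the scaling constant of Andersen et al. (2012), which makes it consistent for integrated variance in the absence of jumps and noise. The realized kernel shown is the Parzen-weighted version with a user-chosen bandwidth `H`. The paper does not report its kernel or bandwidth rule, so both are assumptions, as are all function names.

```python
import math
import numpy as np

def rv(r):
    """Realized variance: sum of squared intraday returns."""
    r = np.asarray(r, dtype=float)
    return float(np.sum(r ** 2))

def medrv(r):
    """medRV: squared median of three adjacent absolute returns, rescaled.
    An isolated jump is never the median of its window, so it is discarded."""
    a = np.abs(np.asarray(r, dtype=float))
    m = len(a)
    if m < 3:
        raise ValueError("medRV needs at least three returns")
    med = np.median(np.stack([a[:-2], a[1:-1], a[2:]]), axis=0)
    scale = math.pi / (6 - 4 * math.sqrt(3) + math.pi)
    return float(scale * m / (m - 2) * np.sum(med ** 2))

def parzen(x):
    """Parzen kernel weight."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x ** 2 + 6 * x ** 3
    if x <= 1:
        return 2 * (1 - x) ** 3
    return 0.0

def realized_kernel(r, H):
    """Parzen realized kernel with bandwidth H (H = 0 reduces to RV).
    Weighted autocovariances offset the bias from microstructure noise."""
    r = np.asarray(r, dtype=float)
    total = float(np.dot(r, r))
    for h in range(1, H + 1):
        gamma_h = float(np.dot(r[h:], r[:-h]))  # h-th realized autocovariance
        total += 2 * parzen(h / (H + 1)) * gamma_h
    return total
```

For a day with one isolated spike, the RV is dominated by the spike while the medRV is almost unaffected. This contrast is the behavior that motivates using the MRV for electricity prices.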

In-Sample Analysis
In this section, we present the in-sample results for NSW in Table 2, QLD in Table 3, and VIC in Table 4. For each electricity market, we report the in-sample fit of the GARCH(1,1); three RGARCH and three REGARCH models, each based on a single realized estimator (RV, RK, or MRV); three REGARCH models with two measurement equations for the pairs (RV, RK), (RV, MRV), and (RK, MRV); and the REGARCH(RV, RK, MRV) combining all three realized estimators. To simplify the presentation, the RGARCH models are estimated using the autoregressive representation, so that their estimated coefficients are comparable with those of the REGARCH models.

Table 3. QLD in-sample parameter estimates.

The empirical results show that the $\beta$ estimates of the GARCH(1,1) for NSW, QLD, and VIC are 0.863, 0.669, and 0.874, respectively, confirming that QLD is the most nervous of the three electricity markets. This is also reflected in the estimate of $\gamma_1$, which takes its highest value for QLD. Regarding volatility persistence, for the GARCH(1,1) we observe $\beta + \gamma_1 \approx 1$, whereas for the R(E)GARCH-type models $\beta$ (the parameter that summarizes volatility persistence in this class of models) is about 0.96 on average. Volatility shocks therefore decay faster in the models that include high-frequency information. Moreover, the larger $\gamma_1$ for individual realized measures in the R(E)GARCHs than in the standard GARCH(1,1) indicates that intraday information provides a stronger signal of future volatility than squared returns.
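To put these persistence figures in perspective, a shock to an AR(1)-type recursion with persistence $p$ decays by half after $\log 0.5 / \log p$ days. The sketch below (hypothetical helper `half_life`) applies this to the average R(E)GARCH value of 0.96 reported above and to a hypothetical near-unit GARCH persistence of 0.99. The R(E)GARCH figure refers to $\log h_t$ while the GARCH one refers to $h_t$, so the comparison is only indicative.

```python
import math

def half_life(persistence):
    """Days until a shock's effect halves in an AR(1)-type recursion."""
    if not 0 < persistence < 1:
        raise ValueError("half-life is defined for 0 < persistence < 1")
    return math.log(0.5) / math.log(persistence)

print(round(half_life(0.96), 1))  # 17.0: average R(E)GARCH beta reported above
print(round(half_life(0.99), 1))  # 69.0: hypothetical near-unit GARCH persistence
```

As persistence approaches 1, the half-life grows without bound, which is why a GARCH persistence close to one implies much slower volatility decay.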

Turning to the estimated parameters of the measurement equations, the realized measures $x_t$ are upward-biased measures of the conditional variance ($\xi > 0$), but they are approximately proportional to $h_t$, as suggested by estimates of $\varphi$ close to 1: with $\varphi \approx 1$, $\log x_t \approx \xi + \log h_t$, so that $x_t \approx e^{\xi} h_t$.
Furthermore, in contrast to what is usually observed for stock returns ($\tau_1 < 0$ and $\tau_2 > 0$, as well as $\delta_1 < 0$ and $\delta_2 > 0$), here all parameters of the leverage functions are positive, with $\tau_1 > \tau_2$ and $\delta_1 > \delta_2$. This implies an inverse leverage effect: positive shocks to electricity prices lead to a larger increase in volatility than negative shocks, confirming the finding of Knittel and Roberts (2005).

Overall, the estimated measurement error variance confirms that the QLD electricity market is the most volatile, showing the highest values of $\sigma_u^2$. Additionally, the estimates of $\sigma_u^2$ are approximately the same for RV and RK but larger for MRV, suggesting that a 30-min frequency is not sufficient to cancel out the effects of price jumps. Focusing on the individual realized measures, an inverse relationship also emerges between the coefficient $\gamma$ and the variance $\sigma_u^2$ of the measurement error. This is consistent with $\gamma$ reflecting the "amount" of information about volatility variation carried by a measure: the larger the coefficient, the more accurate the realized measure.
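The inverse leverage effect can be read directly from the leverage function. Since $\tau(z) - \tau(-z) = 2\tau_1 z$, the asymmetry between positive and negative shocks of equal size is governed by $\tau_1$ alone, while $\tau_2$ adds a symmetric effect of shock magnitude. The sketch below evaluates the function with hypothetical coefficients chosen only to match the reported sign pattern ($\tau_1 > \tau_2 > 0$). They are not the estimates in Tables 2 to 4.

```python
def leverage(z, c1, c2):
    """Leverage function c1 * z + c2 * (z**2 - 1), the common form of tau(z) and delta(z)."""
    return c1 * z + c2 * (z * z - 1.0)

# hypothetical coefficients with c1 > c2 > 0
c1, c2 = 0.10, 0.05
for shock in (2.0, -2.0):
    print(f"z = {shock:+.0f}: leverage = {leverage(shock, c1, c2):+.2f}")
# z = +2: leverage = +0.35
# z = -2: leverage = -0.05
```

A positive two-standard-deviation price shock raises next-day log-variance, whereas a negative one of the same size slightly lowers it. With the stock-market pattern ($\tau_1 < 0$), the ordering would be reversed.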
As expected, the estimates of the ρ parameters are very close to 1, indicating that RV, RK, and MRV are highly positively correlated. This near-collinearity explains why some γ estimates are negative or not significant in the REGARCH models with two or three realized measures.
The estimated ν ranges between 2.85 and 5.37, with the GARCH(1,1) always yielding the largest value, confirming leptokurtosis in the conditional distribution of returns.
Finally, since full log-likelihoods are not comparable across models with different realized measures, we report only the partial log-likelihood of the return component, $\ell(r)$. Not surprisingly, there is a clear improvement from the RGARCH to the REGARCH specifications, which attain the highest maximized values. The lowest values of $\ell(r)$ always occur for the RGARCH(MRV), while the standard GARCH(1,1) performs quite well overall.

Out-of-Sample Analysis

The out-of-sample forecasting performance of the models is evaluated using several loss functions. First, the ability to forecast volatility accurately is assessed by the QLIKE (Patton 2011), Mean Squared Error (MSE), and Mean Absolute Error (MAE) losses,

$$\mathrm{QLIKE}_t = \frac{x_t}{\hat{h}_t} - \log\frac{x_t}{\hat{h}_t} - 1, \qquad \mathrm{MSE}_t = (x_t - \hat{h}_t)^2, \qquad \mathrm{MAE}_t = |x_t - \hat{h}_t|,$$

where $\hat{h}_t$ is the one-step-ahead conditional variance forecast and $x_t$ is the volatility proxy at time $t$. Since the bias due to microstructure noise and jumps tends to vanish at low frequencies, we consider different market scenarios by using as volatility proxies the RV computed at frequencies of 30 min, 2 h, and 6 h. Next, we evaluate one-step-ahead VaR and ES forecasts at three risk levels: 1%, 2.5%, and 5%. The adequacy of VaR forecasts is assessed through the Quantile Loss (QL) function (González-Rivera et al. 2004; Koenker 2005),

$$\mathrm{QL}_t = \big(\alpha - l_t\big)\big(r_t - \mathrm{VaR}_t(\alpha)\big),$$

where $l_t = \mathbb{I}\big(r_t < \mathrm{VaR}_t(\alpha)\big)$. The QL is a strictly consistent scoring rule for VaR prediction. Being an asymmetric loss function, it is particularly suited to quantile risk measures, as it imposes a higher penalty, with weight $(1 - \alpha)$, on observations below the $\alpha$-quantile, that is, on returns that breach the VaR. As for the ES, there is no loss function whose expected value is minimized by the ES, so the ES lacks the mathematical property called elicitability (see, e.g., Gneiting 2011; Weber 2006, among others).
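The volatility and VaR losses can be computed elementwise as in the NumPy sketch below. The function names are hypothetical, and QLIKE is written in Patton's (2011) normalized form, which is zero when the forecast equals the proxy. Other QLIKE variants differ only by terms that do not depend on the forecast, so they rank forecasts identically. Averaging each loss over the out-of-sample period gives the quantities compared across models.

```python
import numpy as np

def qlike(x, h):
    """QLIKE: x/h - log(x/h) - 1, with x the volatility proxy and h the variance forecast."""
    ratio = np.asarray(x, dtype=float) / np.asarray(h, dtype=float)
    return ratio - np.log(ratio) - 1.0

def mse(x, h):
    """Squared error between proxy and forecast."""
    return (np.asarray(x, dtype=float) - np.asarray(h, dtype=float)) ** 2

def mae(x, h):
    """Absolute error between proxy and forecast."""
    return np.abs(np.asarray(x, dtype=float) - np.asarray(h, dtype=float))

def quantile_loss(r, var, alpha):
    """QL: (alpha - l_t)(r_t - VaR_t), with l_t = 1 when r_t < VaR_t."""
    r = np.asarray(r, dtype=float)
    var = np.asarray(var, dtype=float)
    hit = (r < var).astype(float)
    return (alpha - hit) * (r - var)
```

For example, at $\alpha = 0.05$ with a VaR of $-2$, a return of $-3$ (a violation) costs $0.95$, while a return of $+1$ costs only $0.15$. This asymmetry is the one described above.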
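Although the ES is not elicitable on its own, it is jointly elicitable with VaR. We therefore rely on the class of (strictly) consistent scoring functions

$$S(v_t, e_t, r_t) = (l_t - \alpha)\big[G_1(v_t) - G_1(r_t)\big] + \frac{1}{\alpha} G_2(e_t)\, l_t\, (v_t - r_t) + G_2(e_t)(e_t - v_t) - \mathcal{G}_2(e_t) \tag{14}$$

to evaluate the ability of the proposed models to jointly forecast VaR and ES. Here $v_t$ and $e_t$ denote the VaR and ES, respectively, $G_1(\cdot)$ is weakly increasing, $G_2(\cdot)$ is strictly increasing and strictly positive, and $\mathcal{G}_2'(\cdot) = G_2(\cdot)$.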
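The joint score can be sketched in Python as a function of user-supplied $G_1$, $G_2$, and $\mathcal{G}_2$. As an illustration, the specialization $G_1(x) = 0$, $G_2(x) = -1/x$ (so $\mathcal{G}_2(x) = -\log(-x)$ for $x < 0$), adopted by Patton et al. (2019), reduces it to the closed-form FZ0 loss. The function names are hypothetical.

```python
import math

def fz_score(r, v, e, alpha, G1, G2, calG2):
    """Joint (VaR, ES) score S(v, e, r) of the consistent class; assumes e <= v."""
    hit = 1.0 if r < v else 0.0  # l_t
    return ((hit - alpha) * (G1(v) - G1(r))
            + G2(e) * hit * (v - r) / alpha
            + G2(e) * (e - v)
            - calG2(e))

def fz0(r, v, e, alpha):
    """FZ0 loss: the case G1 = 0, G2(x) = -1/x, requiring ES <= VaR < 0."""
    if not e <= v < 0:
        raise ValueError("FZ0 requires ES <= VaR < 0")
    hit = 1.0 if r < v else 0.0
    return -hit * (v - r) / (alpha * e) + v / e + math.log(-e) - 1.0

# the general score with the FZ0 choices matches the closed form
args = (-3.0, -2.0, -2.5, 0.05)
general = fz_score(*args, G1=lambda x: 0.0, G2=lambda x: -1.0 / x,
                   calG2=lambda x: -math.log(-x))
print(abs(general - fz0(*args)) < 1e-12)
```

The $\log(-e_t)$ term is what makes FZ0 zero-degree homogeneous: rescaling returns, VaR, and ES by a positive constant shifts every model's loss by the same amount, so loss differences and rankings are unchanged.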
Although several strictly consistent scoring rules for the pair (VaR, ES) can be obtained as special cases of the family in (14), we follow Patton et al. (2019). We assume VaR and ES to be strictly negative, with $\mathrm{ES}_t(\alpha) \le \mathrm{VaR}_t(\alpha) < 0$, and set $G_1(x) = 0$ and $G_2(x) = -1/x$, which yields the zero-degree homogeneous loss function

$$\mathrm{FZ}0(v_t, e_t, r_t) = -\frac{1}{\alpha e_t}\, l_t\, (v_t - r_t) + \frac{v_t}{e_t} + \log(-e_t) - 1.$$

Finally, the Model Confidence Set (MCS) of Hansen et al. (2011) is used to assess the significance of differences in the predictive performance of the models, at confidence levels of 75% and 90%. In the MCS implementation, we use the Range statistic and 5,000 bootstrap resamples generated by a block-bootstrap procedure, with the optimal block length estimated by the method described in Patton et al. (2009).

Table 5 reports the out-of-sample model comparison based on average QLIKE, MSE, and MAE losses for the different volatility proxies. Values in boldface indicate the models with the minimum average loss, while cells shaded in gray and light gray correspond to models included in the 75% and 90% MCS, respectively. The results clearly indicate that the RGARCH based on the jump-robust medRV estimator provides more accurate volatility forecasts than the other models, minimizing every loss function for all proxies and all three electricity markets. Furthermore, it is the only model that enters the MCS at both confidence levels in every case. The only other models included are the RGARCH(RV) and RGARCH(RK) with the 30-min RV proxy and the RGARCH(RK) with the 2-hour RV proxy, which enter the 90% MCS for the MSE loss in the VIC market. By contrast, the GARCH(1,1) is the worst competitor, producing the highest loss in every scenario.
The largest discrepancies between the GARCH(1,1) and the R(E)GARCH models occur for the QLIKE loss, consistent with the QLIKE being more powerful in rejecting poorly performing predictors (Liu et al. 2015; Patton 2011). Overall, the simple RGARCH structure yields substantially more accurate volatility forecasts than the REGARCH models based on one or more realized measures.
The picture is completely reversed when forecasting VaR and ES. The results in Table 6 show that the REGARCH specifications provide lower average QL and FZ0 losses than the RGARCH models, which are always excluded from the MCS at every risk level. The GARCH(1,1) also turns out to be a strong competitor, minimizing the loss functions in most cases. In particular, it is the only model that always enters the MCS at the most extreme 1% risk level for both VaR and ES, and for NSW and QLD it is the only model in the MCS at that level. Although the REGARCH(RV,RK,MRV) minimizes QL and FZ0 in several risk scenarios, the models based on a single realized measure or on a combination of two almost always enter the MCS, especially when RV and RK are used to explain volatility dynamics. Furthermore, the MCS shows that the differences between the models are more pronounced in forecasting ES. Finally, at less extreme risk levels such as 5%, the MCS discriminates less between models.

Conclusions
This paper uses half-hourly spot prices from the Australian electricity markets of New South Wales, Queensland, and Victoria to predict volatility and manage risk in energy markets. We extend the literature on modeling the conditional variance of returns with the Realized GARCH approach by combining information from multiple robust and nonrobust realized measures to capture key features of electricity prices such as extreme jumps and the inverse leverage effect. Our empirical analysis highlights the following points. First, specifications with multiple realized measures outperform those based on a single realized measure as well as the standard GARCH(1,1), yielding a markedly better in-sample fit. Second, the jump-robust medRV measure significantly increases the accuracy of out-of-sample volatility forecasts. In particular, the simple Realized GARCH with a single measurement equation for the jump-robust medRV estimator always minimizes the loss functions considered (QLIKE, MSE, and MAE) and is the only model that enters the MCS in all the cases examined. Finally, in contrast to volatility forecasting, when the models are assessed on VaR and ES, the standard GARCH is highly competitive, especially at the extreme 1% risk level. Similarly, REGARCH models based on one or more realized measures outperform the simple RGARCH, which in this case yields the worst results at every risk level. Further, the MCS discriminates more sharply between models in predicting ES. Electricity market participants constantly seek optimal trading limits in order to allocate capital adequately and to cover potential losses if those limits are violated.
This matters because overcapitalization ties up idle capital that could undermine the profitability of energy firms, while undercapitalization could cause financial difficulties if firms are unable to honor trading contracts. Accurate VaR and ES predictions are therefore essential for effective energy risk management, as these are the most commonly used measures for setting optimal trading limits. One aspect not considered here, and worth examining in future research, is how the inclusion of exogenous factors such as weather conditions would affect electricity price volatility. Volatility interrelationships between regions and energy markets are also interesting areas for future research, as is extending the results to energy markets outside Australia. Finally, as this study has focused on one-day-ahead volatility, a natural extension would be to exploit the properties of Realized GARCH-type models to predict price volatility over longer horizons.