Abstract
Electricity price forecasting has been a topic of significant interest since the deregulation of electricity markets worldwide. The New Zealand electricity market runs primarily on renewable fuels, and so weather metrics have a significant impact on electricity price and volatility. In this paper, we employ a mixed-frequency vector autoregression (MF-VAR) framework, and we propose a VAR specification of the reverse unrestricted mixed-data sampling (RU-MIDAS) model, called RU-MIDAS-VAR, to provide point forecasts of half-hourly electricity prices using several weather variables and electricity demand. A key focus of this study is the use of variational Bayes as an estimation technique and its comparison with other well-known Bayesian estimation methods. We separate forecasts for peak and off-peak periods in a day since we are primarily concerned with forecasts for peak periods. Our forecasts, which cover peak and off-peak periods, show that weather variables and demand as regressors can replicate some key characteristics of electricity prices. We also find that the MF-VAR and RU-MIDAS-VAR models achieve similar forecast results. Using the LASSO, adaptive LASSO, and random subspace regression as dimension-reduction and variable selection methods helps to improve forecasts: the random subspace methods perform well for large parameter sets, while the LASSO significantly improves our forecasting results in all scenarios.
Keywords: half-hourly electricity price; mixed-frequency; vector autoregression; LASSO; variational Bayes
JEL Classification: C11; C15; C22; C32; C58; G17
1. Introduction
Electricity in New Zealand is generated primarily from renewable sources, such as hydro, wind, and geothermal energy. Furthermore, New Zealand is committed to its path to a 100% renewable electricity market by 2030, with large wind farms already in place and plans to build more, as well as a ban on offshore oil and gas exploration. As a result of these fundamental changes and the highly competitive nature of the market, electricity price volatility in New Zealand is likely to increase. In this paper, we build electricity spot price models for half-hourly prices in the New Zealand electricity market. We focus our study on half-hourly prices since they reflect the complex price formation process during intra-day trading. We want to analyze the behaviour of this price formation process with respect to current and past information on several weather variables. Furthermore, since half-hourly prices are the building blocks for lower-frequency prices, an ideal half-hourly price model can also be used to characterize daily prices.
There have been several studies on the impact of weather variables on electricity markets worldwide. Huurman et al. (2012) analyze the predictive power of temperature, rainfall, and wind power in the Scandinavian electricity market. Their study is limited to forecasting day-ahead prices only. In our study, we examine forecasting capabilities up to nine months ahead to capture multiple seasonal changes, with cross-validation to assess the robustness of our models. The explanatory variables we consider are electricity load (demand), temperature, rainfall, solar radiation, air pressure, humidity, wind speed, and wind direction. Suomalainen et al. (2015) analyze the correlation of wind and hydro resources with electricity price and demand in New Zealand, with a focus on analyzing optimal wind power generation in different parts of the country. See Son and Kim (2017) and Mosquera-López et al. (2017) for more applications of weather variables in electricity markets. Weron (2014) provides an excellent review of the electricity price forecasting literature.
While our price and demand observations have half-hourly frequency, our weather variables are observed every hour. As a result of this frequency mismatch, we employ a mixed-frequency VAR (MF-VAR) framework. Our aim in this paper is to study the short- and mid-term forecasting capability of the MF-VAR with incorporated weather data. For this reason, we produce out-of-sample forecasts for a nine-month period. We also implement a 5-fold cross-validation scheme that sequentially produces one-week-ahead forecasts. Each fold in the cross-validation scheme is separated into one year of training data and one week of test data to evaluate the forecasts. The training set dates are strictly before the test set dates to preserve the sequential nature of our time-series problem. This study is primarily focused on the accuracy of point forecasts rather than density forecasts, mainly because the study is conducted for market participants to assess the impact of future weather and demand conditions on electricity prices. Furthermore, point forecasts can aid participants in the electricity derivatives markets in assessing the payouts of various exotic derivatives, such as electricity futures caps or barriers, whose payoffs depend on how far electricity prices exceed a certain threshold. In these cases, it is more valuable to obtain point forecasts with external variables rather than density forecasts, since the latter can underestimate this payoff. However, it is straightforward to extend our work to density forecasts, since we largely employ a Bayesian scheme. We understand the importance of detailed modeling of the heavy-tailed nature of electricity prices, particularly for half-hourly prices, but we leave this for future research. Furthermore, using only weather data, we attempt to replicate the key characteristics of electricity prices organically, i.e., persistent volatility, mean-reversion, seasonality, and spikes. In particular, we are interested in replicating the heavy-tailed distribution of half-hourly electricity prices using the weather variables. In this way, we aim to provide a comprehensive study of the impact of weather on electricity price formation in New Zealand.
Mixed-frequency models are commonly used by econometricians, stemming from the mixed-data sampling (MIDAS) model of Ghysels et al. (2004). Ghysels (2016) was the first to use the MF-VAR framework to forecast monthly data using quarterly variables in a macroeconomic setting. For other applications of the MF-VAR framework, see Eraker et al. (2014), Schorfheide and Song (2015) and Götz and Hauzenberger (2021).
Generally, applications of this model involve forecasting quarterly data using monthly variables. Using lower frequency variables to predict higher frequency data is known as a reverse mixed-frequency approach. Foroni et al. (2023) introduce the reverse unrestricted MIDAS (RU-MIDAS) as an alternative to the MF-VAR. They use macroeconomic variables to estimate models for electricity traded in the German and Italian electricity markets. In this paper, we compare the forecasting capabilities of the MF-VAR and RU-MIDAS models using our weather variables. To our knowledge, we are the first to employ a reverse mixed-frequency framework for forecasting half-hourly electricity prices using hourly weather variables.
For parameter estimation with large datasets and parameter sets, the computational times of the Gibbs sampler are inefficient. As a result, we employ the mean-field variational Bayes (MFVB) technique, see Bishop and Nasrabadi (2006), Ormerod and Wand (2010), Wand et al. (2011) and Koop and Korobilis (2018). The MFVB approximates the joint posterior distribution of the parameters and is comparable in computational time to a least-squares approach. In our study, the MFVB provides similar estimates to the Gibbs sampler. In this context, the VB is highly beneficial as an alternative to the Gibbs sampler due to its faster computational speed. In our experiments, Gibbs sampling took nearly 45 s to reach the convergence threshold, whereas the VB method converged in nearly 30 s. It is important to note that the VB is scalable to contexts with larger datasets, where the difference in computational times is even more significant. Another benefit of employing the VB is that it produces a deterministic convergence, unlike other Bayesian methods. This means the convergence of the VB is predictable and stable, whereas Gibbs sampling can have issues such as local convergence and may be sensitive to initial specifications. See Gefang et al. (2020, 2023) for MF-VAR estimation using the VB technique. Note that many machine learning methods, including the VB, aim to produce point forecasts. Hence, comparisons with other methods such as the least absolute shrinkage and selection operator (LASSO) and random subspace regression, which are essentially semi-parametric and also designed for point forecasts, are a reasonable attempt and a contribution of our study. Note also that a comparison with other machine learning techniques has already been attempted, see Kapoor and Wichitaksorn (2023) as an example.
The incorporation of several weather variables may result in multicollinearity within the model framework. To alleviate this issue, we employ the LASSO. Introduced by Tibshirani (1996), the LASSO is a variable selection method that introduces bias into the estimates in exchange for improved accuracy. This method introduces a regularization parameter $\lambda$ into the least-squares estimation procedure, which eliminates less significant regressors as $\lambda$ increases. Uniejewski and Weron (2018) employ the LASSO to identify significant lags of electricity prices in the Nord Pool and PJM electricity markets. Uniejewski et al. (2019) conduct a similar study in the German EPEX market. See Ludwig et al. (2015), Ziel (2016), Marcjasz et al. (2020) and Jedrzejewski et al. (2021) for more applications of the LASSO in electricity markets. For applications of the LASSO to VAR models, see Hsu et al. (2008), Gefang (2014), Cavalcante et al. (2017) and Messner and Pinson (2019). Furthermore, we also incorporate the adaptive LASSO of Zou (2006) in our study. The adaptive LASSO is an oracle estimator that can reduce the bias in LASSO estimates and may improve forecasts. We also employ the random subspace methods introduced by Boot and Nibbering (2019), which select a subset of predictors at random using a specified probability distribution. This method reduces the complexity of the model; however, it does not eliminate multicollinearity. Other dimensionality reduction techniques, such as factor analysis and principal component analysis (PCA), have been very popular in the literature, particularly in the machine learning context. However, with the success of the LASSO and adaptive LASSO in our findings, a potential future study could compare LASSO-based dimensionality reduction with factor analysis and PCA.
The contributions of this paper are as follows. Firstly, this paper is the first to apply an MF-VAR approach to modeling and mid-term forecasting of half-hourly electricity prices. While mixed-frequency models are generally applied to larger time frames such as daily and monthly observations, we apply the MF-VAR framework to a shorter time frame of half-hourly and hourly data. Regarding the MF-VAR parameter estimation methodology using MFVB, we show that the MFVB is capable of producing close approximations for the joint posterior distribution of parameters for electricity price datasets. Secondly, we provide a detailed comparison of the MF-VAR and RU-MIDAS models. We also show that the RU-MIDAS can be specified in a VAR framework under certain assumptions, which results in the RU-MIDAS-VAR model. Lastly, this paper is the first to employ the LASSO technique for the RU-MIDAS framework. Our results show that utilizing the LASSO and adaptive LASSO techniques significantly improves forecasts for both models.
Our findings suggest that including weather variables and their time lags is beneficial for predicting electricity prices, as they are capable of replicating empirical price volatility, mean-reversion, spikes, and seasonality characteristics. We were able to achieve better forecasts using lags of 24 rather than lags of 240, suggesting that incorporating more past information may not be necessary. That said, the LASSO method allows us to identify key explanatory variables in a vast parameter set and significantly improves forecasting capabilities compared to standard estimation methods. The implementation of the LASSO, in a way, allows us to use any lag value we prefer, as the optimal LASSO results will generally remove redundant parameters. We can see this in the forecasting results, since the forecast metrics for the LASSO with 24 lags and 240 lags are similar. Regarding the comparison between the MF-VAR and RU-MIDAS-VAR models, our findings suggest that there is little difference in forecasting accuracy between the models, despite the RU-MIDAS-VAR using more up-to-date data for prediction. Furthermore, the LASSO provides similar forecasts for both models, suggesting that the reduced parameter spaces are similar. Finally, we find the MFVB parameter estimates to be similar to the Gibbs sampler estimates. The forecast results for these two methods are nearly identical, with the VB performing better in some scenarios. The application of the VB to large electricity datasets is extremely beneficial due to the reduced computational burden. In particular, while Gibbs sampling with 100,000 iterations took approximately 45 min, the VB ELBO converged in nearly 30 s. This was a significant improvement in computational time, with little difference in the forecast results.
2. Models and Methods
In this section, we present the mixed-frequency vector auto-regression (MF-VAR) framework. We discuss several estimation methods including Gibbs sampling and variational Bayes (VB). We also discuss the RU-MIDAS model of Foroni et al. (2023) and make certain assumptions in their model so that it can be specified in a VAR framework. In this setting, the RU-MIDAS-VAR is very similar to the MF-VAR model, with a minor distinction. Finally, we introduce variable-selection techniques including the LASSO and random subspace methods.
2.1. Mixed-Frequency VAR
Since its introduction by Sims (1980), vector autoregression models have become an essential tool in macroeconomic and financial analysis. A standard VAR(p) model has the form
$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \varepsilon_t, \qquad (1)$$
where $y_t$ is a $k \times 1$ vector of dependent variables, $c$ is a $k \times 1$ vector of constants, $A_i$ for $i = 1, \dots, p$ is a $k \times k$ matrix of coefficients, and $\varepsilon_t \sim N(0, \Sigma)$.
Since our price and demand data are observed half-hourly, and weather variables are observed hourly, we employ a mixed-frequency approach. To build a mixed-frequency framework, we use the approach of Ghysels (2016) and separate our price dataset into two time series at the hourly frequency: the first consists of all half-hourly prices from the first half-hour of each hour, denoted $P_{1,t}$, and the second contains all prices from the second half-hour of each hour, denoted $P_{2,t}$. We adopt a similar approach for half-hourly demand, denoted $D_{1,t}$ and $D_{2,t}$. In our framework, the stacked series of these variables are column vectors. Finally, we denote by $w_t$ a row vector of the hourly observations of our external variables at time $t$, i.e., $w_t = (w_{1,t}, \dots, w_{7,t})$.
As a result, the vector $y_t$ in (1) has the form $y_t = (P_{1,t}, P_{2,t}, D_{1,t}, D_{2,t}, w_t)'$. Since $y_t$ contains 11 dependent variables (2 price variables, 2 demand variables, and 7 weather variables), we have $k = 11$. For estimation and forecasting, we consider time lags of $p = 24$ and $p = 240$, which correspond to lags of up to 1 day and 10 days, respectively.
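To make the stacking concrete, the sketch below shows one way to arrange half-hourly price and demand observations and hourly weather observations into the hourly-frequency vector $y_t$ described above. The function name, column names, and DataFrame layout are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd

def build_mixed_frequency_panel(half_hourly: pd.DataFrame, hourly: pd.DataFrame) -> pd.DataFrame:
    """half_hourly: 30-min DatetimeIndex with columns ['price', 'demand'];
    hourly: 1-hour DatetimeIndex with the weather columns."""
    hh = half_hourly.copy()
    hh["hour"] = hh.index.floor("60min")              # hour each half-hourly observation belongs to
    hh["slot"] = (hh.index.minute // 30) + 1          # 1 = first half-hour, 2 = second half-hour
    wide = hh.pivot_table(index="hour", columns="slot", values=["price", "demand"])
    wide.columns = [f"{var}_{slot}" for var, slot in wide.columns]   # price_1, price_2, demand_1, demand_2
    return wide.join(hourly, how="inner")             # align with the hourly weather variables

# Each row of the result is one hourly observation of the 11-dimensional y_t.
```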
2.2. Parameter Estimation
It is convenient for parameter estimation to use a concise notation for the VAR,
$$Y = XB + E, \qquad (2)$$
where $Y$ is a $T \times k$ matrix of our dataset, $B$ is a $(kp+1) \times k$ matrix of parameters, $E$ is a $T \times k$ matrix of error terms, and $T$ is the total number of observations for each variable, in our case 17,280. The matrix $X$ has the form
$$X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_T' \end{pmatrix}, \qquad x_t' = \left(1, \; y_{t-1}', \; \dots, \; y_{t-p}'\right),$$
such that $x_t'$ is a $1 \times (kp+1)$ row vector of observations.
From (2), we can estimate $B$ using the multivariate least squares technique, which has the analytical expression
$$\hat{B} = (X'X)^{-1}X'Y,$$
and the covariance matrix is then estimated from the residuals $\hat{E} = Y - X\hat{B}$ as
$$\hat{\Sigma} = \frac{\hat{E}'\hat{E}}{T - (kp+1)}.$$
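As a minimal illustration of the least-squares estimator above, the following sketch computes $\hat{B}$ and $\hat{\Sigma}$ for given data matrices; variable names are illustrative assumptions.

```python
import numpy as np

def var_ols(Y: np.ndarray, X: np.ndarray):
    """Y: (T, k) dependent variables; X: (T, k*p + 1) lagged regressors incl. intercept."""
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # equivalent to (X'X)^{-1} X'Y, column by column
    resid = Y - X @ B_hat
    dof = X.shape[0] - X.shape[1]                     # T minus the number of regressors
    Sigma_hat = resid.T @ resid / dof                 # residual covariance estimate
    return B_hat, Sigma_hat
```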
2.2.1. Gibbs Sampler for VAR
Due to the popularity of the VAR framework, the application of Gibbs sampling for VAR estimation has been discussed in several works. It can be shown that the posterior distribution of $B$ in (2), conditional on $\Sigma$, is multivariate normal. Mathematically, $\mathrm{vec}(B) \mid \Sigma, Y \sim N(M, V)$, where
$$V = \left(V_0^{-1} + \Sigma^{-1} \otimes X'X\right)^{-1}, \qquad M = V\left(V_0^{-1}\,\mathrm{vec}(B_0) + \left(\Sigma^{-1} \otimes X'\right)\mathrm{vec}(Y)\right), \qquad (3)$$
where $M$ and $V$ are the mean and variance of the multivariate normal distribution, respectively, and $B_0$ and $V_0$ are prior hyperparameters.
The conjugate prior for the covariance matrix $\Sigma$ is the inverse Wishart distribution, i.e., $\Sigma \mid B, Y \sim IW(S, \nu)$, where
$$S = S_0 + (Y - XB)'(Y - XB), \qquad \nu = \nu_0 + T, \qquad (4)$$
where $S$ is a positive-definite scale matrix, $\nu$ is the degrees-of-freedom parameter, and $S_0$ and $\nu_0$ are prior hyperparameters.
A brief explanation of the Gibbs sampling algorithm is as follows.
- Step 1:
- Set initial values for the prior hyperparameters and an initial value for $\Sigma$, which can be chosen as the least squares estimate $\hat{\Sigma}$.
- Step 2:
- Draw the VAR coefficients in matrix $B$ from the multivariate normal distribution, conditional on the current value of $\Sigma$, using (3).
- Step 3:
- Use the result from Step 2 to draw values for the covariance matrix $\Sigma$ from the inverse Wishart distribution using (4).
Repeat Steps 2 and 3 for an arbitrarily large number of simulations to ensure convergence.
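The following is a hedged sketch of the sampler just described for a small VAR, assuming an independent normal prior on $\mathrm{vec}(B)$ and an inverse-Wishart prior on $\Sigma$. The prior values, function names, and convergence handling are illustrative assumptions, not the authors' exact specification; in particular, the hierarchical horseshoe prior discussed below is omitted.

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import invwishart

def gibbs_var(Y, X, n_iter=1000, burn=100, seed=0):
    rng = default_rng(seed)
    T, k = Y.shape
    m = X.shape[1]                                     # regressors per equation
    b0 = np.zeros(m * k)                               # prior mean of vec(B)
    V0_inv = 1e-2 * np.eye(m * k)                      # weakly informative prior precision
    S0, nu0 = np.eye(k), k + 2                         # inverse-Wishart prior scale and dof
    XtX = X.T @ X
    vecY = Y.reshape(-1, order="F")                    # column-stacked vec(Y)
    # Step 1: initialize Sigma at its least-squares estimate
    B_ls = np.linalg.lstsq(X, Y, rcond=None)[0]
    Sigma = (Y - X @ B_ls).T @ (Y - X @ B_ls) / (T - m)
    draws_B, draws_S = [], []
    for it in range(n_iter):
        # Step 2: draw vec(B) | Sigma ~ N(M, V), as in (3)
        Sigma_inv = np.linalg.inv(Sigma)
        V = np.linalg.inv(V0_inv + np.kron(Sigma_inv, XtX))
        V = (V + V.T) / 2                              # enforce symmetry before sampling
        M = V @ (V0_inv @ b0 + np.kron(Sigma_inv, X.T) @ vecY)
        B = rng.multivariate_normal(M, V).reshape(m, k, order="F")
        # Step 3: draw Sigma | B ~ IW(S0 + E'E, nu0 + T), as in (4)
        E = Y - X @ B
        Sigma = invwishart.rvs(df=nu0 + T, scale=S0 + E.T @ E, random_state=rng)
        if it >= burn:
            draws_B.append(B)
            draws_S.append(Sigma)
    return np.mean(draws_B, axis=0), np.mean(draws_S, axis=0)
```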
In this paper, we adopt a hierarchical shrinkage prior in the form of the Horseshoe prior, see Carvalho et al. (2010) and Cross et al. (2020). There is much discussion on the usage of informative priors for Bayesian estimation. While it is difficult to identify a single best prior selection, several studies suggest that the usage of flat priors may lead to inaccurate estimates, which further leads to inadequate predictions, see Litterman (1986) and Bańbura et al. (2010). For further discussion of informative priors, see Koop and Korobilis (2010).
2.2.2. Mean Field Variational Bayes for VAR
Variational Bayesian (VB) inference methods have seen a rise in popularity in recent years as an estimation technique when MCMC methods are computationally challenging. In this paper, we provide a brief explanation of the VB technique. For a more comprehensive, theoretical understanding, refer to Ormerod and Wand (2010) and Blei et al. (2017).
Consider the context where we would like to obtain the posterior distribution $p(\theta \mid y)$, where $\theta$ denotes the parameter set and $y$ denotes the data. The VB method approximates $p(\theta \mid y)$ with another, simpler density $q(\theta)$. The new density is found by minimizing the Kullback-Leibler divergence between the two densities, or equivalently by maximizing the evidence lower bound (ELBO),
$$\mathrm{ELBO}(q) = \mathbb{E}_q\!\left[\log p(y, \theta)\right] - \mathbb{E}_q\!\left[\log q(\theta)\right],$$
where $\mathbb{E}_q$ denotes the expectation taken with respect to the density $q$.
For mean field variational Bayes (MFVB), the choice of an approximating density is restricted such that
$$q(\theta) = \prod_{j=1}^{J} q_j(\theta_j),$$
where $\theta_j$ for $j = 1, \dots, J$ are blocks of the complete parameter set $\theta$.
When the parameter blocks are assumed to be independent in this way, the ELBO can be maximized one block at a time. The optimal density for the $j$th block is then determined as
$$q_j^*(\theta_j) \propto \exp\!\left\{\mathbb{E}_{-\theta_j}\!\left[\log p(y, \theta)\right]\right\},$$
where $\mathbb{E}_{-\theta_j}$ denotes the expectation taken with respect to all parameter blocks apart from $\theta_j$.
We follow the approach of Gefang et al. (2023) for VB inference in a VAR framework. Referring back to (1), we modify it to take the form
$$y_t = \left(I_k \otimes x_t'\right)\beta + \varepsilon_t, \qquad (5)$$
such that $I_k \otimes x_t'$ is a $k \times k(kp+1)$ matrix of explanatory variables and $\beta$ is a $k(kp+1) \times 1$ vector of coefficients, where $\otimes$ is the Kronecker product and $\beta = \mathrm{vec}(B)$. Having the VAR in the form of (5) is useful because the model can then be expressed as $k$ independent equations, with the $i$th equation having the form
$$y_{i,t} = x_t'\beta_i + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim N(0, \sigma_i^2), \qquad (6)$$
where the $i$th equation corresponds to the evolution of the $i$th dependent variable on the left-hand side of the original framework in (1). From here onward, the subscript $i$ refers to quantities belonging to the $i$th equation. This specification of the VAR allows equation-by-equation estimation, which can greatly reduce computational times, particularly for larger data sets. In this context, the equation-by-equation VAR specification is essentially an ARX representation, where each dependent variable is estimated as a separate ARX model. The specification also relies on a simplifying assumption: the errors of the individual equations are taken to be uncorrelated, so the VAR structure can be separated into individual equations with uncorrelated errors $\varepsilon_{i,t}$. This assumption may not always hold, but it eases our computational burden for Bayesian estimation.
The priors for the parameters in the ith equation are a multivariate normal prior for the coefficient vector $\beta_i$ and an inverse-gamma prior for the error variance $\sigma_i^2$, with prior hyperparameters set by the researcher. Using the derivation of full conditional posteriors by You et al. (2014), the MFVB approximating densities for the ith equation are a multivariate normal density for $\beta_i$ and an inverse-gamma density for $\sigma_i^2$, whose parameters depend on the observations and the prior hyperparameters and are updated at each iteration of the algorithm.
In this study, we employ a hierarchical shrinkage prior in the form of a Horseshoe prior, see Carvalho et al. (2010) and Cross et al. (2020). The horseshoe prior adds the assumption that
$$\beta_{ij} \mid \lambda_{ij}, \tau_i \sim N\!\left(0, \lambda_{ij}^2 \tau_i^2\right),$$
where the priors for the local and global shrinkage parameters are
$$\lambda_{ij} \sim C^+(0, 1), \qquad \tau_i \sim C^+(0, 1), \qquad j = 1, \dots, m,$$
where $i$ denotes the VAR equation, $j$ denotes the coefficients in that equation, $m$ is the total number of coefficients in the equation, and $C^+(0,1)$ denotes the standard half-Cauchy distribution. The full derivation of the conditional posteriors can be found in Gefang et al. (2023).
The iterative procedure begins by specifying the prior hyperparameters and providing initial values for the variational parameters of $\beta_i$ and $\sigma_i^2$. The ELBO is calculated after each iteration, and the procedure runs until the change in the ELBO meets the specified convergence criterion. Finally, we use the ELBO expression specified by Gefang et al. (2023), which is derived from the distributional assumptions of the model.
For the prior hyperparameters, the values are chosen using a grid search, although the parameter estimates were not sensitive to the choice of these hyperparameters. We chose initial values for the variational parameters according to least squares estimates. Our stopping criterion for the change in the ELBO is set to a small tolerance. The iterative procedure generally converged relatively quickly, with total computational time comparable to a least-squares approach. Furthermore, the parameter estimates were similar to the Gibbs sampler estimates, which signifies a good approximation.
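As an illustration of the coordinate-ascent updates behind MFVB, the sketch below implements a simplified version for a single equation with a normal prior on the coefficients and an inverse-gamma prior on the error variance. It omits the horseshoe layer and monitors the change in the variational mean rather than the ELBO; all names and prior values are assumptions.

```python
import numpy as np

def mfvb_regression(y, X, a0=1.0, b0=1.0, prior_var=100.0, tol=1e-6, max_iter=500):
    T, m = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    a_q = a0 + T / 2.0                      # shape of q(sigma^2) is fixed across iterations
    e_prec = 1.0                            # E[1/sigma^2], initial guess
    mu = np.zeros(m)
    Sigma = np.eye(m)
    for _ in range(max_iter):
        # Update q(beta) = N(mu, Sigma)
        Sigma = np.linalg.inv(e_prec * XtX + np.eye(m) / prior_var)
        mu_new = e_prec * Sigma @ Xty
        # Update q(sigma^2) = IG(a_q, b_q) using E||y - X beta||^2 under q(beta)
        resid = y - X @ mu_new
        b_q = b0 + 0.5 * (resid @ resid + np.trace(XtX @ Sigma))
        e_prec = a_q / b_q                  # mean of the implied gamma precision
        if np.max(np.abs(mu_new - mu)) < tol:
            mu = mu_new
            break
        mu = mu_new
    return mu, Sigma, a_q, b_q
```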
2.3. RU-MIDAS Model
The RU-MIDAS model proposed by Foroni et al. (2023) is an extension of the popular MIDAS model by Ghysels et al. (2004), which allows the use of low-frequency variables to predict high-frequency data. In its single-equation structure for half-hourly and hourly frequencies, and disregarding any external variables, the RU-MIDAS relates the half-hourly price $p_t$ at time $t$ to its own past values, with an indicator distinguishing the first half-hour of every hour from the second.
Alternatively, we can separate this specification into two cases: one equation for the price in the first half-hour of each hour and one for the price in the second half-hour.
The RU-MIDAS makes an underlying assumption that the error term for each half-hourly observation is independent. If we relax this assumption, we can re-structure the RU-MIDAS into a VAR framework, which we call the RU-MIDAS-VAR model in this paper. The model can easily be extended to incorporate more time lags and external variables. However, there is a key distinction to be made between this model and the standard VAR framework. In the standard VAR model, the price for the second half-hour of each hour only uses information up to the previous hour; in other words, it is not affected by the price in the first half-hour of the same hour. In contrast, in the RU-MIDAS-VAR framework, the second half-hour price is influenced by the first half-hour price of the same hour, so the RU-MIDAS-VAR incorporates more up-to-date information in the prediction process. In Section 3, we compare the forecasting capabilities of both models to assess the impact of this more frequent updating.
The RU-MIDAS-VAR model we consider in this paper also includes the external variables, collected in a column vector $w_t$, with $p$ denoting the total number of lags and $k$ the number of external variables.
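To make the informational difference between the two frameworks concrete, the sketch below builds the regressor set for the second-half-hour price equation under both specifications; the array layout and function name are illustrative assumptions.

```python
import numpy as np

def regressors_second_halfhour(p1, p2, lags, ru_midas_var=False):
    """p1, p2: arrays of first/second half-hour prices indexed by hour."""
    rows = []
    for t in range(lags, len(p2)):
        row = [1.0]                                   # intercept
        if ru_midas_var:
            row.append(p1[t])                         # RU-MIDAS-VAR: same-hour first half-hour price
        for j in range(1, lags + 1):                  # both models: lagged hourly information
            row.extend([p1[t - j], p2[t - j]])
        rows.append(row)
    return np.asarray(rows), p2[lags:]
```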
2.4. Variable Selection Techniques
2.4.1. LASSO
The least absolute shrinkage and selection operator (LASSO) of Tibshirani (1996) has seen popularity in scientific fields as a variable selection tool. It can be thought of as a generalized version of a linear regression where, instead of minimizing only the sum of squared residuals, we introduce a penalty function into the optimization problem. The LASSO, while initially introduced for linear regression models, is easily applied to multivariate regression problems. Recall from (2) the concise notation for the VAR. The LASSO estimator is obtained by solving the optimization problem
$$\hat{B}_{\mathrm{LASSO}} = \arg\min_{B}\left\{\|Y - XB\|_F^2 + \lambda \sum_{j=1}^{q} |\beta_j|\right\},$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\beta_j$ for $j = 1, \dots, q$ is the $j$th parameter in the full parameter set, and $\lambda \geq 0$ is the LASSO tuning parameter. Note that for $\lambda = 0$ we obtain the least squares estimate. As $\lambda$ increases, less significant parameters move to zero until all parameters are zero. In this way, we can determine an optimal $\lambda$ value to suit our variable selection needs. In this study, we select several $\lambda$ values according to the forecast metrics RMSE, MAPE, and MAD. We perform the LASSO for the MF-VAR and RU-MIDAS-VAR models. The results are discussed in Section 3.
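A minimal sketch of applying the LASSO equation by equation to the VAR written as $Y = XB + E$ is given below, using scikit-learn's Lasso, whose alpha parameter plays the role of $\lambda$; the alpha value is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def var_lasso(Y, X_nocon, alpha=0.1):
    """Y: (T, k) dependent variables; X_nocon: lagged regressors without the
    intercept column, since scikit-learn fits the intercept unpenalized."""
    k = Y.shape[1]
    B_hat = np.zeros((X_nocon.shape[1], k))
    for i in range(k):                                # one penalized regression per equation
        model = Lasso(alpha=alpha, max_iter=10_000)
        model.fit(X_nocon, Y[:, i])
        B_hat[:, i] = model.coef_                     # many entries are shrunk exactly to zero
    return B_hat
```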
2.4.2. Adaptive LASSO
The adaptive LASSO of Zou (2006) is an evolution of the LASSO and is considered an oracle estimator. By adding an additional weight to the LASSO penalty, the adaptive LASSO is able to reduce the bias of the LASSO estimator. The adaptive LASSO estimator is obtained by solving the optimization problem
$$\hat{B}_{\mathrm{aLASSO}} = \arg\min_{B}\left\{\|Y - XB\|_F^2 + \lambda \sum_{j=1}^{q} w_j |\beta_j|\right\},$$
where $\beta_j$ for $j = 1, \dots, q$ is the $j$th parameter in the full parameter set selected by the LASSO estimator, and $w_j$ is a weight assigned to that parameter. In this study, we choose the weights to be the inverse of the absolute sum of the LASSO-estimated parameters, computed from the LASSO estimator described above.
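The two-step construction can be implemented by rescaling the predictor columns with first-stage LASSO estimates, as sketched below for a single equation. The inverse-absolute-coefficient weighting shown is one common choice and is an assumption here, not necessarily the paper's exact weighting rule.

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(y, X, alpha1=0.1, alpha2=0.1, eps=1e-8):
    stage1 = Lasso(alpha=alpha1, max_iter=10_000).fit(X, y)
    w = 1.0 / (np.abs(stage1.coef_) + eps)            # larger penalty on small first-stage coefficients
    X_scaled = X / w                                  # column rescaling encodes the weights
    stage2 = Lasso(alpha=alpha2, max_iter=10_000).fit(X_scaled, y)
    return stage2.coef_ / w                           # undo the rescaling to recover the coefficients
```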
2.4.3. Random Subspace Methods
Random subspace methods were introduced by Boot and Nibbering (2019) with an application to high-dimensional linear regression models. The idea of random subspace methods is to apply random weights, drawn from some probability distribution, to the set of predictors. The resulting dataset is used to estimate parameters and provide predictions. This process is repeated with new weights each time, and the results are averaged to reduce the forecast variance while preserving most of the signal. Consider the model
$$y_t = x_{f,t}'\beta_f + x_t' R \gamma + u_t,$$
where $x_{f,t}$ is considered the essential observation set and will not be filtered, $\beta_f$ are the essential regression coefficients, $x_t$ is the remaining observation set, and $R$ is a $p_x \times n$ matrix with $n < p_x$. Here $p_x$ denotes the number of regression coefficients in the 'non-essential' parameter set and $n$ denotes the number of 'non-essential' predictors that remain after dimension reduction. The choice of $R$ determines the type of random subspace method, and Boot and Nibbering (2019) suggest two methods for choosing $R$: random subset regression and random projection regression.
Random Subset Regression
In random subset regression, a new subset of $n$ predictors is chosen at random. Each column of the random matrix $R$ is a randomly selected standard basis vector, i.e., a vector with a single entry equal to one and all other entries equal to zero, so that premultiplying by $R$ selects $n$ of the original predictors (up to a fixed scaling factor).
Random Projection Regression
In random projection regression, a new set of predictors is created by taking random weighted averages of the original predictors, with weights drawn from the Gaussian distribution. In this case, each entry of $R$ is independent and identically distributed as a zero-mean normal random variable.
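The sketch below illustrates both choices of $R$ within a simple forecast-averaging loop; the dimensions, number of draws, and Gaussian scaling are illustrative assumptions.

```python
import numpy as np
from numpy.random import default_rng

def random_subspace_forecast(y, X_ess, X_rest, x_ess_new, x_rest_new,
                             n_keep=10, n_draws=100, method="subset", seed=0):
    rng = default_rng(seed)
    p = X_rest.shape[1]
    preds = []
    for _ in range(n_draws):
        if method == "subset":                        # random subset regression
            cols = rng.choice(p, size=n_keep, replace=False)
            R = np.zeros((p, n_keep))
            R[cols, np.arange(n_keep)] = 1.0
        else:                                         # random projection regression
            R = rng.normal(0.0, 1.0 / np.sqrt(n_keep), size=(p, n_keep))
        Z = np.hstack([X_ess, X_rest @ R])            # essential regressors are never filtered
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        z_new = np.concatenate([x_ess_new, x_rest_new @ R])
        preds.append(z_new @ beta)
    return np.mean(preds)                             # average forecasts across draws
```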
3. Data and Results
3.1. Description and Key Features
Our electricity price dataset consists of half-hourly electricity prices from the Otahuhu node in New Zealand from 1 January 2018, to 30 September 2020. The New Zealand electricity market is a real-time market, where electricity prices are determined by matching demand and supply every five minutes. The prices are determined as the highest bids made by electricity generators capable of clearing all demand. However, the five-minute prices are not accessible and are averaged to produce the half-hourly prices, which are made publicly available for analysis.
Figure 1 displays the electricity price time series. It is clear from this figure that half-hourly prices exhibit several key characteristics: extreme volatility, upward spikes, and mean-reversion. The figure also shows the impact of several key events. A persistent hydro shortage, due to a lack of rainfall in nearby regions, can significantly increase electricity prices until lake levels return to normal. In October 2018, a hydro shortage, combined with an unexpected outage in the largest gas field in New Zealand, saw prices consistently around $500/MWh.
Figure 1.
Half-hour electricity prices from Otahuhu node in New Zealand.
Figure 2a displays the autocorrelation of our price time series. The cyclical peaks that occur at lags that are multiples of 48 indicate intra-day seasonality. Since prices are largely influenced by demand, lower-demand periods at night exhibit lower prices than the daytime. The intra-day seasonality can be further observed in Figure 2b, which displays the mean price for every half-hour period in our dataset. There are two peaks during the day: the first at approximately 9:00 am as businesses open, and the second around 7:00 pm as residential demand increases. There is a large drop-off thereafter, as demand subsides during the late hours. For forecasting purposes, we separate our price time series into four distinct periods: the morning peak from 7:00 am to 11:00 am, the evening peak from 6:00 pm to 10:00 pm, the midday off-peak from 11:00 am to 6:00 pm, and the night off-peak from 10:00 pm to 7:00 am. Figure 3 below provides the reasoning for this distinction.
Figure 2.
Price auto-correlation and average price per trading period. (a) Auto-correlation of half-hourly prices. (b) Average price for each half-hour trading period.
Figure 3.
Box plot of prices for each trading period.
Figure 3 displays a box plot of half-hourly electricity prices for our sample period. The bottom and top edges of each box represent the 25th and 75th percentiles of the data, while the red line inside the box represents the median price. The red crosses outside the box represent outliers. Electricity prices exhibit a large number of outliers in the form of the spikes observed in Figure 1. Furthermore, the size and intensity of the outliers increase during peak price hours. As a result, the peak-hour forecasts will have a higher priority than off-peak forecasts.
Our external market data consists of half-hourly demand for electricity in the North Island of New Zealand and several hourly weather variables including temperature, rainfall, air pressure, humidity, solar radiation, wind speed, and wind direction. Weather data are observed from the Auckland International Airport weather station by MetService. Table 1 presents descriptive statistics of price, demand, and weather variables.
Table 1.
Descriptive statistics of variables.
In this study, all variables are log-transformed to reduce the impact of outliers and to improve the normality of the data. Furthermore, the wind direction variable is first transformed using a sine transformation to emphasize its cyclical nature.
Table 2 presents the correlation coefficients between price and external variable time series. For the purpose of computing these correlations, we have taken the mean of half-hourly prices and the sum of half-hourly demand to convert them to hourly variables. We can observe instances where some weather variables have a strong correlation among each other, such as temperature and solar radiation, or wind speed and humidity. Regardless, we accept them as explanatory variables and later perform variable-selection techniques using the LASSO and random subspace methods in an effort to remove less significant explanatory variables and improve price forecasts.
Table 2.
Correlation coefficient table.
3.2. Forecast Results
In this section, we present the out-of-sample forecasting results for the various models and methods we have employed in this paper. We also separate the results for peak and off-peak periods. Since electricity prices behave much more erratically during peak periods, the peak forecast metrics are more relevant to our study than off-peak forecasts. We consider the mean absolute deviation (MAD), root mean squared error (RMSE), and mean absolute percentage error (MAPE) as forecast metrics. Our estimation dataset consists of price, demand, and weather data from 1 January 2018 to 31 December 2019. We only consider data starting from 1 January 2018 because there was a significant fundamental change in the behavior of the New Zealand electricity market from this time onward, due to the ban on offshore oil and gas exploration as well as major outages in significant gas fields in New Zealand. The year 2018 was a transitional period into a more renewable environment for the New Zealand electricity market, and a major step towards a fully renewable scheme. Our forecasting period is from 1 January 2020 to 30 September 2020. During the course of our work, global markets were impacted by the COVID-19 pandemic. However, the impact of COVID-19 on New Zealand electricity prices was minimal. We notice a period in April 2020, during the lockdown in New Zealand, where prices are lower than usual. The lockdown reduced nationwide industrial demand for electricity, as the majority of businesses were closed for this period. Other than this, there are no significant effects of the pandemic on the electricity price itself, notwithstanding the impact on other aspects of the electricity market.
In particular, we conduct two separate studies. The first is a single nine-month out-of-sample forecast for the period from 1 January 2020 to 30 September 2020, used to assess the long-term capabilities of the models. The second incorporates a cross-validation scheme of several one-week forecasts over the same period, whose purpose is to compare the short-term capabilities of the models. Even if long-term forecasting is challenging, near-term electricity price forecasts can provide significant practical benefits. These forecasts are crucial for real-time market operations, maintenance scheduling, storage deployment, demand-side management, and financial trading. For this study, we perform a rolling estimation technique, so parameters are updated after every week. In the single out-of-sample nine-month forecast, we use forecasts of the external variables as inputs, whereas in the cross-validation scheme, we use the realized values. In this way, we ensure that no future information is used to make price forecasts. The forecast horizons used in this study are common practice in the industry.
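A schematic of the rolling one-week-ahead cross-validation described above is sketched below; fit and forecast are placeholders for any of the estimation methods compared in this paper, and the window lengths, expressed in hourly observations, are illustrative assumptions.

```python
import numpy as np

def rolling_cv(prices, exog, fit, forecast, train_hours=365 * 24, test_hours=7 * 24):
    errors = []
    start = 0
    while start + train_hours + test_hours <= len(prices):
        tr = slice(start, start + train_hours)
        te = slice(start + train_hours, start + train_hours + test_hours)
        model = fit(prices[tr], exog[tr])              # estimate on the training window only
        preds = forecast(model, exog[te])              # realized exogenous values in the CV scheme
        errors.append(np.abs(preds - prices[te]).mean())   # e.g., MAD per fold
        start += test_hours                            # roll forward by one week
    return np.mean(errors)
```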
Figure 4 shows the actual price density alongside the simulated densities from the MF-VAR and RU-MIDAS-VAR models. We observe that the MF-VAR can capture the lower and upper tails of empirical prices particularly well; however, the mid portion of the density curve is not well matched. On the other hand, the RU-MIDAS-VAR density finds a middle ground between the actual and MF-VAR simulated median prices, although its tail approximations are worse than those of the MF-VAR model.
Figure 4.
Density plots of actual and simulated price.
Figure 5 displays the actual electricity price along with simulations generated from the MF-VAR and RU-MIDAS-VAR models. When running multiple simulations, we found that some spikes persistently appear in specific periods of our forecast horizon, suggesting that the weather variables have some impact on price spiking. However, we notice, in several periods, a large mismatch between simulated and actual prices. As we expected, there is a vast amount of information in the price data that cannot be explained by weather variables alone. The VAR framework with weather data is capable of capturing key characteristics; however, more data on market dynamics as well as macroeconomic variables, such as commodity prices, may improve our forecasts. Furthermore, we may be able to improve the simulated price density by using heavy-tailed distributions within the VAR framework.
Figure 5.
Simulations of actual and model prices.
To study the capability of our forecasting models to replicate the mean-reversion and seasonality characteristics of electricity prices, we examine half-life values and spectral density plots, respectively. The half-life values are derived from the lag-1 autocorrelation coefficient, and they denote the rate at which deviations from the long-term mean decay. Similar half-life values between different time series indicate similar mean-reversion characteristics. Table 3 displays the AR(1) coefficient and corresponding half-life values for the actual prices and the prices simulated from the MF-VAR and RU-MIDAS-VAR models with OLS estimates. The half-life is approximately 0.7 for all three time series, indicating similar mean-reversion rates. The spectral density shows the distribution of variance across the frequency components of the time series. Peaks in the spectral density correspond to dominant frequencies, which are related to seasonality. Visualizing the spectral density plots and observing the correlation of the spectral densities between actual and simulated prices can help us identify the seasonality captured by the models. This visualization is provided in Figure 6, where we plot the logarithm of the power spectral density to improve interpretability. We notice that the spectral density plots of actual and simulated prices are similar, indicating that the models are able to capture the seasonality of electricity prices. Furthermore, the correlation between the spectral densities of actual prices and the MF-VAR simulated prices is 0.999, and the correlation between the spectral densities of actual prices and the RU-MIDAS-VAR simulated prices is 0.997.
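For reference, the half-life and spectral-density correlations reported here can be computed along the following lines; the use of the periodogram and the log transform before correlating are assumptions about implementation details rather than the authors' exact code.

```python
import numpy as np
from scipy.signal import periodogram

# Half-life implied by an AR(1) coefficient phi: the number of periods needed
# for a deviation from the long-run mean to decay by half, ln(0.5)/ln(phi).
def half_life(series):
    x = np.asarray(series, dtype=float)
    phi = np.corrcoef(x[:-1], x[1:])[0, 1]            # lag-1 autocorrelation as the AR(1) estimate
    return np.log(0.5) / np.log(phi)

# Correlation between the (log) power spectral densities of two series, used as
# a rough check on how well simulated prices reproduce seasonality.
def spectral_correlation(x, y):
    _, px = periodogram(np.asarray(x, dtype=float))
    _, py = periodogram(np.asarray(y, dtype=float))
    return np.corrcoef(np.log(px[1:]), np.log(py[1:]))[0, 1]   # drop the zero frequency
```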
Table 3.
Half-life values.
Figure 6.
Spectral density plots of actual and simulated prices.
In Table 3, Table 4, Table 5, Table 6 and Table 7, MLS, MFVB, and GS refer to the multivariate least-squares, mean-field variational Bayes, and Gibbs sampling techniques, respectively. Table 4 shows the out-of-sample forecast results for the models and methods employed in this study. For Bayesian estimation, our experiment uses 100,000 iterations to obtain parameter estimates, where the first 10,000 samples are discarded as burn-in. We also perform thinning by selecting every fifth sample. From the remaining samples, we use the arithmetic mean to obtain parameter estimates, which are then used for forecasting purposes. The left table considers a model with lags of $p = 24$ for each variable, whereas the right table shows results for lags of $p = 240$. For most estimation methods, the lower-lag model performs better. The LASSO method performs significantly better for $p = 240$. Comparing the least-squares estimates for the MF-VAR and RU-MIDAS-VAR models, we observe that the MF-VAR model performs marginally better for both lag values. The MFVB and Gibbs sampler estimates are also similar to the least-squares estimates, with the Gibbs sampler performing better in terms of MAPE. The subset regression and projection regression methods provide better forecasts than least-squares for $p = 240$, but significantly underperform for $p = 24$.
Table 4.
Out-of-sample nine-month forecast results. Note: Bold font indicates best result.
Table 5.
Out-of-sample forecasts for peak and off-peak time periods for lag $p = 24$. Note: Bold font indicates best result.
Table 6.
Out-of-sample forecasts for peak and off-peak time periods for lag $p = 240$. Note: Bold font indicates best result.
Table 7.
Diebold-Mariano test for comparing similarity in forecasts with different models. Note: Bold font indicates failure to reject the null hypothesis of no difference between forecasts at the 5% significance level.
We provide two LASSO estimates with different $\lambda$ values so we may study their optimal performance across the forecast metrics. Generally, the LASSO results are similar for both $p = 24$ and $p = 240$, since, for the latter case, the LASSO likely removes any insignificant, higher-lag data. We also notice that, for $p = 240$, the LASSO identifies a parameter set that significantly reduces the MAPE compared to other methods. In particular, we find that the LASSO-chosen coefficients include recent price, demand, and solar radiation values. Wind speed and wind direction beyond lag 10 are also relevant, indicating a delayed effect of wind power. Less prominent, but still included, features are recent rainfall and humidity values. We find air pressure to hold no important information, and it is mostly excluded from the parameter set. Furthermore, for the study of 240 lags, we find that price, demand, wind metrics, and rainfall can hold information about current prices even beyond a week. However, solar radiation, humidity, and air pressure do not hold significant information beyond the previous day. This means that all the significant variables play an important role in electricity price forecasting. Hence, accurate forecasts of these variables will also affect the accuracy of electricity price forecasts. Nevertheless, weather forecasts are beyond the scope of this study.
Table 5 and Table 6 separate the results of Table 4 into peak and off-peak forecasts for $p = 24$ and $p = 240$, respectively. From the prior analysis, it is clear that electricity prices exhibit larger volatility and more frequent spikes during peak periods. This is a direct result of higher demand and significant generation costs during peak periods. It also necessitates a larger focus on forecasting capabilities for peak periods, since there is much more risk involved for generators and retailers of electricity during these periods. Therefore, we would like to examine the peak-period forecasting capabilities of our models. We notice that the forecasts are much better for the off-peak periods than the peak periods. We expected this since the peak periods exhibit much higher volatility and frequency of spikes. In general, the LASSO and adaptive LASSO models perform significantly better than the other methods, especially so in Table 6 for $p = 240$, where the MF-VAR LASSO significantly outperforms even the RU-MIDAS-VAR LASSO model. Again, the MF-VAR and RU-MIDAS-VAR least-squares estimates provide similar results, with the MF-VAR performing slightly better. An interesting result is that the MAPE for evening peaks is significantly lower than for other periods of the day, despite the MAD and RMSE being higher. The reason for this is that the MAPE has a more severe penalty for negative errors and the models consistently overestimate evening prices. Finally, Table 8 displays the out-of-sample cross-validation results for rolling one-week periods. We further separate the cross-validation results into peak and off-peak periods. Note that, in some cases, only a single LASSO value is shown, since that $\lambda$ value obtains the best result for every forecast metric. The forecast performance of the models relative to each other in the cross-validation context is similar to the fixed-window nine-month forecast, suggesting that our results are robust to changes in the forecast horizon. The raw error values in cross-validation are lower than their nine-month forecast counterparts in Table 4, Table 5 and Table 6, since the former produces one-week forecasts while the latter produces a nine-month forecast. As suggested by a reviewer, technological factors of energy generation from different renewable sources may have a significant impact on electricity prices, but we leave this for future research.
Table 8.
One-week cross-validation results for lags $p = 24$ (left) and $p = 240$ (right). Note: Bold font indicates best result.
Table 7 displays the results of the two-sided Diebold-Mariano (DM) test used to compare the predictive accuracy of forecasts from two models; see Diebold (2015) for details on the DM test. The null hypothesis of the DM test is that the two forecasts have similar predictive accuracy. Since this is a two-tailed test, at the 5% significance level the null hypothesis of no difference in forecasts is rejected if the computed DM statistic falls outside the range of -1.96 to 1.96. In cases where the forecasts are obtained multiple times, such as in Bayesian inference methods and random subspace methods, the test statistics are computed several times and averaged. We observe from the table that the MFVB, Gibbs sampler, MF-VAR LASSO, and RU-MIDAS-VAR LASSO obtain similar forecasts. This reflects the similarity between the two procedures in that a prior selection in Bayesian inference is a form of regularization. We also observe that the MLS, subset regression, and projection regression forecasts have similar predictions between their MF-VAR and RU-MIDAS-VAR variants, as do the LASSO and adaptive LASSO methods.
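For completeness, a minimal version of the DM statistic for one-step forecasts under absolute-error loss is sketched below; this textbook variant, without small-sample or autocorrelation corrections, is an assumption and not necessarily the exact implementation used here.

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(actual, pred1, pred2):
    d = np.abs(actual - pred1) - np.abs(actual - pred2)   # loss differential between the two models
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))       # t-type statistic, approximately N(0, 1)
    return dm, 2 * (1 - norm.cdf(abs(dm)))                 # two-sided p-value
```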
We also present the results of the Model Confidence Set (MCS) procedure developed by Hansen et al. (2011). The MCS procedure applies a series of tests to construct a set of superior models, which contains the best model with a given confidence level. The null hypothesis is that all models have equal predictive ability. The test is calculated for a specific loss function, which can be chosen depending on the forecasting task. In our study, we choose the loss function to be the absolute error, and we conduct the test at the 95% confidence level. We have implemented the test using the MFE Toolbox in MATLAB, see Sheppard (2009). The MCS procedure selects the MF-VAR LASSO as the only model in the superior set of models for both $p = 24$ and $p = 240$, and the corresponding p-value indicates that the null hypothesis of equal predictive ability amongst all models is rejected at the 5% level.
4. Conclusions
The aim of this paper was to study the half-hourly electricity price forecasting capability of an MF-VAR model that incorporates weather data and demand. To our knowledge, this paper is the first to study the dynamics of intra-day electricity prices using the MF-VAR framework. Our forecasting results show that incorporating weather data and demand can replicate several key characteristics observed in electricity prices: persistent volatility, spikes, and mean-reversion. Our out-of-sample density plots show that the MF-VAR model captured the lower and upper tails of the actual price density well; however, there was a mismatch around median prices. The VAR is a capable framework for electricity price forecasting, but additional data relating to market dynamics and macroeconomic factors may improve our forecasting capabilities.
Furthermore, we applied several techniques for estimating the MF-VAR framework. In particular, we used mean-field variational Bayes as an approximation to Gibbs sampling. Our results were promising and suggest that the MFVB provides impressive approximations to the Gibbs sampler estimates when applied to electricity prices in New Zealand within a VAR framework. The MFVB therefore proves to be a useful alternative Bayesian tool, particularly for large sets of data and parameters, due to its rapid computational time. In particular, while Gibbs sampling with 100,000 iterations took approximately 45 min, the VB ELBO converged in nearly 30 s, a significant improvement in computational time with little difference in the forecast results.
We also discussed an alternative mixed-frequency model, the RU-MIDAS model. We showed that easing the error restriction in this model allows us to specify the RU-MIDAS within a VAR framework, and we refer to it as the RU-MIDAS-VAR model. This paper is the first to study this relationship. Furthermore, we show that the RU-MIDAS-VAR closely resembles the MF-VAR, however, it uses more up-to-date information for its lags. Despite this, our forecast results show that the MF-VAR and the RU-MIDAS-VAR provide similar price predictions, with the MF-VAR performing marginally better.
We also incorporate the LASSO, adaptive LASSO, and random subspace methods as variable selection tools for our models. Our forecast results show that the LASSO performs extremely well for both the MF-VAR and RU-MIDAS-VAR models within the New Zealand electricity market. The LASSO and adaptive LASSO outperform all other methods in nearly all metrics, particularly so when a larger parameter set is used. The LASSO also allows us to identify the optimal parameter set for a specific criterion. The random subspace methods as dimension reduction techniques performed better than least-squares estimates for a larger parameter set, however, they did not perform well otherwise. They may still have useful practical applications for studies with large datasets and parameter sets.
We employed a two-sided Diebold-Mariano test on the full set of models. The test showed that the predictions of the MFVB, Gibbs sampler, MF-VAR LASSO and adaptive LASSO, and RU-MIDAS-VAR LASSO and adaptive LASSO models significantly outperform the other models; however, there was not a significant difference in prediction accuracy amongst these models themselves. On the other hand, our results from the Model Confidence Set (MCS) suggested the MF-VAR LASSO as the only model in the superior set of models for both lags $p = 24$ and $p = 240$.
Finally, we extended our forecasting studies by examining forecasts for specific periods in a day, i.e., peak and off-peak periods. The models generally perform better for off-peak periods, as one would expect, since peak periods exhibit much higher volatility and spike frequency. To adjust for this, it may be beneficial to test heavy-tailed distributions, such as the generalized Pareto distribution, or to incorporate stochastic volatility within the VAR framework. In the future, we hope to extend this study to incorporate more useful electricity market and macroeconomic variables, which may improve price predictions. Another important aspect for future work is to compare electricity price forecasts across markets.
Author Contributions
Conceptualization, N.W.; Methodology, G.K. and N.W.; Software, G.K.; Validation, G.K., N.W. and M.L.; Formal analysis, G.K.; Writing—original draft, G.K.; Writing—review & editing, N.W., M.L. and W.Z.; Supervision, N.W.; Funding acquisition, N.W. All authors have read and agreed to the published version of the manuscript.
Funding
This project was supported by the R&D Fellowship Grant from Callaghan Innovation (Grant Number GENED1801).
Data Availability Statement
Data will be made available on request.
Acknowledgments
We thank the Associate Editor and the reviewers for their valuable comments and feedback that greatly improved our manuscript.
Conflicts of Interest
Author Gaurav Kapoor was employed by the company Jetstar Airways (Australia). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
- Bańbura, M., Giannone, D., & Reichlin, L. (2010). Large bayesian vector auto regressions. Journal of Applied Econometrics, 25(1), 71–92. [Google Scholar] [CrossRef]
- Bishop, C. M., & Nasrabadi, M. (2006). Pattern recognition and machine learning (Vol. 4). Springer. [Google Scholar]
- Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877. [Google Scholar] [CrossRef]
- Boot, T., & Nibbering, D. (2019). Forecasting using random subspace methods. Journal of Econometrics, 209(2), 391–406. [Google Scholar] [CrossRef]
- Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2), 465–480. [Google Scholar] [CrossRef]
- Cavalcante, L., Bessa, R. J., Reis, M., & Browell, J. (2017). Lasso vector autoregression structures for very short-term wind power forecasting. Wind Energy, 20(4), 657–675. [Google Scholar] [CrossRef]
- Cross, J. L., Hou, C., & Poon, A. (2020). Macroeconomic forecasting with large bayesian vars: Global-local priors and the illusion of sparsity. International Journal of Forecasting, 36(3), 899–915. [Google Scholar] [CrossRef]
- Diebold, F. X. (2015). Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of diebold–mariano tests. Journal of Business & Economic Statistics, 33(1), 1. [Google Scholar]
- Eraker, B., Chiu, C. W., Foerster, A. T., Kim, T. B., & Seoane, H. D. (2014). Bayesian mixed frequency vars. Journal of Financial Econometrics, 13(3), 698–721. [Google Scholar] [CrossRef]
- Foroni, C., Ravazzolo, F., & Rossini, L. (2023). Are low frequency macroeconomic variables important for high frequency electricity prices? Economic Modelling, 120, 106160. [Google Scholar] [CrossRef]
- Gefang, D. (2014). Bayesian doubly adaptive elastic-net lasso for var shrinkage. International Journal of Forecasting, 30(1), 1–11. [Google Scholar] [CrossRef]
- Gefang, D., Koop, G., & Poon, A. (2020). Computationally efficient inference in large bayesian mixed frequency vars. Economics Letters, 191, 109120. [Google Scholar] [CrossRef]
- Gefang, D., Koop, G., & Poon, A. (2023). Forecasting using variational bayesian inference in large vector autoregressions with hierarchical shrinkage. International Journal of Forecasting, 39(1), 346–363. [Google Scholar] [CrossRef]
- Ghysels, E. (2016). Macroeconomics and the reality of mixed frequency data. Journal of Econometrics, 193(2), 294–314. [Google Scholar] [CrossRef]
- Ghysels, E., Santa-Clara, P., & Valkanov, R. (2004). The midas touch: Mixed data sampling regression models. UCLA: Finance. Available online: https://escholarship.org/uc/item/9mf223rs (accessed on 15 July 2022).
- Götz, T. B., & Hauzenberger, K. (2021). Large mixed-frequency vars with a parsimonious time-varying parameter structure. The Econometrics Journal, 24(3), 442–461. [Google Scholar] [CrossRef]
- Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453–497. [Google Scholar] [CrossRef]
- Hsu, N., Hung, H., & Chang, Y. (2008). Subset selection for vector autoregressive processes using lasso. Computational Statistics & Data Analysis, 52(7), 3645–3657. [Google Scholar]
- Huurman, C., Ravazzolo, F., & Zhou, C. (2012). The power of weather. Computational Statistics & Data Analysis, 56(11), 3793–3807. [Google Scholar]
- Jedrzejewski, A., Marcjasz, G., & Weron, R. (2021). Importance of the long-term seasonal component in day-ahead electricity price forecasting revisited: Parameter-rich models estimated via the lasso. Energies, 14(11), 3249. [Google Scholar] [CrossRef]
- Kapoor, G., & Wichitaksorn, N. (2023). Electricity price forecasting in New Zealand: A comparative analysis of statistical and machine learning models with feature selection. Applied Energy, 347, 121446. [Google Scholar] [CrossRef]
- Koop, G., & Korobilis, D. (2010). Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics, 3(4), 267–358. [Google Scholar] [CrossRef]
- Koop, G., & Korobilis, D. (2018). Variational bayes inference in high-dimensional time-varying parameter models. Available online: https://mpra.ub.uni-muenchen.de/id/eprint/87972 (accessed on 1 August 2022).
- Litterman, R. B. (1986). Forecasting with bayesian vector autoregressions—Five years of experience. Journal of Business & Economic Statistics, 4(1), 25–38. [Google Scholar]
- Ludwig, N., Feuerriegel, S., & Neumann, D. (2015). Putting big data analytics to work: Feature selection for forecasting electricity prices using the lasso and random forests. Journal of Decision Systems, 24(1), 19–36. [Google Scholar] [CrossRef]
- Marcjasz, G., Uniejewski, B., & Weron, R. (2020). Beating the naïve—Combining lasso with naïve intraday electricity price forecasts. Energies, 13(7), 1667. [Google Scholar] [CrossRef]
- Messner, J. W., & Pinson, P. (2019). Online adaptive lasso estimation in vector autoregressive models for high dimensional wind power forecasting. International Journal of Forecasting, 35(4), 1485–1498. [Google Scholar] [CrossRef]
- Mosquera-López, S., Uribe, J. M., & Manotas-Duque, D. F. (2017). Nonlinear empirical pricing in electricity markets using fundamental weather factors. Energy, 139, 594–605. [Google Scholar] [CrossRef]
- Ormerod, J. T., & Wand, M. P. (2010). Explaining variational approximations. The American Statistician, 64(2), 140–153. [Google Scholar] [CrossRef]
- Schorfheide, F., & Song, D. (2015). Real-time forecasting with a mixed-frequency var. Journal of Business & Economic Statistics, 33(3), 366–380. [Google Scholar]
- Sheppard, K. (2009). Mfe matlab function reference financial econometrics. Available online: https://www.kevinsheppard.com/files/code/matlab/mfe-toolbox-documentation.pdf (accessed on 22 March 2022).
- Sims, C. A. (1980). Macroeconomics and reality. Econometrica: Journal of the Econometric Society, 48, 1–48. [Google Scholar] [CrossRef]
- Son, H., & Kim, C. (2017). Short-term forecasting of electricity demand for the residential sector using weather and social variables. Resources, Conservation and Recycling, 123, 200–207. [Google Scholar] [CrossRef]
- Suomalainen, K., Pritchard, G., Sharp, B., Yuan, Z., & Zakeri, G. (2015). Correlation analysis on wind and hydro resources with electricity demand and prices in new zealand. Applied Energy, 137, 445–462. [Google Scholar] [CrossRef]
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288. [Google Scholar] [CrossRef]
- Uniejewski, B., Marcjasz, G., & Weron, R. (2019). Understanding intraday electricity markets: Variable selection and very short-term price forecasting using lasso. International Journal of Forecasting, 35(4), 1533–1547. [Google Scholar] [CrossRef]
- Uniejewski, B., & Weron, R. (2018). Efficient forecasting of electricity spot prices with expert and lasso models. Energies, 11(8), 2039. [Google Scholar] [CrossRef]
- Wand, M. P., Ormerod, J. T., Padoan, S. A., & Frühwirth, R. (2011). Mean field variational bayes for elaborate distributions. Bayesian Analysis, 6(4), 847–900. [Google Scholar] [CrossRef]
- Weron, R. (2014). Electricity price forecasting: A review of the state-of-the-art with a look into the future. International Journal of Forecasting, 30(4), 1030–1081. [Google Scholar] [CrossRef]
- You, C., Ormerod, J. T., & Mueller, S. (2014). On variational bayes estimation and variational information criteria for linear regression models. Australian & New Zealand Journal of Statistics, 56(1), 73–87. [Google Scholar]
- Ziel, F. (2016). Forecasting electricity spot prices using lasso: On capturing the autoregressive intraday structure. IEEE Transactions on Power Systems, 31(6), 4977–4987. [Google Scholar] [CrossRef]
- Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429. [Google Scholar] [CrossRef]