1. Introduction
Following the wave of energy market liberalization in the 1990s, organized electricity markets emerged around the world, including those in Texas (USA), California (USA), the Nordic countries (Europe), and Alberta (Canada). In these markets, electricity is sold hourly on the wholesale market, and supply and demand determine its price. Commodity traders can therefore buy and sell electricity and make a profit (or incur a loss) when trading in wholesale markets. This is not limited to electricity; natural gas and crude oil markets have also been liberalized since the 1990s. When organizations trade commodities such as electricity and natural gas on these markets, traders place their trades based on their experience of energy price movements and on current market affairs. In such an environment, simulating the electricity load, wind power generation, and natural gas prices allows traders to look into the future in a probabilistic manner and thus to make better-informed decisions. When simulating these processes, it is also important to account for the correlation between them, since stakeholders in energy markets believe that each process influences the others. The objective of this study was therefore to develop simulations of the electricity load, wind power generation, and natural gas prices for an energy market using stochastic differential equations (SDEs) and to incorporate correlation into the simulations. To achieve this objective, the researchers introduced a novel stochastic copula model in which the correlation structure is incorporated into the Ornstein–Uhlenbeck (OU) model. The study is a collaboration between the University of Calgary, the Mitacs Accelerate Program, and Cascade Trading Ltd., which provided the data and necessary assistance for this study. Based on the interest of Cascade Trading Ltd., the Texas energy market was chosen for this study.
Electricity is considered a commodity among research scholars and commodity traders. The classical model for commodities was introduced by Schwartz (1997), in whose work commodity prices were modeled using an exponential mean-reverting process. Lucia and Schwartz (2002) modeled the electricity price process of the Nordic energy market using an extension to the Schwartz (1997) model, in the form of a one-factor mean-reverting model and a two-factor mean-reverting model. Incorporating jumps into the Schwartz (1997) model, Cartea and Figueroa (2006) used a mean-reverting jump-diffusion model for electricity prices in the England and Wales energy market. Coulon et al. (2013) used a Brownian-driven mean-reverting OU process to model the electricity load of the Texas energy market. Benth et al. (2008) proposed a Lévy-driven electricity price model for Nordic markets, and Benth et al. (2018) modeled the electricity spot price using a normal inverse Gaussian process. Following Benth et al. (2018), Warunasinghe and Swishchuk (2024) modeled electricity spot prices in the Alberta, Canada, energy market using a mean-reverting Ornstein–Uhlenbeck (OU) process driven by the variance gamma Lévy process and the normal inverse Gaussian Lévy process.
Extensive literature is available on wind power and wind speed modeling. Popular approaches include characterizing the wind speed or wind power using the probability density function (PDF) of the Weibull distribution, autoregressive integrated moving average (ARIMA) time-series models, artificial neural networks in combination with ARIMA models, and stochastic differential equations (SDEs) and mean-reverting models. According to Jaramillo and Borja (2004) and Zárate-Miñano et al. (2013), modeling wind speed with a Weibull PDF is not effective for shorter time scales such as hourly or 10-minute intervals, because the Weibull PDF is a static distribution. Hill et al. (2012) and Karki et al. (2006) used ARIMA models for hourly wind-speed data in the United Kingdom and Canada, respectively. Rodríguez et al. (2021) implemented the autoregressive fractionally integrated moving average (ARFIMA) model on Colombian data. Li et al. (2018) and Wang et al. (2017) introduced novel hybrid models that combine neural networks with ARIMA models to forecast wind speed in China. For short time intervals, researchers have used SDEs and mean-reverting models: Loukatou et al. (2018), Benth et al. (2018), and Arenas-López and Badaoui (2020) used the Brownian motion-driven OU process to model the stochastic component of wind speed and wind power. Volatility models such as generalized autoregressive conditional heteroskedasticity (GARCH) models have also become increasingly popular for modeling and forecasting wind power. Chen et al. (2019) used asymmetric GARCH models such as Glosten–Jagannathan–Runkle GARCH (GJR-GARCH), Asymmetric Power Autoregressive Conditional Heteroskedasticity (APARCH), and Exponential Generalized Autoregressive Conditional Heteroskedasticity (EGARCH) to model and forecast the wind-power generation of eastern China. Shen and Ritter (2016) compared five different types of GARCH models with the Markov regime-switching GARCH model in forecasting the volatility of wind-power production for the German energy market and found that the Markov regime-switching GARCH model outperformed the classical GARCH models.
Most research scholars used the Schwartz (1997) SDE to model natural gas prices. Clewlow (2000) and Pilipović (1998) incorporated a jump term into this SDE to capture the spikes of the natural gas price process. Departing from this path, Benth and Šaltytė Benth (2004) modeled the deseasonalized logarithm of the natural gas price process using a mean-reverting OU process driven by a normal inverse Gaussian (NIG) process. In the last decade, statistical and machine learning models have been developed to model and forecast natural gas prices. Nguyen and Nabney (2010) and Saghi and Rezaee (2021) used a wavelet transformation with different statistical models, artificial neural networks, and the GARCH model to forecast natural gas prices, and they observed that linear models outperformed non-linear models. Čeperić et al. (2017) highlighted that prior variable selection is important for artificial neural networks and support vector machines. They also found that, without such variable selection, a classical time-series model outperforms these models.
A copula is a powerful tool to capture the dependencies between random variables. Recently, copulas have been utilized to capture dependencies in time-series models in both univariate and multivariate cases. Thus, they can be used to model more complex dependence structures. Tankov and Cont (2003) highlighted the importance of copulas in multivariate modeling. They calibrated two exp-Lévy models driven by the variance gamma process, compared the results obtained when the dependence structure is a Lévy copula with those obtained when it is modeled via correlated Brownian motions, and found that the two dependence structures differ widely. Hu (2006) and Jondeau and Rockinger (2006) implemented copulas to capture the dependence patterns across financial markets around the world. Aloui et al. (2014) used copula-GARCH to capture the dependence structure between crude oil and natural gas for portfolio risk management, while Berrisch et al. (2023) used copula-GARCH to capture the dependence structure between European carbon and energy prices. For a survey on the use of copulas in finance, refer to Genest et al. (2009).
This paper contributes to the literature by modeling the electricity load, wind power generation, and natural gas prices for the Texas energy market and by implementing a copula–OU model that incorporates the dependence structure between the three processes into the simulation based on the OU process. The paper is structured as follows: Section 2 discusses the data handling and preliminary analysis of the study. Section 3 discusses the modeling of the electricity load, wind power, and natural gas prices, and the model-fitting performance is discussed in Section 4. The calibration of the copula and the integration of the copula into the simulations (copula–OU model) are discussed in Section 5. Section 6 presents the conclusions of the study and outlines future work.
2. Data Description
The study was carried out on the ERCOT (Electric Reliability Council of Texas) energy market based on the daily electricity load, wind power generation, and natural gas prices. The hourly values of wind power and the electricity load were converted into daily values by taking the average, while natural gas prices were only available on a daily time scale. The data cover the period from 1 January 2019 to 31 May 2024. The electricity load and wind power generation, both measured in megawatts (MW), were obtained from the ERCOT energy market, and natural gas prices, measured in USD per million British thermal units (USD per MMBtu), were obtained from the Henry Hub natural gas market. Figure 1 and Figure 2 plot the daily average electricity load and daily average wind power generation, while Figure 3 plots the daily natural gas prices; together, they allow us to observe the stylized features of the time series.
In the wind power generation for ERCOT in Figure 2, a slight positive linear trend can be noticed. This can be due to an increase in wind power production at existing wind farms, the addition of new wind farms to the ERCOT electricity grid, or both. Thus, a transformation is required to discount the impact of new wind farms on the wind power series. Hence, the researchers considered the wind power generation percentage instead of the actual wind power production; Shen and Ritter (2016) utilized a similar transformation of data. The wind power generation percentage reflects the efficiency of all available wind farms in ERCOT. The total capacity of the wind power plants in the ERCOT market was obtained from www.ercot.com. The time series plot of the wind power generation percentage is displayed in Figure 4, and it can be observed that the positive linear trend observed in Figure 2 has changed to a negative linear trend. The wind power generation percentage (WPG%) is calculated using the following equation:
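One form consistent with the description above, stated here as an assumption rather than as the authors' exact equation, is WPG% = 100 × (daily average wind power generation in MW) / (total installed wind capacity in MW). A minimal Python sketch with hypothetical numbers (the function name and the values are illustrative, not ERCOT data):

```python
def wind_power_percentage(generation_mw, capacity_mw):
    """Return generation as a percentage of installed capacity (both in MW)."""
    if capacity_mw <= 0:
        raise ValueError("installed capacity must be positive")
    return 100.0 * generation_mw / capacity_mw


# Hypothetical daily averages and installed capacities, for illustration only.
daily_generation_mw = [9000.0, 12000.0, 15000.0]
installed_capacity_mw = [30000.0, 30000.0, 32000.0]

wpg_percent = [
    wind_power_percentage(g, c)
    for g, c in zip(daily_generation_mw, installed_capacity_mw)
]
```

Dividing by the capacity on each date, rather than by a fixed value, is what removes the effect of newly added wind farms from the series.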
In the time series plot for natural gas prices, two large spikes can be noticed, one around February 2021 and one in January 2024. In February 2021, winter storm Uri brought historic winter conditions to the state of Texas. The storm lasted from 11 to 20 February 2021, and many counties in Texas experienced wind chills below zero. Uri was the coldest winter storm for the State of Texas since December 1989. Due to this storm, heating demand in Texas increased rapidly, causing natural gas prices to reach new heights. In January 2024, another freeze event occurred in Texas due to a winter storm, again increasing the demand for heating, which drove the price of natural gas above its normal range. Due to these two rare events, we observed outliers in the natural gas price data. These outliers needed to be handled before the calibration process and were smoothed out using a six-month moving average. Figure 5 displays the smoothed natural gas price process. A comparison of the model parameters of natural gas prices before and after outlier removal is displayed in Table A1 in Appendix A.
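The smoothing step can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how the moving average was applied, so the sketch replaces each flagged observation with a trailing average of up to 180 preceding unflagged values (180 days standing in for six months). The function name `smooth_outliers` and the sample prices are hypothetical.

```python
def smooth_outliers(prices, outlier_idx, window=180):
    """Replace flagged observations with a trailing moving average.

    The average uses up to `window` preceding observations that are not
    flagged themselves; window=180 approximates six months of daily data.
    """
    flagged = set(outlier_idx)
    smoothed = list(prices)
    for i in sorted(flagged):
        history = [
            smoothed[j]
            for j in range(max(0, i - window), i)
            if j not in flagged
        ]
        if history:  # keep the original value if no clean history exists
            smoothed[i] = sum(history) / len(history)
    return smoothed


# Hypothetical prices in USD per MMBtu with one spike at index 3.
prices = [2.5, 2.6, 2.4, 23.0, 2.7, 2.5]
clean = smooth_outliers(prices, outlier_idx=[3], window=3)
```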
After the above modifications were made to the data, the study was conducted on the electricity load, the smoothed natural gas price, and the percentage of wind power generation. Parameter estimation and model training were performed on the data from 1 January 2019 to 31 December 2023, and the trained model was tested, and its performance measured, on the data from 1 January 2024 to 31 May 2024. From here onward, we refer to the electricity load process as the demand process, the smoothed natural gas price process as the gas price process, and the wind power generation percentage process as the wind power process.
3. Deterministic and Stochastic Models Description
This section presents the setup of individual models to accommodate the seasonal, trend, and stochastic properties of the demand, wind power, and gas price processes. The modeling approach for the above-mentioned processes is as follows:
Decompose seasonal, trend, and stochastic components from the process.
Model seasonal and trend (deterministic) components with truncated Fourier series (TFS).
Model the stochastic component with a suitable stochastic model.
The modeling of the dependence structure between the processes is not discussed in this section; it is deferred to Section 5.
3.1. Seasonal and Trend Model
By following the framework of
Lucia and Schwartz (
2002), the process was modeled as follows:
where
represents either demand, wind power, or the natural gas price,
represents the deterministic or seasonal and trend components of the process, and
represents the stochastic component of the process. The deterministic component is modeled using a TFS of the style below:
where
,
, for
,
and
are constant parameters, and
n is the order of TFS.
represents the linear trend in the deterministic component, and the
and
components represent the seasonality of the process. The order n of the TFS is determined from the elbow point of the plot of the residual sum of squares (RSS) of the model against the order n, that is, the order beyond which the RSS shows no significant improvement. The TFS was selected for the deterministic component because of the periodic pattern of the processes. This can be seen in Figure 6, Figure 7 and Figure 8 in the following section, and it is well documented in the literature, for example, in studies conducted by Benth et al. (2008) and Meyer-Brandis and Tankov (2009). The decomposition of the seasonal, trend, and stochastic components of the three processes is displayed in the figure below.
For the demand and wind power process, a linear trend and a periodic seasonal pattern are observable. Thus, the two components are added together and modeled by the TFS in Equation (
3). Observing
Figure 9 and
Figure 10 reveals that the elbow point for the demand process is
, while for the wind power process, it is not evident. For the wind power process, the RSS is high for all orders of the TFS and decreases rapidly as the order increases. The RSS can be expected to decrease further if the order is increased beyond
. However, this is computationally expensive; thus, the optimal order for the TFS is considered to be
. Even though the elbow point is
for the demand process, during modeling, an order of
was considered to obtain better performance. Hence, orders
and
are considered for the TFS of the demand and wind power process, respectively.
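The order selection described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' implementation: the TFS is taken to be an intercept, a linear trend, and n sine/cosine pairs with an annual period of 365 days, and it is fitted by ordinary least squares through the normal equations. All function names are hypothetical, and the data are synthetic.

```python
import math


def tfs_design_row(t, order, period=365.0):
    """Regressors at time t: intercept, linear trend, sin/cos per harmonic."""
    row = [1.0, float(t)]
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        row.extend([math.sin(angle), math.cos(angle)])
    return row


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_tfs(times, values, order, period=365.0):
    """Least-squares TFS fit; returns (coefficients, residual sum of squares)."""
    rows = [tfs_design_row(t, order, period) for t in times]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    coef = solve_linear(xtx, xty)
    rss = sum(
        (y - sum(c * v for c, v in zip(coef, r))) ** 2
        for r, y in zip(rows, values)
    )
    return coef, rss


# Synthetic series: linear trend plus annual and semi-annual harmonics.
times = list(range(730))
values = [
    50.0 + 0.01 * t
    + 5.0 * math.sin(2.0 * math.pi * t / 365.0)
    + 2.0 * math.cos(4.0 * math.pi * t / 365.0)
    for t in times
]

# RSS for each candidate order; the elbow is read from this curve.
rss_by_order = {n: fit_tfs(times, values, n)[1] for n in range(1, 5)}
```

For this synthetic series the RSS drops sharply from order 1 to order 2 and stays flat afterwards, which is the elbow behaviour the text relies on. For high orders, a numerically more robust least-squares solver than the normal equations would be preferable.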
In contrast to the demand and wind power processes, the trend in the gas price process is non-linear and displays a periodic pattern resembling a sine curve. Hence, the seasonal and trend components of the gas price process are modeled separately. The seasonal component is modeled using Equation (3), while the trend component is modeled using a TFS of the form below:
The elbow point for both seasonal and trend TFS models is not evident when observing
Figure 11 and
Figure 12. When considering the seasonal TFS model, the lowest RSS is observable at order
, while for the trend TFS model, the lowest is observed at
and
. Thus, for the seasonal and trend models of the gas price process, the order of the TFS was considered
and
, respectively. A sensitivity analysis of how changes in the order of the TFS affect the natural gas price simulation is presented in Table A2 and Table A3 in Appendix A.
The trend component of the gas price process warrants further discussion. Although the trend exhibits a sine pattern over the study period, this is not the case for the data from January 1997 to May 2024. Figure 13 displays the decomposition of the seasonal, trend, and stochastic components of gas prices for this longer period.
Figure 13 shows that the trend component does not exhibit the sine pattern seen in the period from January 2019 to May 2024 and does not follow a consistent pattern. The reason might be that, unlike for demand and wind power, weather is not the only factor influencing gas prices. Thus, using a TFS to model the trend component of gas prices is not ideal. A possible solution is to use a regression-type model for gas prices. The seasonal component, however, shows a consistent pattern throughout the period; thus, modeling it with a TFS is appropriate.
Figure 14 reveals that the trend component is the major component in the value of the gas price process, followed by the stochastic component and the seasonal component. Thus, the trend model has a high impact on the overall simulation.
Figure 15 and
Figure 16 display the TFS with
for the training period and the out-of-sample period. Although the TFS captures the pattern in the training period, it performs poorly in the out-of-sample period because the pattern is not consistent over time, as seen in Figure 13. This highlights the need for a regression-based model for the deterministic component of the gas price process. However, this is left as future work, and the current study continues to use the TFS for the deterministic component of gas prices.
3.2. Stochastic Model
Next, the mean-reversion property, or stationarity assumption, needs to be validated for the stochastic (de-seasonalized) component of the three processes. To validate the stationarity assumption, two statistical tests are used, namely the augmented Dickey–Fuller (ADF) test and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test. The hypotheses of the ADF test are as follows:
Null hypothesis (H0). The process is non-stationary.
Alternative hypothesis (H1). The process is stationary.
Meanwhile, the hypotheses of the KPSS test are as follows:
Null hypothesis (H0). The process is trend-stationary.
Alternative hypothesis (H1). The process is non-stationary.
Table 1 displays the results of the two statistical tests for the three processes. The results support the conclusion that the stochastic components of all three processes are stationary.
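As an illustration of the logic of the first test, the sketch below computes a simple Dickey–Fuller statistic. It is a deliberate simplification and not the procedure used in the study: it omits the lagged difference terms that make the test "augmented", and the KPSS test is not shown. The function name `dickey_fuller_stat` is hypothetical, the series is synthetic, and -2.86 is the usual asymptotic 5% critical value for the regression with a constant and no trend. In practice, a statistics library would normally be used for both tests.

```python
import random


def dickey_fuller_stat(x):
    """t-statistic of rho in the regression x[t] - x[t-1] = a + rho * x[t-1] + e[t]."""
    lag = x[:-1]
    diff = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    n = len(lag)
    mean_lag = sum(lag) / n
    mean_diff = sum(diff) / n
    sxx = sum((v - mean_lag) ** 2 for v in lag)
    sxy = sum((v - mean_lag) * (d - mean_diff) for v, d in zip(lag, diff))
    rho = sxy / sxx
    a = mean_diff - rho * mean_lag
    rss = sum((d - a - rho * v) ** 2 for v, d in zip(lag, diff))
    se_rho = (rss / (n - 2) / sxx) ** 0.5
    return rho / se_rho


# Synthetic mean-reverting AR(1) series standing in for a de-seasonalized process.
rng = random.Random(0)
series = [0.0]
for _ in range(500):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))

stat = dickey_fuller_stat(series)
CRITICAL_5PCT = -2.86  # asymptotic 5% critical value (constant, no trend)
reject_unit_root = stat < CRITICAL_5PCT  # True supports stationarity
```

A statistic below the critical value rejects the null hypothesis of non-stationarity, which is the outcome reported for the three stochastic components.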
Since all three processes are mean-reverting, several types of Ornstein–Uhlenbeck (OU) processes can be used to model the stochastic component of the processes. A generalized OU process and a jump-diffusion (JD) model are expressed in the equation below:
where
is the mean-reverting speed of the process,
represents the long-run level of the process or the deterministic level of the process,
represents the volatility of the process,
represents the Brownian motion, which is also a Lévy process, and
represents the jump size that follows a normal distribution with the mean
and the variance
, and
is a Poisson process with the jump intensity
. When the historical data for the demand process and the wind power process in
Figure 1 and
Figure 4 are observed, it can be noticed that the process moves around the seasonal and trend components. This can also be observed in
Figure 6 and
Figure 7. Thus, a modification to the OU process is implemented, following Benth and Šaltytė Benth (2007). The modified Ornstein–Uhlenbeck (M-OU) process is expressed in the equation below:
Benth and Šaltytė Benth (2007) used the above to model temperature data, which have stylized characteristics similar to those of the demand and wind power processes. When a jump component is added to the above M-OU process, similarly to Equation (7), it can be expressed as follows:
where this model is referred to as the modified jump–diffusion (M-JD) model. When the JD model or the M-JD model was calibrated, jumps in the data were identified using the recursive filtering procedure described in Benth et al. (2008). The procedure can be summarized as follows: First, identify jumps that exceed the mean level plus three times the standard deviation. Then, remove those jumps from the data, and recalculate the mean and standard deviation. Again, identify the jumps relative to the new mean and new standard deviation, and remove them from the data. The process is repeated until no jumps are found in the data. The jumps identified across all iterations give the total number of jumps in the data, and the final standard deviation is the diffusion rate (
) for the JD or M-JD model. Further, the jumps are divided into positive jumps and negative jumps, and the jump probabilities are calculated separately. Thus, the jump component is the sum of the negative jumps and the positive jumps. For more details, please refer to Benth et al. (2008).
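The recursive procedure summarized above can be sketched as follows. This is a minimal illustration under assumptions, not the calibration code of the study: the filter is applied to a list of de-seasonalized values, deviations are flagged on both sides of the mean so that negative jumps are also captured, and jump probabilities are expressed per observation. The function name `recursive_jump_filter` and the data are hypothetical.

```python
import statistics


def recursive_jump_filter(values, n_sigma=3.0):
    """Iteratively remove observations beyond n_sigma standard deviations."""
    remaining = list(values)
    jumps = []
    while len(remaining) > 2:
        mean = statistics.fmean(remaining)
        sd = statistics.stdev(remaining)
        is_jump = [abs(v - mean) > n_sigma * sd for v in remaining]
        if not any(is_jump):
            break  # no jumps left: the filter has converged
        jumps.extend(v for v, flag in zip(remaining, is_jump) if flag)
        remaining = [v for v, flag in zip(remaining, is_jump) if not flag]
    n_total = len(remaining) + len(jumps)
    return {
        "jumps": jumps,
        # standard deviation of the jump-free data, used as the diffusion rate
        "diffusion_sd": statistics.stdev(remaining),
        "p_positive": sum(j > 0 for j in jumps) / n_total,
        "p_negative": sum(j < 0 for j in jumps) / n_total,
    }


# Synthetic data: small fluctuations with one positive and one negative jump.
values = [0.1 * ((i % 5) - 2) for i in range(100)]
values.insert(30, 15.0)
values.insert(70, -12.0)

result = recursive_jump_filter(values)
```

Recomputing the mean and standard deviation after each pass matters because large jumps inflate the first estimate of the standard deviation and can hide smaller jumps.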
The stochastic component for demand and wind power processes is modeled using multiple models: an M-OU process, as expressed in Equation (
5), an M-JD process, as in Equation (
8), and a Lévy-driven M-OU process where the diffusion component (
) is replaced with a Lévy process (
). The Lévy process considered for the demand process is a normal inverse Gaussian (NIG) process, while for the wind power process, NIG and exponential Weibull processes are considered. In addition, four different Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models are considered when modeling the wind power process and are compared against the mean-reverting models. GARCH models are commonly used in the literature as volatility models. The GARCH model is formulated as follows:
where
C is a constant,
k is the order of an autoregressive term,
is the autoregressive coefficient, and
is the error term:
where
is an independent and identically distributed random variable with a mean of 0, a variance of 1, and
volatility. The volatility
is defined as follows:
where
is a constant,
is an ARCH term, and
is a GARCH term. The different GARCH models calibrated in this study are as follows:
A GARCH model with a normal distribution.
A GARCH model with Student’s t distribution.
A GARCH model with a skewed Student’s t distribution.
A GJR-GARCH model:
where
if
, and
if
.
A TARCH model:
where
if
,
if
, and volatility is
.
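The variance recursion behind these models can be sketched as follows. The sketch assumes the standard GARCH(1,1) form, in which the conditional variance is a constant plus an ARCH term on the previous squared error plus a GARCH term on the previous variance, with an optional GJR leverage term that is active only after negative errors. Normal innovations are used, the autoregressive mean equation is omitted, and the parameter names (`omega`, `alpha`, `beta`, `gamma`) are assumptions rather than the paper's notation.

```python
import random


def simulate_garch(n, omega, alpha, beta, gamma=0.0, seed=0):
    """Simulate errors and conditional variances of a (GJR-)GARCH(1,1) model."""
    # With symmetric innovations, the leverage term is active half the time.
    persistence = alpha + beta + 0.5 * gamma
    if not 0.0 <= persistence < 1.0:
        raise ValueError("parameters imply a non-stationary variance")
    rng = random.Random(seed)
    variance = omega / (1.0 - persistence)  # start at the unconditional variance
    errors, variances = [], []
    for _ in range(n):
        eps = variance ** 0.5 * rng.gauss(0.0, 1.0)
        errors.append(eps)
        variances.append(variance)
        leverage = gamma * eps * eps if eps < 0.0 else 0.0
        variance = omega + alpha * eps * eps + leverage + beta * variance
    return errors, variances


# Hypothetical parameter values, for illustration only.
errors, variances = simulate_garch(500, omega=0.1, alpha=0.1, beta=0.8)
gjr_errors, gjr_variances = simulate_garch(
    500, omega=0.1, alpha=0.05, beta=0.8, gamma=0.1
)
```

Setting `gamma=0.0` recovers the symmetric GARCH model; Student's t or skewed t innovations would replace the normal draw.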
For the gas price process, the following models were considered: an OU process, a JD process, and an OU process driven by an NIG Lévy process. The modified mean-reverting models were not considered because, unlike the demand and wind power processes, the gas price process does not revolve around its seasonal mean. For all models mentioned above, the mean-reversion coefficient is considered to be deterministic.
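To make the idea of a process that revolves around its seasonal level concrete, the sketch below simulates a modified OU process with an Euler scheme. It assumes the form dX = dS + kappa (S - X) dt + sigma dB, in the spirit of Benth and Šaltytė Benth (2007), with a constant volatility and Brownian noise; the exact equations and sign conventions of the study are not reproduced here. The function name, parameter names, and values are hypothetical.

```python
import math
import random


def simulate_modified_ou(seasonal, x0, kappa, sigma, dt=1.0, seed=0):
    """Euler scheme for dX = dS + kappa * (S - X) dt + sigma dB."""
    rng = random.Random(seed)
    path = [x0]
    for t in range(1, len(seasonal)):
        seasonal_change = seasonal[t] - seasonal[t - 1]
        pull = kappa * (seasonal[t - 1] - path[-1]) * dt  # reversion to S
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + seasonal_change + pull + shock)
    return path


# Hypothetical seasonal level with an annual cycle.
seasonal = [50.0 + 5.0 * math.sin(2.0 * math.pi * t / 365.0) for t in range(151)]

# Without noise, an initial gap of 10 to the seasonal level halves every step.
noise_free = simulate_modified_ou(seasonal, x0=seasonal[0] + 10.0, kappa=0.5, sigma=0.0)

# With noise, the path fluctuates around the seasonal level.
noisy = simulate_modified_ou(seasonal, x0=seasonal[0], kappa=0.5, sigma=1.5, seed=42)
```

Dropping the `seasonal_change` term and replacing the seasonal level with a constant long-run level gives the plain OU process; a jump component or a Lévy driver would replace or extend the `shock` term.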
3.3. Volatility
During the study, the volatility (
) was considered to be time-dependent. The time-dependent volatility can be calibrated following two approaches: first, using hourly data to calculate daily volatility, and second, using daily data to calculate monthly volatility. Both approaches were tested on the demand process, and it was concluded that the second approach was preferable; thus, for the remaining processes, volatility was modeled monthly for the mean-reverting OU process. The details are available in the next section. Figure 17, Figure 18 and Figure 19 display the monthly volatility for the three processes, and Figure 20, Figure 21 and Figure 22 display the volatility by month and year.
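The second approach can be sketched as follows. The text does not state the exact estimator, so the sketch assumes that monthly volatility is the sample standard deviation of the daily changes of the de-seasonalized series, pooled by calendar month. The function name `monthly_volatility` and the synthetic residuals are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date, timedelta


def monthly_volatility(dates, residuals):
    """Standard deviation of daily changes, grouped by calendar month (1-12)."""
    by_month = defaultdict(list)
    for i in range(1, len(residuals)):
        by_month[dates[i].month].append(residuals[i] - residuals[i - 1])
    return {
        month: statistics.stdev(changes)
        for month, changes in sorted(by_month.items())
        if len(changes) > 1
    }


# Synthetic residuals that fluctuate less in January than in other months.
start = date(2019, 1, 1)
dates = [start + timedelta(days=i) for i in range(365)]
residuals = [
    (1.0 if d.month == 1 else 2.0) * (-1) ** i for i, d in enumerate(dates)
]

vol = monthly_volatility(dates, residuals)
```

Grouping by the pair (year, month) instead of the month alone would give a volatility by month and year.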
4. Performance of Stochastic Models
This section discusses and evaluates the models from the previous section based on their performance in out-of-sample simulations. The out-of-sample simulation is conducted on the daily average demand process, the wind power process, and the daily gas price process, with the period from 1 January 2024 to 31 May 2024 considered as out of sample. For each model, simulations are plotted against the actual data, and the performance is measured using the mean absolute error (MAE) and the mean absolute percentage error (MAPE). For each model, 1000 out-of-sample simulations are performed, and the MAE and MAPE are computed for each. The average MAE and MAPE for each model are then calculated. The model with the lowest average MAE and MAPE is considered the best model to simulate the three processes independently. The same procedure was followed throughout the study when comparing the performance of model simulations using MAE and MAPE. In the case of the gas price process, since the TFS is not ideal for modeling the seasonal and trend components, the models from the previous section are also tested on the stochastic component of the process alone.
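The evaluation procedure can be sketched as follows: compute the MAE and MAPE of each simulated path against the actual data, then average both metrics over all paths. The function names and the numbers are hypothetical, and the MAPE is expressed in percent.

```python
def mae(actual, simulated):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - s) for a, s in zip(actual, simulated)) / len(actual)


def mape(actual, simulated):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - s) / abs(a) for a, s in zip(actual, simulated)) / len(actual)


def average_errors(actual, simulations):
    """Average MAE and MAPE over a collection of simulated paths."""
    n = len(simulations)
    avg_mae = sum(mae(actual, sim) for sim in simulations) / n
    avg_mape = sum(mape(actual, sim) for sim in simulations) / n
    return avg_mae, avg_mape


# Two hypothetical simulated paths compared with two actual observations.
actual = [100.0, 200.0]
simulations = [[110.0, 190.0], [100.0, 200.0]]
avg_mae, avg_mape = average_errors(actual, simulations)
```

In the study, `simulations` would hold the 1000 out-of-sample paths of a given model, and the model with the lowest averages would be preferred.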
The demand process is modeled using Equation (7), and two volatility modeling approaches are compared, i.e., modeling daily volatility versus modeling monthly volatility. Figure 23 displays the simulation from Equation (7) with daily volatility, and Figure 24 displays the same with monthly volatility. It can be observed that the simulated data revolve around the seasonal and trend components due to the modification applied to the OU process. However, the simulation with daily volatility is considerably noisier than the actual data, while the simulation with monthly volatility has noise levels similar to those of the actual data, and the simulations in Figure 24 exhibit patterns similar to those of the actual data. When the average MAE and MAPE of 1000 simulations are compared between the two simulations in Table 2, it can be seen that the model with monthly volatility performs better than the model with daily volatility; in terms of MAE, the difference is approximately 600 MW. Thus, using monthly volatility as
is more advantageous for the simulations than using daily volatility. Therefore, for all processes, monthly volatility was used as the volatility for the OU processes. When the MAE and MAPE of the M-JD model, the M-OU process driven by NIG, and the M-OU model with monthly volatility were evaluated, the M-JD model exhibited better fitting performance for the demand process than the other models. Figure 25 displays the out-of-sample simulation of the M-JD model for the demand process, along with the actual data.
When the MAE and MAPE of the GARCH models and the OU processes for the wind power process are compared, both types of models achieve similar model-fitting performance, with the GARCH models performing slightly better than the OU processes. Among the GARCH models, those with the normal distribution and Student’s t distribution show better performance, while among the mean-reverting models, the M-JD process displays better model-fitting performance than the others. Figure 26 displays the out-of-sample simulation for the GARCH (normal) and GARCH (Student’s t) models, and Figure 27 displays the same for the M-JD model.
For the gas price process, the OU process driven by the NIG process achieves significantly better model-fitting performance than the OU process with Brownian motion. The OU (NIG) model achieves an average deviation of only USD 0.61 between the true and simulated gas prices, while the OU process with Brownian motion achieves an average deviation of about USD 1.30. The JD model achieves superior performance compared to both OU and OU (NIG). When only the stochastic component of the gas price process is considered, the JD model again achieves superior performance compared with the OU process and the OU process driven by the NIG process. Figure 28 and Figure 29 display the out-of-sample simulation obtained from the JD model for the gas price process.
6. Conclusions and Future Works
We arrive at the following conclusions based on the results of the study. First, mean-reversion models can capture the movement of the demand, wind power, and natural gas price processes in the Texas energy market. Second, the seasonality and trend components of the demand and wind power processes can be modeled with the truncated Fourier series for the same energy market; however, the truncated Fourier series was not as successful for the gas price process. To incorporate the correlation structure between the above processes, the vine copula is a suitable mechanism. As we saw in the above sections, the vine copula successfully captures the correlation between the three processes. We can incorporate the vine copula into the SDE by replacing the diffusion component of the OU process with the calibrated vine copula. This provides successful simulations for demand and wind power and the stochastic component of the natural gas price process. We believe a limitation of the copula-OU model is that, when rare events occur that go beyond the dependence structure between the variables (e.g., the 2021 storm Uri and the 2024 freeze event), the model is unable to match the magnitude of the event in the simulation. This may be the reason why the JD model for the natural gas process and the M-JD model for the demand process achieve a slightly better MAE and MAPE than the copula-OU model. However, the advantage of the copula-OU model lies in its ability to provide correlated simulations, which are more realistic than stand-alone simulations. Regarding future work, as discussed in Section 3.1, a parametric or semi-parametric regression-based model that incorporates macroeconomic and weather variables would be able to capture the deterministic (trend) component more robustly. Hence, first, a study should be conducted to identify the factors that influence the gas price process and to model the deterministic component using a regression-based model. Second, it is worth exploring other methodologies, such as STL and wavelet-based approaches, to decompose and model the deterministic components. In our study, the order of the truncated Fourier series for wind was 30, which may lead to overfitting and a higher computational cost. We leave this work to future iterations of the study. Third, the study should be extended to develop SDEs that perform simulations at an hourly time scale in order to better reflect the energy market. Fourth, the simulations should be regional; thus, future studies should focus on developing the SDEs regionally for demand and wind power. The simulations should then incorporate not only the correlation between processes but also the regional correlation, that is, the correlation between the demand of the east and north regions or between the wind power of the north and south regions of the Texas energy market. Figure 40, Figure 41 and Figure 42 display the regions predefined by ERCOT in the Texas energy market. Finally, the study should explore the possibility of incorporating Lévy copulas into the simulation; this adjustment will improve simulations that are conducted at the regional level.