Estimating Portfolio Value at Risk in the Electricity Markets Using an Entropy Optimized BEMD Approach

: In this paper, we propose a new entropy-optimized bivariate empirical mode decomposition (BEMD)-based model for estimating portfolio value at risk (PVaR). It reveals and analyzes different components of the price ﬂuctuation. These components are decomposed and distinguished by their different behavioral patterns and ﬂuctuation range, by the BEMD model. The entropy theory has been introduced for the identiﬁcation of the model parameters during the modeling process. The decomposed bivariate data components are calculated with the DCC-GARCH models. Empirical studies suggest that the proposed model outperforms the benchmark multivariate exponential weighted moving average (MEWMA) and DCC-GARCH model, in terms of conventional out-of-sample performance evaluation criteria for the model accuracy.


Introduction
As an indispensable industry input nowadays, electricity is an integral part of the modern economy.With the deregulation movement reforming the electricity industry since the 1970s, worldwide electricity markets have become more competitive and integrated.There are higher levels of interactions and price fluctuations observed in the market place [1].The designs of more accurate risk measurement models have become the central issue for more effective risk management in the electricity market [2].Among different risk measures developed over the years, portfolio value at risk (PVaR) is one of the widely-accepted downside risk measures.Given the confidence level and holding horizon for a portfolio, it is the single statistic summarizing the maximal downside risk [3].Its estimation usually relies on the analysis of data movement and fluctuation characteristics.
Recently, numerous empirical studies have provided evidence of the co-existence of extremal events and normal events underlying the risk movement, potentially driven by investors with different investment preferences [4][5][6].One of the popular attempts to address this issue is the regime switching-based models [7].For example, Huisman and Mahieu [7] and Zhang and Zhang [8] used the Markov regime switching model to identify three regimes for the crude oil market [8].Shen and Holmes [9] identified two stationary regimes for 12 Asia pacific country stock market and showed the different rate of price mean reversion for two regimes [9].Explicit breaks in risk behavior are usually assumed in the regime switching-based approach.However, these models do not take joint influences of different underlying components into consideration and cannot achieve robust superior out-of-sample forecasting performance.For example, Samitas and Armenatzoglou [10] found that the Markov regime switching model provides inferior performance to the regression tree model in the electricity market [10].Chen and Bunn [11] found that Markov Regime Switching (MRS) may overfit the data and provides insufficient level of explanatory and forecasting power [11].Recently, the emergence of multiscale analysis (MA) theory provides an important alternative and more flexible modeling framework, with different assumptions from the MRS model.It is assumed that the underlying data generating processes (DGPs) of different characteristics coexist across different scales.It further proposes methodologically that different models can be used to characterize data governed by different laws across different scales [12].It recognizes the multiple spatial-temporal scale organization and non-equilibrium dynamics in the data, often mixed together and distinguishable by their different scales and other characteristics [12].MA theory aims for efficient representation of multiscale problem with a diverse range of data features characterized by their scales, following the microscopic approach [13].However, to adopt the MA theory during the risk measurement modeling in the electricity market, there are two open problems, i.e., the selection of an appropriate technique to conduct the multiscale analysis and the identification of the appropriate model specifications.
As for the development of the appropriate techniques for multiscale analysis, some frequency and time scale models have been utilized to investigate the multiscale risk structure using traditional Fourier transform and wavelet transform.Empirical studies using these methods have reported positive performance [14].Meanwhile, the recently emerging empirical mode decomposition (EMD) takes an empirical, intuitive, direct and self-adaptive data approach as the alternative [15,16].It is mainly used in the forecasting field and has shown positive performance.For example, An et al. [17] combined a multi-output FFNN (feed-forward neural network) with EMD-based signal filtering and seasonal adjustment.Results demonstrated that the proposed model improves the forecasting accuracy noticeably compared with the existing models [17].Dong et al. [18] proposed a forecasting model that detached high volatility and daily seasonality for electricity price based on EMD.The comparisons demonstrate that the proposed model can improve the prediction accuracy noticeably [18].However, we have not identified the researches exploiting the multiscale analysis capability of the EMD algorithm in the multivariate portfolio risk measurement.
As for the identification of appropriate multiscale model specifications, the determination of model specifications, such as the decomposition scales in the EMD model, remains another theoretical challenge in the literature.Information criteria (IC), such as AIC and BIC, are widely used in the literature for model specification identification.IC is defined by goodness of fit of the model with the log likelihood function penalized by the number of parameters representing the model complexity.However the log likelihood function is not well defined for the multiscale analysis, especially for the recently-emerging BEMD model.The entropy measure serves as a potential alternative approach.It is shown in the literature that both the entropy and likelihood approaches reach the same optimization results with different model assumptions.The entropy measure can be used directly to quantify and measure the historical data patterns, often referred to as the information content of the predictor.In the literature, the entropy measure is usually used to measure the information distribution in the wavelet domain, For example wavelet entropy, relative wavelet entropy and many other variants have been proposed in the literature to calculate the entropy of the energy distribution in the typical wavelet decomposition, as well as the cost function for the best basis algorithm to choose the optimal basis for wavelet packet transform [19,20].A lower wavelet entropy value implies more orderly data.Xu et al. [21] used the modified wavelet entropy measure to differentiate between the normal and hypertension states [21].Samui and Samantaray [22] incorporated the wavelet entropy measure in constructing the measuring index for islanding detection in distributed generation [22].
To tackle these theoretical and methodological challenges, in this paper, we propose an entropy measure to quantify the forecasting accuracy using the in-sample data and identify the appropriate model specifications in the multiscale analysis.We also propose an entropy-optimized bivariate empirical mode decomposition (BEMD)-based methodology to study risk evolutions and estimate PVaR in the electricity markets.We assumed that the market price is influenced by a number of underlying data components.We used the EMD methods to decompose the original price series into the chosen number of data components and separate them into both normal and transient components.The noise part is assumed to be caused by transient and extreme events, while the normal part is assumed to reflect the normal market environment.PVaR for both parts is estimated.We use the in-sample performance measure, such as entropy, to evaluate the performance of normal components under different separation scenario and choose the one with the maximum entropy value.Empirical studies in the benchmark electricity markets confirm the statistically-significant out-of-sample performance improvement using the more appropriate multiscale model specification with the proposed EMD entropy method, in terms of reliability and accuracy.The reliability is defined by equality of the probability of violations during the chosen time period with the expected level of violations for VaR models, while the accuracy is defined by how closely the VaR estimated tracks the realized losses [23].
The main contribution of this paper is the proposition of an entropy-based methodology for the identification and determination of data components using BEMD methods, if the price fluctuations are assumed to be the result of different influencing factors.This approach is unique in two aspects.Firstly, the BEMD algorithm is introduced to decompose the bivariate portfolio data into its two constituent bivariate components, which are assumed to represent normal and transient market risks.Secondly, different from the previous approaches to using entropy theory to measure the information distribution in the wavelet domain, the proposed model uses the entropy to measure the information content of the predictors in the EMD decomposed domain and to determine the appropriate levels for both data and extreme event components in the BEMD algorithm during the in-sample model tuning process.This approach measures the information extracted with the proposed model, instead of the historical information in the original data, and, thus, can be used to determine the appropriate model specifications.The work in this paper provides further empirical evidence of multiscale data characteristics, which are distinguishable by distinct investment patterns characterized by scales in the BEMD algorithm.To the best of our knowledge, the work in this paper represents an exploratory attempt to determine the appropriate EMD model parameters using the quantified information in the prediction data.
The rest of the paper proceeds as follows: Section 2 provides a brief account of BEMD theories and proposes the entropy-optimized BEMD-based methodology for estimating PVaR.In Section 3, empirical studies are conducted in the Australia electricity markets, and experimental results are reported to justify the superiority of the proposed algorithm.Section 4 provides some concluding remarks.

Bivariate Empirical Mode Decomposition
Empirical mode decomposition (EMD) was initially proposed for the study of ocean waves and then was successfully applied in biomedical engineering, structured health monitoring and other natural science and engineering areas [24,25].Compared with Fourier and wavelet analysis, EMD offers much better temporal and frequency resolutions [15].EMD can adaptively decompose a time series into several independent intrinsic mode function (IMF) components at different scales (levels) and one residual component.The fluctuations within a time series are automatically and adaptively selected from the time series.EMD is a data-driven method with very few assumptions; thus, it can be used for data series of a nonlinear and non-stationary nature [16].
The idea of EMD is to separate the data based on their different scale characteristics defined as the fluctuation band.In the univariate case, this is achieved by defining the fluctuation band as the distance between local extrema at each level.The EMD is viewed as identifying high frequency with fast oscillation from lower frequency with slow oscillation.In the bivariate case, the fluctuation band is defined as the edge of a tightly-enclosed tube.the notion of oscillation is extended to the notion of rotation.The BEMD is viewed as identifying fast rotation from slow rotation [26].
The bivariate time series are expressed as the complex-valued signal z(t) = xi + y.For any complex-valued signal z(t), BEMD involves the following steps: (1) Choose the number of directions k to calculate the envelope curve.
(3) Calculate the maxima of p φ j (t).Extract the locations t j i .
(4) Interpolate the set (t j i , z(t j i )) using the chosen curve fitting algorithm, such as the cubic spline algorithm, to generate the envelop curve on the direction φ j : e φ j (t).
(7) Subtract the mean from the original signal z(t).The sifting elementary operators are defined as The original bivariate time series can be expressed as the sum of some IMFs at different scales and a residue.

Entropy Optimized BEMD Algorithm for PVaR Estimation
To implement the BEMD-based multiscale methodology, we make some further simplifying assumptions as follows: 1. We assume that the electricity market is influenced by market agents with different defining characteristics, such as normal and transient data characteristics.These market agents contribute equally to the market price movement and risk (fluctuation) level during the market price formation process.They are mutually independent, as their correlations and covariance are much smaller in scale and ignorable.
2. We classify these characteristics into main groups, including investment strategies, time horizons and investment scales.The market is assumed to be dominated with some main investment strategies, stable at a particular scale over the period of analysis, with finite fluctuation bands at certain boundary values at each scale.
With these assumptions, the market structure can be approximated with the combination and mixture of DGPs at different scales.Since different models of the underlying DGPs exist for the observed market price movement, it represents an identification problem of the redundant representation of the original data during the modeling process.Empirical research in the current literature usually uses the in-sample fitting accuracy to measure the generalizability of the model.However, the well-known overfitting issue with this approach indicates that the accuracy of this approach is prone to the data time window and the uncertainty in the representation, i.e., models other than the correct one could also lead to the same in-sample error.
In this paper, we argue that the level of information content of the potential candidate models for the observed model performance could serve as the alternative measure for the model's generalizability.We propose the entropy theory to quantify and measure the level of information content of the observed model performance for the potential model candidates.The chosen model is supposed to describe the underlying DGPs associated with the level of information content that corresponds to the expected level for the defined problem domain.For example, for the identification of transient and extreme events, we would expect a higher level of information content than that in the normal situation.As the aim of this paper is to identify the BEMD model separating the transient and extreme events appropriately, we propose the maximum entropy criteria to select the appropriate model specifications among the potential models.
Based on the aforementioned theory for the electricity market, we propose the entropy-optimized BEMD-based algorithm to model multiscale market structure and estimate PVaR.
Firstly, the BEMD algorithm is used to decompose the training data into different scales or levels.We observe that an extreme event has the following characteristics, i.e., low probability of occurrence, significantly less in number and significant impact on the risk level.Thus, we assume that only one level or scale corresponds to the extreme event.The total market risk level is influenced by normal market risk and extreme event risk, assumed to be equal.Since the extreme event risk level is time-varying in nature and sensitive to the market data characteristics, it needs to be determined during the training process.
Secondly, the time varying variance and covariance for the normal market level and the extreme behavior level are modeled with the multivariate GARCH approach.During the modeling process, we assume that the decomposed components are stationary.Stationarity tests, including Kwiatkowski-Phillips-Schmidt-Shin (KPSS) , the augmented Dicky-Fuller test, etc., can be employed to test their stationarity [27].This assumption is also realistic, as the EMD decomposed components are confined in the fluctuation boundary, which implies that the variance is finite and the mean is constant.Typical models include Baba-Engle-Kraft-Kroner (BEKK) , dynamic conditional correlation (DCC) and constant conditional correlation (CCC) GARCH models [28].Among them, the DCC-GARCH model is the most popular one, because it relaxes the fixed correlation assumptions in the other two approaches.Therefore, in this paper, we use the DCC-GARCH model [29,30].
The conditional mean matrix is defined as in r t |F t−1 ∼ N (0, H t ), where r t is returns from k assets and can be assumed to follow some time series models, such as the vector autoregressive moving average (VARMA) model or the vector autoregressive (VAR) model [31].H t is the covariance matrix and is defined as H t ≡ D t R t D t .R t is the correlation matrix at time t.D t is the standard deviations matrix with elements √ h it at the i-th diagonal.The element h it , i = 1, . . ., k can be assumed to follow the univariate GARCH model as (1).
where P i and S i are the lags of the GARCH model for each individual asset i.Parameters α and β need to satisfy the stationary condition P i p=1 α i,p + S i s=1 β i,s < 1.The dynamic conditional correlation matrix R t of ε t at time t is defined as in (2).
where Q * t is a diagonal matrix whose elements are the square root of diagonal elements in Q t .Q t is N × N the symmetric positive definite unconditional correlation matrix of ε t as in (3).
where ε t ∼ N (0, R t ) are the standardized residuals.M and N are lags for the correlation specification.Both parameters α and β need to be positive and need to satisfy the stationary condition M m=1 α m + N n=1 β < 1.
Although the stationary condition is defined in both the univariate GARCH model and DCC model in the DCC GARCH model, the sufficient and necessary conditions for covariance stationarity in the DCC GARCH model have not been well defined and have remained a particularly challenging research problem in the literature, with very little progress achieved so far [32].Recently, Fermanian and Malongo [33] proposed the some formulation for the necessary and sufficient conditions [33].However, Engle [29] showed that a consistent parameter estimate can be achieved for the current formulation of the DCC GARCH model under standard regularity conditions and should suffice in a wide range of circumstances [29].
Thirdly, since we further assume that the normal market behavior and transient market behaviors are not correlated, the aggregated risk can be reconstructed from the individual risk estimates at each state.We further assume that risk at each state contributes equally to the aggregated risk level; the variance-covariance matrix is simply the equal average of the individual variance-covariance matrix forecast at each state as in (4).
where t is the aggregated conditional variance-covariance forecast matrix at time t .
N B,t and T B,t are conditional variance-covariance forecast matrices for both returns at both normal and extreme phenomenon, respectively, at time t.t = Cov(X t , Y t ) is the covariance matrix at time t, Cov(X N B,t , Y T B,t ) = 0 and Cov(X T B,t , Y N B,t ) = 0.
Fourthly, suppose a unit worth portfolio investment with ω weights and one day holding period with the forecasted conditional mean and covariance matrices; we follow the traditional variance-covariance approach for estimating P V aR as in (5) [3].
where α = 1 − cl.t refers to the variance-covariance matrix.ω t refers to the weight matrix for the portfolio.Z α refers to the relevant quantile from the standard normal distribution.P is the invested portfolio value.h is the holding period.
Fifthly, we calculate different measures for the predictors to determine the model specifications.We assume that the correctly-specified models can extract the maximal amount of information from the original data.These measures include the number of the exceedances, the p-value for the Kupiec backtesting procedure and MSE, as well as the entropy measure, to determine the optimal parameter for the BEMD used.The number of exceedances refers to the number of times the realized loss exceeds the PVaR threshold.Minimization of the number of exceedances implies that the PVaR estimated provide a maximum level of coverage for the market risk.The maximization of the p-value implies that the number of exceedances corresponds to the theoretical value at the given confidence level.We propose the maximum entropy criteria to choose the scales to be retained.The entropy maximization corresponds to the maximization of information content in the predictors.Given the predicted random variable Y ∈ R n , generated with the unknown data generating process (DGP) with unknown parameters, and the observation Y ∈ R n , the Shannon entropy of predictor is defined as (6).
where H( Y ) refers to the Shannon entropy of the predictor Y and p(x) refers to the probability density function (pdf).The objective is to maximize the H( Y ) of the measurable function Y by adjusting different parameters of forecasting algorithm that produces Y .Sixthly, with the chosen parameters for the BEMDPVaR, we repeat the previous steps to make one step ahead forecasts for the test dataset, using the rolling windows method over the entire test dataset.

Empirical Studies
We conducted empirical studies using the average daily electricity prices in two representative sub-markets, i.e., New South Wales (NSW) and Queensland (QLD) in the Australian electricity market, known as the Australian Energy Market Operator (AEMO).AEMO belongs to the first group of deregulated markets, where the pricing and risk management mechanism are more developed and established.We obtained the data from AEMO, who made publicly available the transaction and operation data of electricity market.The dataset covers the period from 1 January 2004 to 12 April 2015.Since there is no consensus on the division of the dataset, either in the machine learning or econometric literature [34], when dividing the dataset, we follow the common criteria that reserves at least 70% data as the training set and retains a sufficiently large size of the test set for the results to be statistically valid [35].The dataset is divided into three sub-datasets, i.e., the training set for the proposed BEMDPVaR model (49%), the model tuning set to determine the model parameters (21%) and the test set for the out-of-sample test to evaluate the out-of-sample performance of different models (30%).The size of the window is 2015, covering the relevant information available.We assume one dollar equal holding position for initial investments in each market.We assume a one-day holding period and use the one day ahead rolling window method to conduct one step ahead PVaR estimations.The out-of-sample performance of different models is evaluated using the reliability measure including the number of exceedances, the p-value for the Kupiec backtesting procedure and the mean square error (MSE).
Table 1 lists the descriptive statistics for both electricity markets.1, p i , i ∈ (N SW, QLD) and r i , i ∈ (N SW, QLD) refer to the price and return in the i market, respectively.The electricity markets are subject to frequent and abrupt external shocks.Descriptive statistics in Table 1 show that the electricity prices in both markets significantly deviate from the normal distribution, as confirmed by the four statistical moments, as well as the rejection of both the Jarque-Bera (JB) test of normality and the Brock-Dechert-Scheinkman (BDS) test of independence [36,37].Electricity prices in both markets exhibit significant fluctuations.The fluctuations in the NSW market are significantly higher than those in QLD, subject to different regional pricing mechanisms.As the autocorrelation and partial autocorrelation functions indicate the trend factors, the daily prices were log differenced at the first order to remove them as in y t = ln pt p t−1 .We further calculated the descriptive statistics on the return data.The distribution of the return data appears to approximate the normal distribution, as indicated by the four moments.The kurtosis appears to deviate from the normal level, which indicates that the market exhibits a significant abnormal event.Besides, since the null hypotheses of both the JB and BDS tests are rejected, this further indicates that the market return contains unknown nonlinear dynamics, not easily captured by traditional linear models.
The MEWMA and DCC-GARCH models are chosen as the benchmark models following the convention in the literature.The generalizability of the proposed model is evaluated using both risk coverage and predictive accuracy measures.More specifically, the Kupiec backtesting procedure is chosen to test for the unconditional risk coverage.The Kupiec likelihood ratio test statistic conforms to the χ 2 (1) distribution.MSE, defined as T −1 T i=1 ε 2 i where ε i = y i −y i , is used to evaluate the predictive accuracy.We fix the lags to one for the VARMA(1,1)-DCC-GARCH(1,1) model used, since it suffices for most of the situations in the empirical studies and represents the most parsimonious form.
Following the proposed numerical procedure, firstly, we conducted the experiments to determine the model specifications and parameters.The in-sample performance evaluation results of different models using the reliability measures, including exceedances and the corresponding p-value measuring its statistical significance, as well as the MSE as the accuracy measure and the entropy as the information criteria measure, are listed in Table 2.In Table 2, scale refers to the level of decomposition in the BEMD algorithm, n refers to the normal distribution, t refers to the t distribution, N refers to the mean value of the number of exceedances at different confidence levels, including 95%, 97.5% and 99%, P refers to mean value of the p-value of the Kupiec backtesting procedure at different confidence levels, M SE refers to mean value of MSE at different confidence levels and Entropy refers to the mean value of entropy of the prediction errors at different confidence levels.
Altogether, we identified 11 IMFs.The results in Table 2 show that with each choice of IMF as the transient and extreme factors, there are performance variations.The objective is to find the optimal choice of transient factors.Using the model tuning dataset, the optimal level for the model with normal distribution assumption is 7, 11, 4, 11, using exceedances, p-value, MSE and entropy measures, respectively.The optimal level for the model with the t distribution assumption is 7, 3, 4, 4 respectively.We found that the model with the t distribution would provide much better in-sample performance than the model with the normal distribution.Thus, we would adopt the t distribution as the underlying distribution for the VaR estimates.
With the chosen model specifications and parameters through the in-sample model tuning test, we further conducted the out-of-sample experiments to evaluate the out-of-sample performance of the proposed model.The results are listed in Table 3 The results in Table 3 show that in general, the proposed BEMDPVaR algorithm improves the reliability upon the traditional MEWMA and DCC-GARCH model.This is supported with the higher p-value from Kupiec backtesting procedure and lower MSE values for the models with both the normal and t distribution assumptions.The optimal out-of-sample performance is achieved when the model parameters are optimized with entropy measure at Level 4, and the t distribution is used as the model assumption.When using either the normal distribution and t distribution as the model assumption, exceedances, the p-value and MSE do not provide consistent superior out-of-sample performance.Using the p-value as the measurement criteria, we choose the optimal model parameters with the highest out-of-sample performance for the normal distribution assumption, while using MSE as the measurement criteria, we choose different sets of optimal parameters for the t distribution.Out-of-sample performance comparisons for the proposed BEMDPVaR using different criteria during the model parameter determination in-sample suggest that using the proposed entropy measure leads to improved model performance out-of-sample when both the normal and t distributions are assumed.
Out-of-sample performance improvement of the proposed algorithm, especially concerning the predictive accuracy aspects, is attributed to the identification of the transient or extreme factors using the entropy-optimized BEMD.This implies that the electricity price data movement is a complicated process with a mixture of underlying DGPs of different natures.It has a multiscale structure, far more complicated than what the mainstream models assume.Meanwhile, there are redundant representations of the underlying latent structure.As there is a lack of explicit and analytic solutions to determine the appropriate latent structure for the multiscale model specifications in the literature, significant model risk would result.We found that the model risk can be reduced significantly with the appropriate choice of optimization criteria, i.e., entropy measure in this paper, during the model tuning process.Thus, the determination of the multiscale model specifications holds the key to the further out-of-sample performance improvement and more thorough understanding of the DGPs during the modeling process.Some innovative methodologies, such as entropy theory, BEMD and the like, enable the analysis of the mixed data structure during the process.

Conclusions
In this paper, we propose the entropy theory to identify the BEMD model specifications and construct an effective PVaR estimation algorithm based on that.The superior out-of-sample performance of the proposed algorithm is supported by out-of-sample performance evaluation in the empirical studies in major Australian electricity markets.This is attributed to the identification and the incorporation of the multiscale structure of the high-dimensional time series.The bivariate time series is decomposed into several meaningful components in different frequency domains, so that the normal factors and extreme factors are separated to analyze and estimate PVaR.
Results obtained in this paper have some important implications that contribute to the relevant literature.Firstly, the proposition of the maximum entropy criteria shows that model specification determination is a nontrivial task, which has significant impacts on the result interpretation and forecasting performance.A diverse range of accuracy measures can be developed and used, where MSE and entropy measures represent a rather restricted set of them.The optimization objective using these measures needs to be constructed based on the problem domain knowledge.Analysis in major electricity markets provides the additional empirical evidence of the multiscale market structure, revealing the chaotic and multiscale behaviors.This provides further supporting evidence for the MA methodology and Heterogeneous Market Hypothesis (HMH) theoretical framework.The work in this paper represents an exploratory attempt.Secondly, improved risk measurement reliability would result from the incorporation of the multiscale investment strategies for portfolios across different markets.New multiscale techniques, such as BEMD, are essential in terms of the appropriate separation of this multiscale market structure, i.e., data and extreme components in the portfolio, with their distinct behavioral and correlation patterns.Overall, the work in this paper represents an initial exploratory attempt in modeling the multiscale electricity market structure using the MA methodology, which paves the way for in-depth investigation of a more diverse range of data features available in the electricity markets.
) Test S B [z](t) for the conditions of bivariate IMF.If it qualifies, repeat the above steps on z(t) − S B [z](t).Otherwise, repeat the above steps on S B [z](t).

Table 1 .
Descriptive statistics and statistical tests.

Table 2 .
In-sample performance measure comparison of different models.

Table 3 .
Predictive accuracy comparison of different models with the t distribution.