A Finite Mixture GARCH Approach with EM Algorithm for Energy Forecasting Applications

: Enhancing forecasting performance in terms of both the expected mean value and variance has been a critical challenging issue for energy industry. In this paper, the novel methodology of ﬁnite mixture Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) approach with Expectation–Maximization (EM) algorithm is introduced. The applicability of this methodology is comprehensively evaluated for the forecasting of energy related time series including wind speed, wind power generation, and electricity price. Its forecasting performances are evaluated by various criteria, and also compared with those of the conventional AutoRegressive Moving-Average (ARMA) model and the less conventional ARMA-GARCH model. It is found that the proposed mixture GARCH model outperforms the other two models in terms of volatility modeling for all the energy related time series considered. This is proven to be statistically signiﬁcant because the p-values of likelihood ratio test are less than 0.0001. On the other hand, in terms of estimations of mean wind speed, mean wind power output, and mean electricity price, no signiﬁcant improvement from the proposed model is obtained. The results indicate that the proposed ﬁnite mixture GARCH model is a viable approach for mitigating the associated risk in energy related predictions thanks to the reduced errors on volatility modeling.


Introduction
Volatility prediction is a major consideration for energy-related processes or variables, which include, but are not limited to, oil prices, energy consumptions, electricity prices, energy generations from traditional and renewable sources, and meteorological variables such as wind speed. In recent years, renewable energy has become increasingly costcompetitive [1]. In terms of levelized cost of generating electricity, the median costs of solar PV and onshore wind generations are lower than those of gas and coal generations in the United States, China, Europe, and India. Most notably, the generation of wind energy continued to grow in 2019 and 2020 despite the COVID-19 pandemic caused challenges [2,3]. As such, the volatility prediction of renewable energy processes becomes increasingly important in that it can mitigate the challenges stemming from the supply of intermittent power to market and thus facilitate the healthy and sustainable development of renewable energy [4].
In the prediction process of many energy variables, one usually considers two aspects: the expectation (mean) and the variance. Take wind speed forecasting as an example: wind

Brief Literature Review
There are many data-driven approaches for modeling and predicting energy related variables, such as statistical models, neural networks, generalized impulse response analysis methods, and hybrid approaches [5][6][7][8][9]. These approaches are usually for mean prediction. Meanwhile, the generalized autoregressive conditional heteroscedastic (GARCH) models have been developed to model the non-constant-volatility/heteroskedasticity [10]. The heteroskedasticity generally implies that different time series observations have nonconstant variance. However, in the commonly used ordinary least square (OLS) estimation, the presence of heteroskedasticity becomes a challenge because OLS estimation assumes constant variances. In this case, GARCH method becomes a promising tool to deal with time series data with time varying volatility. For various purposes, GARCH models have been extended into different forms, such as nonlinear GARCH (NGARCH), exponential GARCH (EGARCH), and the GARCH-in-mean (GARCH-M) model [11][12][13][14][15]. GARCH model applications have appeared in various areas. In particular, the popularity of GARCH models has been increasing in energy related studies. For instance, Garcia et al. [16] propose a GARCH model for electricity prices. The results can be further used to develop bidding strategies or negotiation skills in the electricity market. Liu et al. [17] develop an approach to estimate the wind power generation by modeling wind speed volatility and thus the operation probability of wind turbines. The proposed approach is valuable not only for the Energies 2021, 14, 2352 3 of 22 management of wind farms, but also for the integration of wind energy in the electricity market. Sun et al. [18] adopt ARMA-GARCH models for forecasting of solar radiation, and show that the ARMA-GARCH models can outperform other forecasting models without the consideration of volatility.
However, GARCH models cannot properly handle the data with heavy tailed or asymmetric distributions (e.g., non-normal distributions). To address this, the finite mixture GARCH approach has been developed, and it has received much attention from academia and industry in recent years. The original concept of finite mixture model was proposed for two normal probability density functions [19]. However, it is until last two decades before that this methodology was adopted in serious applications due to the tremendous advancement of computing power [20]. To date, the existing studies for finite mixture GARCH models are concentrated in financial related areas. Tang et al. [21] indicate that the finite mixture ARMA-GARCH model is more effective than either the mixture of AR models or AR-GARCH models. Hossain and Nasser [22] compare the performance of finite mixture ARMA-GARCH with those of neural network and support-vector machines (SVM) approaches for financial time series. The results indicate that the finite mixture ARMA-GARCH model outperforms the other two methods in terms of directional symmetry (DS) and weighted directional symmetry (WDS). Haas et al. [23] propose a MN-GARCH model for the daily return data on a stock index. The empirical analysis suggests good performance of the MN-GARCH model for both in-sample fit and out-of-sample forecasting. Similarly, Alexander and Lazar [24] apply the general normal mixture GARCH(1,1) for exchange rate modeling. The preliminary results reveal that the two-component mixture GARCH(1,1) model outperforms the mixture models with three or more components, as well as the symmetric and skewed student's GARCH models. Broda et al. [25] propose an approach that combines GARCH model with the mixtures of Paretian distributions. The approach is then applied to model seven major FX and equity indices, and it turns out to be more effective than the traditional GARCH-type models. Besides, other significant works include the mixture asymmetric GARCH model for option pricing [26], and a class of mixture GARCH model for capturing volatility and periodicity in non-linear time series data [27].
Although the mixture GARCH models are effective modeling tools for time series data, the major drawback is the existence of large number of parameters, which renders the process of model fitting rather difficult. One way for effectively estimating parameters in finite mixture models could be the expectation-maximization (EM) algorithm. Usually, the EM algorithm is comprised of two steps, namely, the expectation step (E-step) and the maximization step (M-step). The E-step provides the expectation on the missing information based on the conditional distribution of observations, while the M-step provides the maximum likelihood estimates (MLE) to the parameters based on the observations and expectation of the missing information. There are numerous applications of the EM algorithm, and particularly significant efforts have been made to fit the Gaussian mixture models by using various versions of EM algorithms [28][29][30][31]. The well-known drawbacks of EM algorithm lie in the local convergence and initialization, but recent studies have addressed the problem by combining new techniques, such as k-means, greedy learning, unsupervised learning, and genetic algorithm, with the EM algorithm [32][33][34][35].
The mixture GARCH model is also able to be fitted with the EM algorithm. However, the related studies are relatively scarce. Nikolaev et al. [36] use the EM algorithm and mixture GARCH model for estimating the volatility of financial returns, where AR(1) is applied for mean equation and Student's t distribution is used as the component for the mixture model. Cheng et al. [37] combine the normal mixture GARCH model with the EM algorithm for S&P500 Index and Hang Seng Index forecasting, where AR(1) is also assumed for the mean equation. In addition, Wu and Lee [38] apply the EM algorithm and the normal mixture GARCH model to study the excess market returns, where no mean equation is considered. Tang et al. [21] extend the mixture AR-GARCH model to ARMA-GARCH model, which is fitted with EM algorithm and applied for the stock price forecasting. However, the mean equation is also considered as a mixture component, and this arrangement is lack of support from the mixture GARCH theory.
Although the literature shows potentials for financial applications of finite mixture GARCH models, the effort for adopting the methodology on energy related subjects has been lacking. More importantly, the combination of finite mixture GARCH approach with EM algorithm has not been attempted to the best of our knowledge. As such, the proposed research helps to bridge the research gap.

Foundation of GARCH Model
It is well known that GARCH approach is effective for modeling the time varying volatility. It was first developed by Bollerslev [10] as an extension for ARCH model. Consider the error term ε t from ARMA(p, q) model, where a i , i = 1, · · · p and b j , j = 1, · · · q are the coefficients for the auto regressive and moving average terms, respectively. Similarly, p and q are the orders of autoregressive and moving average terms, respectively. If the error term ε t has a time varying variance, it can be modeled as, where z t is a white noise process with mean 0 and variance 1, and can be formulated by the following equation, where α 0 is a constant term, α i , i = 1, · · · P and β j , j = 1, · · · Q are the coefficients for the GARCH and ARCH terms, respectively. Similarly, P and Q are the orders of GARCH and ARCH terms, respectively. In this way, the error term follows the GARCH process of order P and Q, denoted as GARCH(P, Q). When P = 0, the current conditional variance is not dependent upon the previous conditional variance, and this means that the GARCH model degenerates into an ARCH model.

Gaussian Mixture Model
A Gaussian mixture model is a weighted mixture of finite Gaussian component densities. Assume that a random variable X follows Gaussian/Normal Mixture (NM) distribution with k components, i.e., X ∼ N M w 1 , · · · w k ; µ 1 , · · · µ k ; σ 2 1 , · · · σ 2 k , the probability density function of X is, where w i is the weight for the ith Gaussian component with w i ∈ (0, 1) and is the probability density function of the ith Gaussian component represented as φ i (x) = φ x; µ i , σ 2 i , and µ i and σ 2 i are the mean and variance for the corresponding Gaussian distribution.

Finite Mixture GARCH Model
The finite mixture GARCH model borrows the idea of both GARCH model and Gaussian mixture model. Given the error term ε t from ARMA(p, q) model, the finite mixture GARCH model assumes that ε t follows a Gaussian mixture distribution, i.e., ε t ∼ N M(w 1 , · · · w K ; µ t,1 , · · · µ t,K ; σ 2 t,1 , · · · σ 2 t,K ), where w k is the weight for the kth Gaussian component with w k ∈ (0, 1) and K ∑ k=1 w k = 1. Moreover, the concept borrowed from GARCH model implies that the variance of the kth component σ 2 t,k can be formulated as follows, where α 0,k is a constant term, α i,k , i = 1, · · · P and β j,k , j = 1, · · · Q are the coefficient for the GARCH and ARCH terms, respectively. Similarly, P and Q are the orders of GARCH and ARCH terms, respectively. Hence, by combining the above mentioned technology, the finite mixture GARCH(P, Q) can be formulated as follows, where p(y t ) is the probability density function of y t , and σ 2 t,k is the variance of the kth component at time t.

EM Algorithm for Estimating Parameters in Finite Mixture GARCH Models
As mentioned previously, in this study we adopt the EM algorithm to find parameters of finite mixture GARCH models that maximize the log-likelihood function. The joint probability of the finite mixture GARCH model is and the optimal parameters that maximize the log-likelihood function need to be found, where θ represents the set of parameters in the model. The two steps of EM algorithm, namely, E-step and M-step, are described below for the finite mixture GARCH model fitting.

E-step
Let Z = {Z t } T t=1 be the unobserved information, and we define Z t = k. If y t is produced by the kth component of the model, we have the following equations Given the information above, the complete information joint distribution is (14) and the corresponding log-likelihood function Hence, the Q function of the EM algorithm will be The probability density function p(Z t = k|y t , θ * ) can be further extended as follows, and this completes the E step of the EM algorithm.
Since α 0,k , · · · α P,k , β 1,k , · · · β Q,k K k=1 must be greater than zero, we replace them by the following, There are many ways to maximize the Q function by adjusting these parameters. One popular way is to take the derivative with respect to each parameter and set all derivatives to zero to find the optimal values of parameters that maximize the Q function. Another way to maximize the Q function is the numerical approach. In our paper, we directly apply the build-in numerical general optimization function "nls()" in R, an open source statistical computing software, to achieve this purpose. This build-in function can efficiently provide estimates for the optimal parameters in the Q function by using a Newton-type algorithm [39,40].

Implementing Finite Mixture Methodology
Three different modeling approaches, namely ARMA, ARMA-GARCH, and mixture GARCH approaches are examined. The main purpose is to compare the proposed mixture GARCH model with the ARMA and ARMA-GARCH models, and determine whether the proposed methodology performs better in modeling energy time series data. Intuitively, the forecasting result of the proposed mixture GARCH approach should be least as good as the ARMA and ARMA-GARCH models, since the two conventional models use the same theory for the mean equation, while the proposed finite mixture GARCH model makes significant refinement on variance modeling. The tasks of this research are to rigorously examine if the intuition (or hypothesis) is valid, and to make quantitative comparison among the three approaches based on various performance measures.
The mixture GARCH models are fitted by the EM algorithm mentioned in Section 2.4. This procedure is coded in R. The computing time can vary from 1 min to 10 min for a regular data sample, depending on the selection of initial values. This is sufficiently fast for most forecasting applications. On the other hand, for fitting the ARMA and ARMA-GARCH models, the commercial statistical software package SAS is adopted for ease of implementation. For model comparison, the outputs of SAS need to be exported to another code written in R so that the required performance measures can be calculated and compared among the three methods.
Overall, the process for model fitting and comparison can be summarized in Figure 1. In this paper, three types of energy time series data are collected from various sources. For each type of data, the procedure in Figure 1 is executed and results are obtained. As mentioned above, the EM algorithm and the observed data are first applied to fit the mixture GARCH models by using an in-house code written in R. The outputs from this model fitting process include the estimated parameters, fitted data and performance measures. Meanwhile, the observed data is also provided to SAS for fitting the ARMA and ARMA-GARCH models. Similar to the model fitting process in R, the outputs from SAS also contain fitted data and estimated parameters. However, some performance measures cannot be generated automatically in SAS.

Case Studies
To comprehensively test the applicability of the proposed finite mixture GARCH approach, we apply it to the predictions of hourly wind speed, wind power output, and electricity price, respectively, and compare the performances with the ARMA and ARMA-GARCH models. As such, data on the three time series variables are collected and used to fit the prediction models. The hourly wind speed data from a wind observation site in Colorado, U.S. and the hourly wind power generation data from a 900 kW NEG Micon wind turbine located in North Dakota, U.S. are obtained from our collaborators, and the hourly electricity price data are collected online New England ISO, U.S. (https://www.isone.com/isoexpress/web/reports/pricing/-/tree/lmps-rt-hourly-final, accessed on 15 October 2020). All three datasets are collected for one year, and thus each dataset has 8784 hourly entries. To tackle this issue, the fitted data from SAS, as well as the observed data, become the input for our code of performance measure computation in R. At last, the performances of the three models are compared. We adopt a variety of performance measures for comparing the proposed approach with other models. They include coefficient of determination (R 2 ), the value of log-likelihood function, Akaike information criterion (AIC), Bayesian information criterion (BIC), mean absolute error (MAE), mean absolute percentage error (MAPE), directional symmetry (DS) and weighted directional symmetry (WDS). Also,

Case Studies
To comprehensively test the applicability of the proposed finite mixture GARCH approach, we apply it to the predictions of hourly wind speed, wind power output, and electricity price, respectively, and compare the performances with the ARMA and ARMA-GARCH models. As such, data on the three time series variables are collected and used to fit the prediction models. The hourly wind speed data from a wind observation site in Colorado, U.S. and the hourly wind power generation data from a 900 kW NEG Micon wind turbine located in North Dakota, U.S. are obtained from our collaborators, and the hourly electricity price data are collected online New England ISO, U.S. (https://www. iso-ne.com/isoexpress/web/reports/pricing/-/tree/lmps-rt-hourly-final, accessed on 15 October 2020). All three datasets are collected for one year, and thus each dataset has 8784 hourly entries.
Note that the case studies mainly focus on the capability of the proposed finite mixture model on representing the dynamic development of energy time series variables. Therefore, the main consideration is on the sufficiency of model representation of the stochastic terms. For selecting the orders of GARCH models, i.e., the P and Q values, we employ the widely recommended GARCH(1,1) model according to the literature. For instance, Haas et al. [41] show that more than one lag in the conditional variance equations does not lead to significant improvement of the model. Alexander and Lazar [24] suggest that instead of leading to great improvement, the finite mixture of Gaussian models with more than two components is likely to produce biased estimation. Meanwhile, it is verified that ARMA(2,2) is suitable for modeling the mean component for all the three time series variables in this study. As a result, the models adopted for fitting and comparison are ARMA(2,2), ARMA(2,2)-GARCH(1,1), and two component mixture GARCH(1,1) models. The detailed results are presented in the following.

Wind Speed
To make a fair assessment of the proposed methodology on wind speed prediction, three one-month wind speed data samples are drawn from the entire wind speed dataset. The time series plots of the three samples are shown in Figure 2. Note that the averages are 3.60, 1.99, 2.56 m/s (or 8.05, 4.44, 5.73 mph), and the standard deviations are 2.56, 1.40, 2.09 m/s (or 5.72, 3.14, 4.68 mph) for wind speed samples 1-3, respectively. After fitting the ARMA(2,2), ARMA(2,2)-GARCH, and the 2-component mixture GARCH(1,1) model, the estimated parameters and the performance measures are summarized in Tables 1-3 for the three samples, respectively. In the tables, the first 15 rows of parameters (i.e., from rows a 1 to w 2 ) represent the estimated parameters of the three models, while the remaining rows respresent the performance of the models.
First of all, the results are analyzed for the first wind speed data sample in Table 1. It can be found that the proposed 2-component mixture GARCH can fit the wind speed data well in terms of R 2 (73.46%). Although the model can capture the majority of the total variance, the remaining 26.54% of the total variance (from the fluctuation of wind speed time series) still produces inaccuracy in the prediction. By comparing the proposed model with ARMA and ARMA-GARCH models in term of R 2 , MAE, and MAPE, DS, and WDS, it can be found that the three models actually produce similar values in terms of these metrics. This suggests that the proposed approach does not significantly improve the prediction accuracy with regard to the mean wind speed estimation. It can be attributed to the fact that all the three methods employ the same equation for the mean wind speed modeling.
it can be found that the three models actually produce similar values in terms of th metrics. This suggests that the proposed approach does not significantly improve the p diction accuracy with regard to the mean wind speed estimation. It can be attributed the fact that all the three methods employ the same equation for the mean wind spe modeling.       On the other hand, in term of "Log-likelihood", AIC, and BIC, it is clear that the mixture GARCH model outperforms the other two models. As shown in Table 1, the 2-component mixture GARCH model produces a much higher "Log-likelihood" value of −1741 and much lower AIC and BIC values of 3505 and 3560, respectively, compared with the ARMA and ARMA-GARCH models. As mentioned above, higher "Log-likelihood" values and lower AIC and BIC values suggest a better model. To verify if the finding that the proposed model is better than the other two models is statistically significant, two likelihood ratio tests, namely "LR-test (ARMA)" and "LR-test (GARCH)", are run. The former test uses ARMA(2,2) as the null model, while the later uses the ARMA(2,2)-GARCH(1,1) model as the null model. Note that the two tests are not applicable for ARMA(2,2), since the former test uses it as a constrained (null) model and the later test uses a null model that has more parameters than ARMA(2,2), which will lead to negative test statistics. Similarly, LR-test (GARCH) also is not applicable to ARMA(2,2)-GARCH(1,1) model, since it is used as the null model in the test. As indicated by the likelihood ratio test results in Table 1, the finding that the proposed model is better than the other two models in terms of volatility modeling is indeed statistically significant. Certainly, it is not difficult to observe that the GARCH model is significantly better than ARMA model in term of volatility modeling. The reason for the two observations can be traced back to in their model assumptions: the ARMA model presumes that the variance of wind speed is constant; the ARMA-GARCH model assumes autoregressive conditional variance; and the mixture GARCH model further extends the variance assumption to the mixture of multiple autoregressive conditional variances.
The results from the first wind speed data sample indicate the proposed mixture model outperforms the other two in term of volatility modeling. To further verify our findings, the second and third data samples are analyzed. It can be seen that the results  Tables 2 and 3 respectively, are indeed very similar  to the results from Table 1. Both show comparable prediction accuracies in term of the mean wind speed estimations among the three models, and the proposed mixture GARCH model is the most effective for volatility modeling. In both cases, the 2-component mixture GARCH model produces a much higher "Log-likelihood" value and much lower AIC and BIC values.

Wind Power Generation
Similarly, three one-month wind power generation samples are drawn from the wind power generation dataset. The time series plots of the three data samples are shown in Figure 3. Note that the averages are 322.08, 397.54, 303.39 kW and the standard deviations are 313.84, 275.98, 277.98 kW for wind generation samples 1-3, respectively. At first glance, the wind power generation is clearly more volatile than the wind speed, and this is further shown by the results later. For the three wind power generation samples, the parameters estimation and performance measures of ARMA(2,2), ARMA(2,2)-GARCH(1,1) and 2-component mixture GARCH(1,1) models are summarized in Tables 4-6.   glance, the wind power generation is clearly more volatile than the wind speed, and this is further shown by the results later. For the three wind power generation samples, the parameters estimation and performance measures of ARMA(2,2), ARMA(2,2)-GARCH(1,1) and 2-component mixture GARCH(1,1) models are summarized in Tables 4-6.   Based on the results from the first wind generation data sample in Table 4, it can be seen the three models have close performances for mean estimation in terms of R 2 , MAE, MAPE, DS, WDS and Ljung-Box test, and no clear winner can be declared. However, in terms of volatility modeling, the proposed mixture GARCH model is superior to the ARMA and ARMA-GARCH models. This is because it generates a higher "Log-likelihood" value (−3924), a lower AIC value (7872), and a lower BIC (7928) value. In addition, the two likelihood ratio tests, namely, LR-test (ARMA) and LR-test (GARCH), also suggest that the mixture GARCH model is better in terms of volatility modeling. However, by comparing the results of LR-test (ARMA) and LR-test (GARCH) in Table 4 with the wind speed results in Tables 1-3, it is clear that the test results in Table 4 are larger. This implies that the mixture GARCH model makes more significant improvement in wind power output modeling than in wind speed modeling. The greater improvement indirectly provides evidence for the "wilder" (greater and non-constant) volatility that lies in the wind power output data. This is because a "wilder" volatility provides more room for improvement in term of volatility modeling. Another evidence of higher fluctuation in the wind power output data can be reflected by the higher R 2 and MAPE values observed in Table 4 as  compared to those in Tables 1-3. A higher R 2 usually means a better fit in terms of total variation explained by the model. As suggested by the formulas for R 2 and MAPE in Appendix A, higher values of R 2 could result from larger values of ∑ i (y i − y) 2 , or smaller values of ∑ i (y i −ŷ i ) 2 . However, a higher MAPE usually indicates a worse fit (i.e., a larger n ∑ i=1 |ŷ i − y i | value). Thus, it is reasonable to assert that the very large values of ∑ i (y i − y) 2 (volatility/variance/fluctuation) are the main contributing factor for the large values of both R 2 and MAPE.
The findings obtained from the first wind generation data sample can also be verified by the results of the other two data samples in Tables 5 and 6. The three models show similar performance for mean estimation in terms of R 2 , MAE, MAPE, DS, WDS and Ljung-Box test, while the proposed approach shows the upperhand for volatility modeling. It is consistently supported by the much larger "Log-likelihood" value and much lower AIC value and BIC value from the proposed approach compared with the other two models. It is worthwhile to mention that for the third wind generation data sample, the ARMA(2,2)-GARCH(1,1) model does not perform better than the ARMA model even in terms of volatility modeling, as suggested by the LR test(GARCH), AIC and BIC values. However, the mixture GARCH model can still outperform the other two models. This confirms that the mixture GARCH model is indeed a more powerful tool as compared to the conventional GARCH model, because it even suits the situation where the GARCH model may not work well.  Tables 7-9 for the three electricity price data samples, respectively. By analyzing the three tables, the findings from wind speed and wind power generation are once again verified. In other words, the three models have close performance for mean estimation as reflected by R 2 , MAE, MAPE, DS, WDS and Ljung-Box tests. Although the proposed approach shows slightly better results in some metrics compared with the other two models, it also has slightly errors in other metrics. On the other hand, the mixture GARCH model consistently outperforms the other two models in term of volatility modeling, as suggested by the "Log-likelihood", AIC and BIC values, as well as the likelihood ratio tests. Table 7 shows that for the first electricity price sample, the "Log-likelihood" value of the proposed approach is −3033, larger than those of ARMA and ARMA-GARCH models. The AIC and BIC values are 6090 and 6145, respectively, smaller than the counterparts of ARMA and ARMA-GARCH models. Tables 8 and 9 further confirm the same trend for volatility modeling of electricity price.

Electricity Price
Another interesting finding is that the MAPE values for the electricity price samples are significantly smaller, as compared with the results from the wind speed and wind power output samples. Also, Figure 4 shows that the three electricity price time series samples are much more "stable" or less fluctuating than the wind speed and wind power output. As such, the metric of MAPE heavily depends on the characteristic (volatility) of data.

Limitations
The proposed methodlogy, in its current form, has certain limitations. Addressing the limitations would become important research tasks in the future. First, the proposed approach adopts the relatively simple MN-GARCH model, while many other forms of GARCH models (e.g., NGARCH, EGARCH, and GARCH-M models) should also be considered. It would be intriguing to compare the performances of those GARCH models with the MN-GARCH model in the proposed approach. Second, the methodology is limited to univariate processes. Many applications might require the capability of describing and predicting multivariate processes. For instance, wind energy prediction might be improved with both wind speed and direction information [42]. As such, it is belived that the extension of the proposed methodology to multivariate processes is of great importance. Third, the nonlinear relatiosnhip between variances might be captured with higher accuracy by introducing machine learning models. Although GARCH models have been successful in volitility modeling and enjoy elegant and simple mathematical construction, machine learning models such as support vector machines (SVM) and covolutional neural network (CNN) have started to play an important role in this field. The combination of the two approaches has been reported to be promising. As such, in the futture, the investigation on the combined machine learning-GARCH approach should be conducted and compared with the approach proposed in this work.

Conclusions
For modeling and predicting the challenging renewable energy and energy related time series subjects, we propose a two-component finite mixture GARCH approach that adopts ARMA model and combines it with MN-GARCH model for volatility modeling.
Meanwhile, we apply the EM algorithm to efficiently estimate the model parameters. To verify the effectiveness of the proposed approach, we not only apply this approach to three different energy related time series, namely, wind speed, wind power generation, and electricity price, but also test three random data samples for each time series to ensure results consistency.
For all the three time series, the conventional ARMA and ARMA-GARCH models are also adopted to compare with the proposed mixture GARCH model. In this case, ARMA(2,2) and ARMA(2,2)-GARCH(1,1) models are found suitable. The results generate two general findings, which are consistent among all data samples of any particular time series variable, as well as among different time series variables. First, there is little evidence that the proposed 2-component mixture GARCH approach can always outperform the ARMA and ARMA-GARCH models in term of estimations of mean wind speed, mean wind power output, or mean electricity price. Second, the proposed approach does outperform the ARMA and ARMA-GARCH models in terms of volatility modeling, and the superior performance is proven to be statistically significant. The practical significance of the findings is clear: the proposed approach provides a novel and robust tool for energy related predictions which can effectively mitigate the prediction risks thanks to reduced errors in volatility estimation.

Conflicts of Interest:
The authors declare no conflict of interest.

Coefficient of determination (R 2 )
R 2 is commonly used to measure the goodness of fit of a model by computing the fraction of the total variance. The formula for calculating R 2 is shown as follows, where y i are the observed values,ŷ i is the estimated value, and y is the sample mean.

Akaike information criterion (AIC)
AIC is an indicator on how well a statistical model fits the data. The formula for calculating AIC is as follows, AIC = 2k-2ln(L), where k is the number of model parameters, and ln(L) is the maximized log likelihood for the model.

Bayesian information criterion (BIC)
Being closely related to AIC, BIC is another tool of model selection based on likelihood function. The model with the lowest BIC is selected. BIC is calculated as follows, BIC = kln(n)-2ln(L), where k is the number of model parameters, n is the number of data points, and ln(L) is the maximized log likelihood for the model.

Mean absolute error (MAE)
MAE is a measure for the average magnitude of prediction errors, and a lower MAE value is preferred. The formula for calculating MAE is as follows, where y i are the observed values,ŷ i is the estimated value, and n is the sample size.

Mean absolute percentage error (MAPE)
MAPE is an indicator of the overall relative prediction accuracy for a model. It usually adopts the following form, where y i are the observed values,ŷ i is the estimated value, and n is the sample size.

Directional symmetry (DS)
DS is roughly defined as "things going the same direction" and can be calculated as, where y i are the observed values,ŷ i is the estimated value, and n is the sample size.

Weighted directional symmetry (WDS)
WDS measures the magnitude of forecasting error and the direction. It places more penalty on the targets with incorrectly predicted directions than those with correctly predicted directions. WDS is defined as, where y i are the observed values,ŷ i is the estimated value, and n is the sample size.

Log-likelihood rate test
The likelihood ratio test can be used to compare the fits of two competing models. The test statistic D is calculated as follows, D = −2 ln("likeihood f or null model" /"likelihood f or alternative model") where the statistic D is assumed to follow the chi-squared distribution with degrees of freedom df 1 -df 2 , which measures the difference in the number of parameters between the two models.

Ljung-Box test
The Ljung-Box test measures the randomness and independence of observed data. It can be defined as follows, Q = n(n + 2) where n is the sample size,ρ 2 k is the sample autocorrelation at lag k, and h is the number of lags being tested [15].