Forecasting the Carbon Price Using Extreme-Point Symmetric Mode Decomposition and Extreme Learning Machine Optimized by the Grey Wolf Optimizer Algorithm

: Due to the nonlinear and non-stationary characteristics of the carbon price, it is difﬁcult to predict the carbon price accurately. This paper proposes a new novel hybrid model for carbon price prediction. The proposed model consists of an extreme-point symmetric mode decomposition, an extreme learning machine, and a grey wolf optimizer algorithm. Firstly, the extreme-point symmetric mode decomposition is employed to decompose the carbon price into several intrinsic mode functions and one residue. Then, the partial autocorrelation function is utilized to determine the input variables of the intrinsic mode functions, and the residue of the extreme learning machine. In the end, the grey wolf optimizer algorithm is applied to optimize the extreme learning machine, to forecast the carbon price. To illustrate the superiority of the proposed model, the Hubei, Beijing, Shanghai, and Guangdong carbon price series are selected for the predictions. The empirical results conﬁrm that the proposed model is superior to the other benchmark methods. Consequently, the proposed model can be employed as an effective method for carbon price series analysis


Introduction
With the growing concern about climate change, many countries and regions have begun to adopt the carbon emission trading mechanism to effectively control greenhouse gas emissions [1].The carbon trading market, playing a significant role in dealing with global climate change, is one of the most effective tools [2,3].As one of the largest carbon emitters [4], China officially proposed to implement an initial carbon emissions trading system in 2010, which was response to the challenge of climate change.Subsequently, seven pilot provinces and cities were set up, including Beijing, Shanghai, Guangdong, Shenzhen, Tianjin, Chongqing, and Hubei in 2011.Afterwards, carbon emissions trading platforms were launched during 2013 and 2014.Furthermore, on December 19, 2017, the national emissions trading scheme (ETS) was officially implemented [5], which almost doubled the coverage of the carbon pricing system [6].To date, the Chinese government has established eight pilot carbon markets that have valid developments for reducing greenhouse gas emissions [7].In addition, by the end of December 2018, the cumulative trading volume pilot exceeded 270 million tons, and the total turnover exceeded 6 billion yuan.Obviously, China is the world's largest energy consumer and greenhouse gas emitter [8].The ETS has become an extremely significant financial market, and accordingly, it will play an increasingly important role in reducing emissions in the future accordingly.
Currently, the carbon price covers about 20% of greenhouse gas emissions [9], and calls for a global carbon price are intensifying [10,11].Economists believe that carbon pricing is one of the most effective ways to reduce global warming emissions [12].Carbon pricing is greatly relevant today, and it will be even more so in the future.Hence, it is necessary to predict the carbon price, to understand the evolutionary process of the carbon price.On the one hand, for policy makers, forecasting carbon price accurately is conducive to grasping the price fluctuations of carbon market, which can formulate carbon emission trading market policies and develop a reasonable carbon price mechanism.Specifically, to stabilize the price, and to promote the rational development of the market, the policy makers could adjust the price range of the price and the quantity of the market stability reserve, according to the law of carbon price changes [13].On the other hand, for investors, predicting carbon price accurately can provide scientific decision-making tools and guide investors towards to make reasonable investments.Consequently, it makes sense to integrate carbon price predictions.
According to existing literature, researchers have paid great attention to carbon price analysis: research for on-the-spot and futures prices of EU ETS [14][15][16][17][18], studies on the relationship between the carbon price and the energy price [19][20][21][22][23], and research into the market efficiency based on the carbon price [24][25][26][27].Although some achievements have been made with existing carbon price analysis, the literature on carbon price prediction is limited [28].Currently, there are two main representative theories in carbon price forecasting: factor forecasting and time series forecasting.Factor forecasting, mainly based on different factors, has a good performance.The coal, oil, natural gas, other energy prices, and global equity indices are exogenous variables, which are applied to forecast carbon price [29,30].However, influencing factors are exogenous variables, which results in errors, and the failure of carbon price prediction.Time-series forecasting, which includes econometric and statistical prediction models, artificial intelligence models and ensemble models, is mainly based on the historical data, using price-only to predict.Econometric and statistical prediction models, as traditional classical prediction methods, are widely used in the traditional carbon market for carbon price forecasting and fluctuation analysis.These models mainly include the generalized autoregressive conditional heteroscedasticity model (GARCH), the autoregressive conditional heteroscedasticity model (ARCH), dynamic model averaging (DMA), etc. Byun and Cho [31] put forward GARCH-type models to forecast European Climate Exchange (ECX) carbon futures from December 2008 to December 2011, and concluded that the GJR-GARCH [32] model performs better than the TGARCH and GARCH models.The AR-GARCH model was adopted by Chevallier to predict the second-stage carbon price fluctuations of the EU ETS, which illustrated that the proposed model had a high prediction accuracy [33].Furthermore, the nonparametric model was conducted to forecast the Blue Next spot and ECX futures prices from April 2005 to April 2010.The result illustrated that the error of nonparametric modeling is lower than the linear autoregressive models [34].Moreover, Conrad et al. [35] regarded fractionally integrated asymmetric power GARCH (FIAPGARCH) as a predicted model, to predict the first-and second-phase carbon prices of EU ETS, and determined that the FIAPGARCH model can describe the fluctuation of carbon price well.María et al. [36] executed the autoregressive moving average including exogenous covariates and generalized autoregressive conditional heteroskedasticity model (ARMAX-GARCH) framework with a time-varying jump probability, to forecast the price of Phase 2 EU price, and concluded that the ARMAX-GARCH framework with a time-varying jump probability is more accurate than the ARMAX-GARCH model.Also, dynamic model averaging (DMA) was conducted by Koop et al. to forecast the carbon markets, which confirmed that DMA can perform forecasting accurately [37].Although the econometric and statistical prediction models can obtain higher precisions in carbon market price predictions, they are based on the fact that the carbon market price changes are highly nonlinear and non-stationary.In addition, it is difficult to effectively deal with nonlinear modes that are hidden in the prices of non-stationary carbon markets, so that they generally cannot obtain satisfactory prediction results.
To overcome the limitations of the econometric and statistical prediction models, the artificial intelligence algorithms demonstrating excellent nonlinear modeling capabilities, as represented by artificial neural network (ANN) and least-squares support regression (LSSVR), were selected to predict the carbon price.Tsai and Kuo [38] constructed an Ant-Based Radial Basis Function Network (ARBFN) to predict carbon price based on the Radial Basis Function Network (RBFN), and Ant Colony Optimization (ACO), which provided an accurate and real-time method for the forecasting of carbon price.Zhu and Wei [39], presented several hybrid models of autoregressive integrated moving average model (ARIMA) and least square support vector machine (LSSVM) to predict the carbon futures prices of EU ETS from April 2005 to March 2011, and concluded that the model ARIMALSSVM2 exceeded the single ARIMA, ANN, and LSSVM models.Also, a novel hybrid neuro-fuzzy controller model, based on an ANN system and an adaptive neuro-fuzzy inference system (ANFIS), was provided by Atsalakis to predict the carbon futures price, which was superior to other models such as ARIMA, ANN, ARIMALSSVM2, etc., in terms of forecasting accuracy [40].Fan et al. [41] examined the multilayer perceptron (MLP) neural network for forecasting carbon price.The results indicated that the model showed good performance in forecasting the carbon price.Zhang et al. [28] employed the grey neural network (GNN), optimized by the ant colony algorithm (ACA) model to forecast the carbon spot price of EU ETS.The results indicated that the model performed better than signal ARIMA, ANN, and LSSVM.
Based on the aforementioned discussion, it is manifest that some models were optimized by using other algorithms such as the ACO, ACA, and ANN systems.Nevertheless, these methods have high requirements for the accuracy and smoothness of historical data, which have defects for non-stationary data with time-varying properties.In addition, the artificial intelligence models neglect the information of different features in the original carbon price time series, which affects the accuracy of prediction.Considering that the original carbon price time series characteristics are unstable and random, the signal decomposing technique was introduced into carbon price forecasting.
Currently, to achieve the goal of eliminating the carbon price stochastic volatility, researchers generally examine the empirical mode decomposition (EMD) or ensemble empirical mode decomposition (EEMD) to analyze the time series.Several studies have proven the efficiencies of these algorithms.Zhu et al. [42] came up with a multiscale ensemble forecasting model that integrated empirical mode decomposition (EMD), genetic algorithm (GA), and ANN, to predict the carbon price, which demonstrated that the multiscale ensemble forecasting model outperformed the ANN and GAANN models.Sun et al. [43] proposed a combined forecasting model based on variational mode decomposition (VMD) and spiking neural networks (SNNs), and confirmed that the models have high accuracies and reliabilities.Taking four futures prices of EU ETS as sample, Zhu et al. [44] successfully introduced an EMD-based evolutionary least squares support vector regression model, and concluded that the model was suitable for nonlinear and non-stationary carbon price forecasting.In addition, the EMD model, and the LSSVM model with a kernel function prototype and particle swarm optimization (PSO), were integrated to forecast the daily carbon future prices for the EU ETS during December 2015 and 2016.The results confirmed that the proposed model had high levels of accuracy [3].Furthermore, Feng et al. [45] applied the EEMD to analyze the carbon price of the EU ETS, to improve forecasting accuracy and reliability.
However, despite of the good performance of multiscale ensemble forecasting models, based on EMD/VMD/EEMD, there are drawbacks to ameliorate.Consequently, the ensemble empirical mode decomposition (ESMD) is a further improvement of the EMD, which overcomes the outcomes of EMD.First, the ESMD method solves the problem of mode mixing for the EMD method, which can enhance the effectiveness of decomposition.In addition, ESMD uses the direct interpolation (DI) method, which can accurately reflect the fluctuation characteristics and the trend changes of the time series.Moreover, ESMD can adaptively time-frequency decomposition, based on time-series local time-varying features, which are highly suitable for analyzing non-stationary and nonlinear time series.Furthermore, it is crucial to note that the extreme learning machine (ELM) has been widely executed in various forecasting fields successfully, such as in short-term load forecasting [46], the sub-component load forecasting [47], short term electricity price forecasting [48], and wind speed forecasting [49].However, for the ELM model, the input weight matrix and the hidden layer deviation are randomly determined, so that the nodes of some hidden layers fail or cannot meet the data requirements.Thus, it becomes crucial to select the optimal number of hidden-layer nodes adaptively.Therefore, Abdoos [50] applied the Gram-Schmidt orthogonalization (GSO), to optimize the ELM in order to forecast the wind power of two wind farms.Rocha et al. [51] proposed the PSO to optimize the ELM to predict the distributed electrical generation system capacity.Sun et al. [52] focused on the PSO-ELM to predict carbon price, and the results indicated that PSO-ELM performed better.Based on these, this paper employs the grey wolf optimizer algorithm (GWO) to optimize the ELM.Also, this paper could prove the applicability of this method in carbon price prediction.
From the above, this paper puts forward a hybrid forecasting approach with ESMD, a partial autocorrelation function (PACF), ELM, and GWO.Also, the carbon price series of China, such as Hubei, Beijing, Shanghai, and Guangdong are selected as the samples.Firstly, considering the characteristics of the carbon price series, the ESMD is employed to decompose the carbon price time series into a finite and small number of intrinsic mode functions (IMFs) and one residue.Decomposing the original time series by ESMD can effectively determine the characteristic information of the original data.Secondly, the PACF is applied to determine the input variables of each subsequence.Also the ELM optimized by the GWO is utilized to forecast the decomposed components.Furthermore, in order to verify the effectiveness and the practicability of the proposed method, this paper compares the forecasting results of the proposed method with other benchmark methods, and demonstrates that the proposed method can achieve a high prediction accuracy.Hence, it is effective to predict the carbon price with nonlinearity and non-stationarity.
The contributions of this paper may be concluded as follows: • The extreme-point symmetric mode decomposition is applied, to decompose the carbon price to promote the prediction accuracy.

•
The extreme learning machine optimized by the grey wolf algorithm could obtain a good performance with predictions.

•
The proposed model ESMD-GWO-ELM significantly improves the forecasting accuracy of the carbon price, with only the historical carbon price sequence being taken into account.
The remainder of this paper is structured as follows: Section 2 describes the extreme-point symmetric mode decomposition, the extreme learning machine, the grey wolf optimizer algorithm, and the partial autocorrelation function.Section 3 introduces the proposed model: ESMD-GWO-ELM.The empirical analysis and forecasting results are discussed in Section 4. Finally, conclusions are drawn in Section 5.

Extreme-Point Symmetric Mode Decomposition
Extreme-point symmetric mode decomposition (ESMD), a new method for the empirical mode decomposition method, features an improved and optimized version of EMD [53].The ESMD method solves the problems of screening termination and modal aliasing in the EMD method.Moreover, the ESMD method draws on the idea of EMD, and changes the external envelope interpolation into internal pole-symmetric interpolation.This borrows from the idea of the least squares, to optimize the last residual modality to make it become the global adaptive mean of the entire data, and to determine the optimal number of filters.In addition, considering that all integral transforms, including the Hilbert transform, have inherent defects in analyzing time-frequency changes, ESMD abandons the traditional concept of spectrum analysis relying on integral transform, and creatively proposes a direct interpolation (DI) method, which intuitively reflects the time-varying amplitude and frequency of each mode.
The ESMD method consists of two parts: the first part is the mode decomposition, which can generate several intrinsic mode functions (IMFs) and one optimal adaptive global averaging.The second part is the time-frequency analysis, using the direct interpolation method to calculate the instantaneous frequency of the natural mode, which analyzes frequency changes on each time scale, and determines when mutations will occur at each time scale.The specific steps of ESMD are shown as follows: Step 1: Find all of the extreme points (maximum and minimum) of the whole data series X, and record as Step 2: Connect all of the adjacent poles with line segments, and record the midpoints of the line as Step 3: The boundary points F 0 and F n are added to the left and right ends, respectively, using certain methods.
Step 4: Construct p interpolation curves L 1 , • • • , L p (p ≥ 1) by using the obtained n + 1 midpoints, and calculate the mean curve Step 5: Repeat the above steps for the series X − L * until the number of screenings reaches the preset maximum value K, and the first decomposed empirical mode is obtained and named as M 1 .
Step 6: Repeat the above five steps for the remaining sequence X − M 1 until the remaining sequence R has only a certain number of poles left, and the decomposed empirical modes M 2 , M 3 • • • can be obtained, respectively.Step 7: Change the number of screening times K within the limited interval [K min , K max ], repeat the above six steps, and then calculate the variance ratio σ/σ 0 and plot the variance ratio graph with K.
Step 8: Select the number of transformations K 0 as the optimal number of screenings when the variance ratio σ/σ 0 is the smallest, and repeat the previous six steps until all of the decomposed empirical modes are outputs.
After decomposition, the original time series X can be expressed as X = ∑ M i + R; that is, the time series X is decomposed into a series of empirical modes and one residual variable.

Extreme Learning Machine
The extreme learning machine (ELM), proposed by Huang et al, is a new type of single hidden layer feedforward neural network (SLFN) learning algorithm [54].The ELM algorithm is comprised of an input layer, a hidden layer, and an output layer, which randomly generates the connection weights of the input layer, the hidden layer, and the threshold of the hidden layer neurons.Moreover, there is no need to adjust during the training process, and a unique optimal solution can be obtained once the number of neurons is set in the hidden layer requirements.Compared with the traditional neural network method, the algorithm has the advantages of a fast learning speed, less human intervention, and a strong generalization ability.
The output of an ELM is as follows: where is the output vector for the hidden layer relative to the input x, β i is the output weight of the ith hidden layer node and the output neuron; a i is the input weight for the input neurons and ith hidden layer node; b i is the threshold of the ith hidden layer node.
Then, the output weight can be obtained by solving the least-squares solution of the linear Equation (2).
The least squares solution of Equation ( 2) is: where H + is the Moore-Penrose-generalized inverse of the hidden layer output matrix H [55], which is used to calculate the matrices of the hidden layer.ELM aims to minimize the training errors and the norm of the output weights.

Grey Wolf optimizer Algorithm
The grey wolf algorithm (GWO) is a new group of intelligence algorithms recently proposed by Mirjalili et al. [56], which simulate the hierarchy and hunting behavior of grey wolves in nature, and finds the optimal value.The GWO algorithm exhibits more advantages in terms of fast convergence and less adjustment parameters in solving function optimization problems.
The social ranks of the grey wolves are divided into four parts: alpha (α), beta (β), delta (δ), and omega (ω) as in Figure 1.The hunting behaviors of the wolves mainly involve tracking and approaching the prey, tracking the prey, and attacking the prey.

min ‖𝛽 • ℎ(𝑥 ) − 𝑦 ‖
(2) The least squares solution of Equation ( 2) is: where  is the Moore-Penrose-generalized inverse of the hidden layer output matrix  [55], which is used to calculate the matrices of the hidden layer.ELM aims to minimize the training errors and the norm of the output weights.

Grey Wolf optimizer Algorithm
The grey wolf algorithm (GWO) is a new group of intelligence algorithms recently proposed by Mirjalili et al. [56], which simulate the hierarchy and hunting behavior of grey wolves in nature, and finds the optimal value.The GWO algorithm exhibits more advantages in terms of fast convergence and less adjustment parameters in solving function optimization problems.
The social ranks of the grey wolves are divided into four parts: alpha (α), beta (β), delta (δ), and omega (ω) as in Figure 1.The hunting behaviors of the wolves mainly involve tracking and approaching the prey, tracking the prey, and attacking the prey.α stands for the head wolf, which leads the grey wolf group, followed by β, which assists the head wolf in making decisions; δ are the ordinary wolves commanded by α and β, and the underlying wolf is ω is commanded by α, β, and δ.Under the leadership of α, the group of grey wolves captures the prey, and the wolves gradually approach and track the prey by scent and other information.The wolves surround the prey once the location of the prey is determined.
During surrounding of the prey, the distance between the grey wolf and the prey are shown in Equations ( 4) and ( 5): where  represents the iterations,  ⃗ ( ) stands for the prey position after the -th iteration, and  ⃗ () indicates the position of a wolf. ⃗ refers to the distance between the grey wolf and the prey, and  ⃗ and  ⃗ are the convergence factor and the swing factor, respectively.The calculation formula is shown in Equations ( 6) and (7): α stands for the head wolf, which leads the grey wolf group, followed by β, which assists the head wolf in making decisions; δ are the ordinary wolves commanded by α and β, and the underlying wolf is ω is commanded by α, β, and δ.Under the leadership of α, the group of grey wolves captures the prey, and the wolves gradually approach and track the prey by scent and other information.The wolves surround the prey once the location of the prey is determined.
During surrounding of the prey, the distance between the grey wolf and the prey are shown in Equations ( 4) and ( 5): where t represents the iterations, → X P(t) stands for the prey position after the t-th iteration, and indicates the position of a wolf.C are the convergence factor and the swing factor, respectively.The calculation formula is shown in Equations ( 6) and ( 7): where → r 1 and → r 2 are random variables between [0, 1]; with the number of iterations increases, the convergence factor → a decreases linearly from 2 to 0. The optimization process of the GWO algorithm is to locate the prey position based on the positions of α, β, and δ.The ω wolves pursue the prey under the guidance of the α, β, and δ wolves, and they re-determine the prey position according to the best positions of the α, β and δ wolves.The positions of the grey wolf group are in accordance with Equation (8) to Equation ( 14): where

Algorithm 1: The GWO algorithm
Step 1: randomly initialize the population to be within the specified range; Step 2: calculate the fitness of each individual; Step 3: select, in order of fitness α, β, and δ; Step 4: update the other wolves using formulas (4) to (14) Step 5: update the parameters → α, → A, and →

C;
Step 6: if the end condition is not met, go to step 2; Step 7: output the location of the α wolf.

The Partial Autocorrelation Function
The partial autocorrelation function (PACF), as a common statistical tool, gives the relation that exists between the time series and their lags, which are used to determine the input variables of the neural network [57].
Give a time series x t , with Φ kj representing the j regression coefficient with the k-order autoregressive equation, then the k-order autoregressive model is expressed as: where Φ kk is the last coefficient and the partial autocorrelation function.

The Proposed Model
Figure 2 illustrates the process of the extreme learning machine optimized by the grey wolf optimizer algorithm (GWO-ELM) model.The left part is the GWO algorithm, and the right part is the prediction process of the ELM algorithm.It is apparent that the GWO algorithm is employed to optimize the parameters in ELM, which benefit in obtaining the optimal network.Figure 3 is the forecasting framework for carbon price, with the ESMD-GWO-ELM model.As can be seen, the original carbon price series is decomposed into several IMFs, and one residue by EMD, EEMD and ESMD at first.Then, the partial autocorrelation function (PACF) is utilized to determine the input variables of each subsequence, so that the series are divided into the training set and test set.Thirdly, the GWO-ELM is employed to predict each series.Finally, the prediction results are compared with the real carbon price.

Data
At present, China has established eight regional carbon trading markets, including Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Fujian, Chongqing, and Tianjin.Among them, Hubei is the world's third-largest emissions trading system [58].In addition, Hubei's carbon emission trading market has the highest degree of liquidity, and the volume is first-ranked [59].Furthermore, the Hubei pilot emission trading market has been a pioneer among all of pilots, as it owns sufficient transaction data [60].Hence, Hubei was selected as a sample to forecast the carbon price.In addition, considering the rich volume, the value of carbon trading, and the long transaction time, Beijing, Shanghai, and Guangdong were also selected as examples in this paper [60,61]

Data
At present, China has established eight regional carbon trading markets, including Beijing, Shanghai, Guangdong, Shenzhen, Hubei, Fujian, Chongqing, and Tianjin.Among them, Hubei is the world's third-largest emissions trading system [58].In addition, Hubei's carbon emission trading market has the highest degree of liquidity, and the volume is first-ranked [59].Furthermore, the Hubei pilot emission trading market has been a pioneer among all of pilots, as it owns sufficient transaction data [60].Hence, Hubei was selected as a sample to forecast the carbon price.In addition, considering the rich volume, the value of carbon trading, and the long transaction time, Beijing, Shanghai, and Guangdong were also selected as examples in this paper [60,61]   The data series were divided into two parts: the training sample and the test sample.There are several methods of segmenting the data to train and test the whole data.Generally, most studies classify data for training and assessment based on convenience.In this paper, data is split using a ratio equal to 70:30.Consequently, taking Hubei as an example, my 860 data points are selected as training sets, and the remaining 368 data points are applied, to evaluate the established models.The samples of carbon price are shown in Table 1.The data statistical descriptions of the carbon price series are stated in Table 2. From this, the carbon price of Beijing had the highest average price, which reached 51.33 RMB.The average prices of Shanghai and Guangdong were lower, at 27.91 and 25.57RMB, respectively.Also, the average price of Hubei was the lowest, at 20.31 RMB.These four carbon price series were not subject to strict normal distribution.Compared with the normal distribution, the skewnesses of all the four carbon price series are bigger than 0, indicating that the series exhibits positive skewness, which implies that the carbon price series are flatter to the right.The kurtoses of Hubei and Shanghai were both lower than 3, which indicates that these two carbon price series have thin tails, compared to a normal distribution, while the other two cities have fat tails.The Jarque-Bera test, rejecting the null hypothesis that obeys the standard normal distribution at any significant level, further confirms these results, given that this paper tests the non-stationary and nonlinear aspects of the carbon price  The data statistical descriptions of the carbon price series are stated in Table 2. From this, the carbon price of Beijing had the highest average price, which reached 51.33 RMB.The average prices of Shanghai and Guangdong were lower, at 27.91 and 25.57RMB, respectively.Also, the average price of Hubei was the lowest, at 20.31 RMB.These four carbon price series were not subject to strict normal distribution.Compared with the normal distribution, the skewnesses of all the four carbon price series are bigger than 0, indicating that the series exhibits positive skewness, which implies that the carbon price series are flatter to the right.The kurtoses of Hubei and Shanghai were both lower than 3, which indicates that these two carbon price series have thin tails, compared to a normal distribution, while the other two cities have fat tails.The Jarque-Bera test, rejecting the null hypothesis that obeys the standard normal distribution at any significant level, further confirms these results, given that this paper tests the non-stationary and nonlinear aspects of the carbon price series.The ADF (Augmented Dicky-Fuller) test is an effective method to test the stability of time series, and the BDS (Brock-Decher-Scheikman) test can effectively detect the nonlinear characteristics of the time series.The size of the embedding dimension m of the phase space reconstruction in the BDS test generally m ∈ [2, 5].This paper selects Eviews 8.0 software (Quantitative Micro Software, Irvine, CA, USA) to check the stability and nonlinearity of the carbon price.The test results are shown in Tables 3  and 4. Apparently, all of the four carbon price series are non-stationary and nonlinear.Taking Hubei carbon price as an example, the ADF test implies that the p-value of Hubei's carbon market price is greater than 10%, which means that the carbon price is non-stationary at the 10% significance level.The BDS test displays that the carbon price has a p-value of lower than 0.01, indicating that Hubei's carbon market price is non-linear at the 1% significance level.

Data Processing
From above, the results illustrated that the original carbon price fluctuation is severe.In order to reduce the interference of noise, the ESMD was utilized to decompose the carbon price into IMFs and residue.To reduce repetition, this paper only gives a detailed description of the carbon price forecast of the Hubei carbon market.The decomposition results after ESMD are illustrated in Figure 5, where the original carbon price series are decomposed into seven IMFs and one residue.These decomposed IMFs exhibited simpler structures, more stable fluctuations, and greater regularity, all of which led to better fitting and forecasting accuracies, as compared with an original carbon price.
Moreover, to demonstrate the excellence of ESMD, EMD and EEMD were employed to decompose the original carbon price series as well.Figures 6 and 7 illustrate the results, respectively.Both EMD and EEMD decomposed the carbon prices into eight IMFs and one residue.

Determination of the Input Variables by PACF
To overcome the limitations of ignoring the relationship between the inputs and outputs of the model, it is vital to know the relations between the series and their lags.The statistical tool, namely the partial autocorrelation function (PACF), is effective for describing the characteristics of the series [63,64], which are applied toward determining the input variables.
The results of the Hubei carbon prices, decomposed by ESMD, EMD, and EEMD are illustrated in Figures 8-10 respectively.Specifically, assuming that x t is the output variable, x t−k is one of the input variables if the partial autocorrelation at lag k exceeds the approximate 95% confidence interval.

Determination of the Input Variables by PACF
To overcome the limitations of ignoring the relationship between the inputs and outputs of the model, it is vital to know the relations between the series and their lags.The statistical tool, namely the partial autocorrelation function (PACF), is effective for describing the characteristics of the series [63,64], which are applied toward determining the input variables.
The results of the Hubei carbon prices, decomposed by ESMD, EMD, and EEMD are illustrated in Figures 8-10 respectively.Specifically, assuming that  is the output variable, x is one of the input variables if the partial autocorrelation at lag k exceeds the approximate 95% confidence interval.Table 5 presents the PACF results of the Hubei carbon price after ESMD, EMD, and EEMD.

Determination of the Input Variables by PACF
To overcome the limitations of ignoring the relationship between the inputs and outputs of the model, it is vital to know the relations between the series and their lags.The statistical tool, namely the partial autocorrelation function (PACF), is effective for describing the characteristics of the series [63,64], which are applied toward determining the input variables.
The results of the Hubei carbon prices, decomposed by ESMD, EMD, and EEMD are illustrated in Figures 8-10 respectively.Specifically, assuming that  is the output variable, x is one of the input variables if the partial autocorrelation at lag k exceeds the approximate 95% confidence interval.Table 5 presents the PACF results of the Hubei carbon price after ESMD, EMD, and EEMD.

Accuracy Assessment
In order to evaluate the multi-frequency combined prediction performance proposed in this paper, the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the coefficient of determination (R 2 ) were applied, to measure the prediction accuracies of the proposed models.Among them, apart from the R 2 , the lower the value was, the better the forecasting accuracy of the model was.The R 2 value ranged from 0, 1 the closer to 1, the better the model.
The calculation formula for these indicators is shown in Table 6. is the real value at time ;  is the prediction for the same period , and  is the number of the training sample.

ESMD EMD EEMD
IMFs and R Lag IMFs and R Lag IMFs and R Lag

Accuracy Assessment
In order to evaluate the multi-frequency combined prediction performance proposed in this paper, the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the coefficient of determination (R 2 ) were applied, to measure the prediction accuracies of the proposed models.Among them, apart from the R 2 , the lower the value was, the better the forecasting accuracy of the model was.The R 2 value ranged from [0, 1] the closer to 1, the better the model.
The calculation formula for these indicators is shown in Table 6.y t is the real value at time t; ŷt is the prediction for the same period t, and n is the number of the training sample.

Name
The Calculation Formula

Forecasting Results and Discussion
In this study, the ESMD-GWO-ELM model was employed to forecast the carbon price.In order to prove the performance forecasting of this model, the benchmark models, including Original-ELM, EMD-ELM, EEMD-ELM, ESMD-ELM, Original-GWO-ELM, EMD-GWO-ELM, and EEMD-GWO-ELM were also provided for predicting the carbon price.All of the models were established in Matlab software (2018, MathWorks, Natick, MA, USA).
The forecasting results are shown in Figure 11; the comparisons of between RMSE, MAE, MAPE, and R 2 between the eight models are shown in Table 7 and Figures 12-14.From this, this paper can draw the following conclusions.
(a) The proposed model ESMD-GWO-ELM has the lowest RMSE (0.1438) and MAPE (0.31), and highest R 2 (0.9918), which illustrates that the model performs significantly better than all of the considered benchmark models in the carbon price forecasting of Hubei.(b) Compared with the eight models, the Original-ELM model is the worst-performing, as it has the biggest RMSE (6.0734), MAPE (38.42), and lowest R 2 (0.6304).This is primarily due to the fact that the original carbon price is unstable and nonlinear, so that the single model is not fit for direct forecasting without decomposition processing.(c) In the Original-ELM, EMD-ELM, EEMD-ELM, ESMD-ELM prediction models, the decomposed prediction models are obviously superior to the direct-prediction models.The main reason is that the structure and fluctuation of the decomposed IMF sequence become simpler and more stable, which enhances the forecast precision.(d) Compare with individual models (Original-ELM, EMD-ELM, EEMD-ELM, and ESMD-ELM) with optimized models (Original-GWO-ELM, EMD-GWO-ELM, EEMD-GWO-ELM, and ESMD-GWO-ELM), it is apparent that the GWO-ELM is significantly superior to the ELM model.This result demonstrates that the optimizing ELM parameter is necessary and meaningful.(e) It is evident that the ESMD decomposition method performs better than the EMD and EEMD methods, whatever the ESMD-ELM or ESMD-GWO-ELM, when compared to the decomposed models (EMD, EEMD, and ESMD).The result proves the superiority of the ESMD decomposition model.(f) However, compared with the EMD and EEMD methods, it is hard to decide which one is better.
As shown in Table 7, the RMSE (1.7352) and MAPE (3.51) values of EMD-ELM are lower than those of EEMD-ELM (2.3936 and 9.92 respectively), which leads to the conclusion that the EMD-ELM performs better.However, when we compare the EMD-GWO-ELM and EEMD-GWO-ELM, the RMSE value of EEMD-GWO-ELM is 0.1917, which is lower than the value of 0.2559 of EMD-GWO-ELM; the R 2 value is higher, but the MAPE value is bigger than the EMD-GWO-ELM.Therefore, the EEMD decomposes the model performances better than the EMD.Above all, there is not enough evidence on which to judge which one is better.Thus, this conclusion will be demonstrated in the following section.

Additional Forecasting Case
In order to further prove the prediction accuracy of the proposed method, the carbon price series of Beijing, Guangdong, and Shanghai are predicted in this paper.Figures 15-17 indicate the forecasting results.Also, the comparisons of the RMSE, MAPE, and R 2 between the models are shown in Table 8.As Table 8 shows, the RMSE and MAPE are the lowest, and the R 2 is the highest in the ESMD-GWO-ELM model between the models of the three markets, which further proved the prediction accuracy of the proposed model.Notice that the prediction accuracies of the EEMD-GWO-ELM model for Beijing and Shanghai are better than EMD-GWO-ELM.However, the forecasting performance of EMD-GWO-ELM is better than for EEMD-GWO-ELM in the Guangdong market.Consequently, it is hard to judge which one is better, for the reason that the results are not exactly matched.Nonetheless, it is evident that the ESMD-GWO-ELM model obtains the best prediction accuracies for the three markets.Both the RMSE and MAPE are the lowest, and the R 2 is the highest, between these models.
Undoubtedly, this study can draw the conclusion that the proposed model, the ESMD-GWO-ELM model, achieves the highest level of accuracy according to the results, which indicate that the proposed model is suitable for effective forecasting of the carbon price, when only considering historical carbon price series.

Additional Forecasting Case
In order to further prove the prediction accuracy of the proposed method, the carbon price series of Beijing, Guangdong, and Shanghai are predicted in this paper.Figures 15-17 indicate the forecasting results.Also, the comparisons of the RMSE, MAPE, and R 2 between the models are shown in Table 8.As Table 8 shows, the RMSE and MAPE are the lowest, and the R 2 is the highest in the ESMD-GWO-ELM model between the models of the three markets, which further proved the prediction accuracy of the proposed model.Notice that the prediction accuracies of the EEMD-GWO-ELM model for Beijing and Shanghai are better than EMD-GWO-ELM.However, the forecasting performance of EMD-GWO-ELM is better than for EEMD-GWO-ELM in the Guangdong market.Consequently, it is hard to judge which one is better, for the reason that the results are not exactly matched.Nonetheless, it is evident that the ESMD-GWO-ELM model obtains the best prediction accuracies for the three markets.Both the RMSE and MAPE are the lowest, and the R 2 is the highest, between these models.
Undoubtedly, this study can draw the conclusion that the proposed model, the ESMD-GWO-ELM model, achieves the highest level of accuracy according to the results, which indicate that the proposed model is suitable for effective forecasting of the carbon price, when only considering historical carbon price series.

Additional Forecasting Case
In order to further prove the prediction accuracy of the proposed method, the carbon price series of Beijing, Guangdong, and Shanghai are predicted in this paper.Figures 15-17 indicate the forecasting results.Also, the comparisons of the RMSE, MAPE, and R 2 between the models are shown in Table 8.As Table 8 shows, the RMSE and MAPE are the lowest, and the R 2 is the highest in the ESMD-GWO-ELM model between the models of the three markets, which further proved the prediction accuracy of the proposed model.Notice that the prediction accuracies of the EEMD-GWO-ELM model for Beijing and Shanghai are better than EMD-GWO-ELM.However, the forecasting performance of EMD-GWO-ELM is better than for EEMD-GWO-ELM in the Guangdong market.Consequently, it is hard to judge which one is better, for the reason that the results are not exactly matched.Nonetheless, it is evident that the ESMD-GWO-ELM model obtains the best prediction accuracies for the three markets.Both the RMSE and MAPE are the lowest, and the R 2 is the highest, between these models.
Undoubtedly, this study can draw the conclusion that the proposed model, the ESMD-GWO-ELM model, achieves the highest level of accuracy according to the results, which indicate that the proposed model is suitable for effective forecasting of the carbon price, when only considering historical carbon price series.

Conclusions
This paper proposes a new novel hybrid model for carbon price prediction, including extreme-point symmetric mode decomposition, extreme learning machine, and the grey wolf optimizer algorithm, which are applied to forecast the four carbon price series of China's carbon trading pilot markets.Accordingly, three main steps are involved: firstly, owing to the nonlinear and non-stationary nature of carbon price, this study carried out decomposition models to decompose the carbon price into several IMFs and one residue.Then, the PACF is utilized to determine the input variables of each subsequence for the resulting forecasting models.Thirdly, this paper cultivates the GWO-ELM model to predict each decomposition component, and to add them.Moreover, the decomposition method and the prediction method constitute the comparative mixed model.The decomposition models include EMD, EEMD, and ESMD.Also, the prediction methods are ELM and GWO-ELM.To verify the practicability and effectiveness further, the carbon price series of Hubei, Beijing, Shanghai, and Guangdong are selected for the predictions.The results illustrate that the proposed model performs significantly better than all of the considered benchmark models.To conclude, the proposed model is suitable for forecasting the non-stationary and nonlinear carbon price.Several conclusions can be obtained from this paper: (a) The decomposed prediction models (EMD, EEMD, and ESMD) are obviously superior to the direct prediction models, based on the forecasting results, which indicates that it is necessary to decompose the non-stationary and nonlinear carbon price.(b) During the processing of carbon price forecasting, the models with ESMD perform better than those with EMD and EEMD.However, there is not enough evidence between the EMD and EEMD models to prove which one is better.(c) The proposed model, ESMD-GWO-ELM, performs better than the other methods for China carbon price prediction, when only the historical carbon price sequences are taken into account.
To conclude, this study provides a new approach for forecasting the carbon prices of China's regional carbon emissions trading market.The conclusions obtained can provide guidance for carbon price predictions.In addition, this paper can obtain future carbon price changes for carbon trading markets, according to the above-described technical route in practical application, which is conductive to policy makers and inventors.Policy makers could adjust the price range of the price and the quantity of the market stability reserve, to guarantee the positive operation of the carbon trading market.Accordingly, the forecasting model could guide investors to obtain a clear understanding of the carbon trading market, and make reasonable investments.

Conclusions
This paper proposes a new novel hybrid model for carbon price prediction, including extreme-point symmetric mode decomposition, extreme learning machine, and the grey wolf optimizer algorithm, which are applied to forecast the four carbon price series of China's carbon trading pilot markets.Accordingly, three main steps are involved: firstly, owing to the nonlinear and non-stationary nature of carbon price, this study carried out decomposition models to decompose the carbon price into several IMFs and one residue.Then, the PACF is utilized to determine the input variables of each subsequence for the resulting forecasting models.Thirdly, this paper cultivates the GWO-ELM model to predict each decomposition component, and to add them.Moreover, the decomposition method and the prediction method constitute the comparative mixed model.The decomposition models include EMD, EEMD, and ESMD.Also, the prediction methods are ELM and GWO-ELM.To verify the practicability and effectiveness further, the carbon price series of Hubei, Beijing, Shanghai, and Guangdong are selected for the predictions.The results illustrate that the proposed model performs significantly better than all of the considered benchmark models.To conclude, the proposed model is suitable for forecasting the non-stationary and nonlinear carbon price.Several conclusions can be obtained from this paper: (a) The decomposed prediction models (EMD, EEMD, and ESMD) are obviously superior to the direct prediction models, based on the forecasting results, which indicates that it is necessary to decompose the non-stationary and nonlinear carbon price.(b) During the processing of carbon price forecasting, the models with ESMD perform better than those with EMD and EEMD.However, there is not enough evidence between the EMD and EEMD models to prove which one is better.(c) The proposed model, ESMD-GWO-ELM, performs better than the other methods for China carbon price prediction, when only the historical carbon price sequences are taken into account.
To conclude, this study provides a new approach for forecasting the carbon prices of China's regional carbon emissions trading market.The conclusions obtained can provide guidance for carbon price predictions.In addition, this paper can obtain future carbon price changes for carbon trading markets, according to the above-described technical route in practical application, which is conductive to policy makers and inventors.Policy makers could adjust the price range of the price and the quantity of the market stability reserve, to guarantee the positive operation of the carbon trading market.Accordingly, the forecasting model could guide investors to obtain a clear understanding of the carbon trading market, and make reasonable investments.

Figure 1 .
Figure 1.The hierarchy of grey wolves.

Figure 1 .
Figure 1.The hierarchy of grey wolves.

→D
refers to the distance between the grey wolf and the prey, and → A and →

X 3
and → X δ are the positions of α, β, and δ, respectively; indicate the directions and distances of ω wolf towards α, β, and δ.The steps of the GWO algorithm are briefly shown in Algorithm 1.

Figure 2 .
Figure 2. The process of the proposed GWO-ELM model.

Figure 2 .
Figure 2. The process of the proposed GWO-ELM model.

Figure 3
Figure3is the forecasting framework for carbon price, with the ESMD-GWO-ELM model.As can be seen, the original carbon price series is decomposed into several IMFs, and one residue by EMD, EEMD and ESMD at first.Then, the partial autocorrelation function (PACF) is utilized to determine the input variables of each subsequence, so that the series are divided into the training set and test set.Thirdly, the GWO-ELM is employed to predict each series.Finally, the prediction results are compared with the real carbon price.

Figure 3 .
Figure 3.The forecasting process for carbon price with ESMD-GWO-ELM model.
. The daily data are collected in China's carbon emissions trading website [62].The Hubei sample covers the period from 2 April 2014 until 14 August 2018, with 1228 total data points, excluding public holidays.The carbon prices in the Shanghai and Guangdong markets, both from 19 December 2013 until 14 August 2018, and the carbon prices of Beijing from 28 November 2013 until 14 August 2018.The price fluctuations of original carbon price series are shown in Figure4.From it, it is evident that carbon price series of the four markets were non-stationary and nonlinear.

Figure 3 .
Figure 3.The forecasting process for carbon price with ESMD-GWO-ELM model.
. The daily data are collected in China's carbon emissions trading website [62].The Hubei sample covers the period from 2 April 2014 until 14 August 2018, with 1228 total data points, excluding public holidays.The carbon prices in the Shanghai and Guangdong markets, both from 19 December 2013 until 14 August 2018, and the carbon prices of Beijing from 28 November 2013 until 14 August 2018.The price fluctuations of original carbon price series are shown in Figure 4. From it, it is evident that carbon price series of the four markets were non-stationary and nonlinear.The data series were divided into two parts: the training sample and the test sample.There are several methods of segmenting the data to train and test the whole data.Generally, most studies classify data for training and assessment based on convenience.In this paper, data is split using a ratio equal to 70:30.Consequently, taking Hubei as an example, my 860 data points are selected as training sets, and the remaining 368 data points are applied, to evaluate the established models.The samples of carbon price are shown in

Figure 5 .
Figure 5.The results of the Hubei carbon price decomposed by ESMD.

Figure 6 .
Figure 6.The results of the Hubei carbon price decomposed by EMD.

Figure 7 .
Figure 7.The results of the Hubei carbon price, decomposed by EEMD.

Figure 5 .
Figure 5.The results of the Hubei carbon price decomposed by ESMD.

Figure 5 .
Figure 5.The results of the Hubei carbon price decomposed by ESMD.

Figure 6 .
Figure 6.The results of the Hubei carbon price decomposed by EMD.

Figure 7 .
Figure 7.The results of the Hubei carbon price, decomposed by EEMD.

Figure 6 . 22 Figure 5 .
Figure 6.The results of the Hubei carbon price decomposed by EMD.

Figure 6 .
Figure 6.The results of the Hubei carbon price decomposed by EMD.

Figure 7 .
Figure 7.The results of the Hubei carbon price, decomposed by EEMD.Figure 7. The results of the Hubei carbon price, decomposed by EEMD.

Figure 7 .
Figure 7.The results of the Hubei carbon price, decomposed by EEMD.Figure 7. The results of the Hubei carbon price, decomposed by EEMD.

Figure 8 .
Figure 8.The PCAF results of the Hubei carbon price after ESMD.

Figure 9 .
Figure 9.The PCAF results of the Hubei carbon price after EMD.

Figure 8 .
Figure 8.The PCAF results of the Hubei carbon price after ESMD.

Figure 8 .
Figure 8.The PCAF results of the Hubei carbon price after ESMD.

Figure 9 .
Figure 9.The PCAF results of the Hubei carbon price after EMD.Figure 9.The PCAF results of the Hubei carbon price after EMD.

Figure 9 .
Figure 9.The PCAF results of the Hubei carbon price after EMD.Figure 9.The PCAF results of the Hubei carbon price after EMD.

Figure 10 .
Figure 10.The PCAF results of the Hubei carbon price after EEMD.

Figure 10 .
Figure 10.The PCAF results of the Hubei carbon price after EEMD.

Figure 11 .
Figure 11.The results of Hubei carbon price forecasting.

Figure 12 .
Figure 12.The values of the RMSE-parallel models.

Figure 11 .
Figure 11.The results of Hubei carbon price forecasting.

Figure 11 .
Figure 11.The results of Hubei carbon price forecasting.

Figure 12 .
Figure 12.The values of the RMSE-parallel models.Figure 12.The values of the RMSE-parallel models.

Figure 12 .
Figure 12.The values of the RMSE-parallel models.Figure 12.The values of the RMSE-parallel models.

Figure 13 .
Figure 13.The values of the MAPE parallel models.

Figure 14 .
Figure 14.The values of the R 2 parallel models.

Figure 13 .
Figure 13.The values of the MAPE parallel models.

Figure 13 .
Figure 13.The values of the MAPE parallel models.

Figure 14 .
Figure 14.The values of the R 2 parallel models.

Figure 14 .
Figure 14.The values of the R 2 parallel models.

Figure 15 .
Figure 15.The results of Beijing carbon price forecasting.

Figure 16 .
Figure 16.The results of Guangdong carbon price forecasting.

Figure 15 .
Figure 15.The results of Beijing carbon price forecasting.

Figure 15 .
Figure 15.The results of Beijing carbon price forecasting.

Figure 16 .
Figure 16.The results of Guangdong carbon price forecasting.Figure 16.The results of Guangdong carbon price forecasting.

Figure 16 .
Figure 16.The results of Guangdong carbon price forecasting.Figure 16.The results of Guangdong carbon price forecasting.

Figure 17 .
Figure 17.The results of Shanghai carbon price forecasting.

Figure 17 .
Figure 17.The results of Shanghai carbon price forecasting.

Table 1 .
Samples of carbon price.

Table 1 .
Samples of carbon price.

Table 2 .
Statistical descriptions of the carbon price.
Note:The Stat means statistics.The Prob. indicates that the probability.× indicates that the ADF test of the carbon market price is non-stationary at the 10% significance level.

Table 4 .
BDS test results.Note: The Stat means statistics.The Prob. indicates that the probability.× shows that the carbon price is nonlinear at a significance level of 1%.

Table 5
presents the PACF results of the Hubei carbon price after ESMD, EMD, and EEMD.

Table 5 .
The PACF results of the Hubei carbon price.

Table 6 .
The calculation formula.

Table 5 .
The PACF results of the Hubei carbon price.

Table 6 .
The calculation formula.

Table 7 .
Comparison of the performance of Hubei carbon price forecasting.

Table 7 .
Comparison of the performance of Hubei carbon price forecasting.

Table 7 .
Comparison of the performance of Hubei carbon price forecasting.

Table 8 .
Analysis of the Beijing, Guangdong, and Shanghai forecasting results.

Table 8 .
Analysis of the Beijing, Guangdong, and Shanghai forecasting results.