A Novel Extended Higher-Order Moment Multi-Factor Framework for Forecasting the Carbon Price: Testing on the Multilayer Long Short-Term Memory Network

Predicting the carbon price accurately can not only promote the sustainability of the carbon market and the price driving mechanism of carbon emissions, but can also help investors avoid market risks and increase returns. However, previous research has only focused on the low-order moment perspective of the returns for predicting the carbon price, while ignoring the shock of extreme events and market asymmetry originating from its pricing factor markets. In this paper, a novel extended higher-order moment multi-factor framework (EHM-APT) was formed to improve the prediction and to capture the driving mechanism of the carbon price. Furthermore, a multi-layer and multi-variable Long Short-Term Memory Network (Multi-LSTM) model was constructed so that the parameters and structure could be determined experimentally for testing the performance of the proposed framework. The results show that the pricing framework considers the shock of extreme events and market asymmetry and can improve the prediction compared with a framework that does not consider the shock of higher-order moment terms. Additionally, the Multi-LSTM model is more competitive for prediction than other benchmark models. This conclusion proves the rationality and accuracy of the proposed framework. The application of the pricing framework encourages investors and financial institutions to pay more attention to the pricing factor of extreme events and market asymmetry for accurate price prediction and investment analysis.


Introduction
Following the signing of the Kyoto Protocol, the carbon market was formally established in 2005 in an attempt to reduce global greenhouse gas emissions. The carbon market defines virtual carbon emission rights as scarce, valuable assets, gives them commodity attributes, and achieves resource allocation and emission reduction targets through market transactions among emitting entities. The signing of the Paris Agreement in December 2015 further highlighted the role of the carbon market in allocating capital to achieve emission reduction on a global scale [1]. As a core issue of the carbon market mechanism, accurate forecasting of the carbon price can help develop an efficient carbon pricing mechanism and also help investors avoid market risks and increase returns. As an emerging, policy-based artificial market, the carbon market is highly sensitive to policy shocks, especially carbon dioxide (CO2) reduction policies and carbon quota policies [2]. Therefore, within a given pricing framework, carbon price prediction should not only follow the basic pricing methods for general financial assets but also reflect the special driving mechanism of the carbon price.
Further evidence shows that policy turbulence (such as the implementation of a carbon tax) and uncertainty in the energy and financial markets can easily affect the carbon price through financial channels [3]. In other words, shocks from the energy and financial markets also affect carbon price forecasts and price fluctuations. Consequently, the carbon price is vulnerable to national strategies, government regulations, international agreements, and other policy factors, as well as to asymmetric influences from the behavior of market participants in its influential markets. It is therefore worth exploring a carbon price driving mechanism that reflects the impact of external shocks and market asymmetry.
However, existing research focuses only on the low-order moments of returns (mean return and variance) and ignores the higher-order moments (skewness and kurtosis) through which the shock of extreme events and market asymmetry originating from the carbon price's influential markets would enter. That is, these factors are not included in the analysis framework of the pricing mechanism. In fact, market skewness and kurtosis have proven to be valuable indicators of market asymmetry and extreme-event risk in explaining portfolio returns [4]. This omission calls into question the accuracy of existing carbon price predictions.
To overcome this weakness, the multivariate skew-GARCH (generalized autoregressive conditional heteroskedasticity) model, which incorporates the third-order moment of financial assets, was introduced to explain financial returns and the asymmetric relations among several European financial markets [5]. However, the multivariate skew-GARCH model has many parameters to estimate and requires assumptions about the tail distribution of the residuals. Furthermore, Fry et al. [6] developed a bivariate higher-order moment capital asset pricing model (CAPM), which uses co-skewness and co-kurtosis to represent the risk of market asymmetry and extreme events for portfolio returns. The higher-order moment CAPM therefore provides a reasonable framework for studying the pricing of carbon assets under the impact of external events and information asymmetry. Policy factors and information asymmetry are important external factors affecting the carbon price. Although many scholars use extreme value theory (EVT) to examine extreme risk shocks in financial markets [7], EVT models consider only the extreme impact generated within the market itself and ignore extreme shocks from other relevant markets. Thus, the higher-order moment CAPM is superior to EVT-based models in describing asset prices affected by extreme event shocks.
However, the higher-order moment CAPM described above can only describe the relationship between two assets and requires asset returns to follow a bivariate normal distribution, which makes it difficult to capture the non-linear mapping between the carbon price and multiple pricing factors.
The aim of this article is to construct a novel extended higher-order moment multi-factor framework for forecasting the carbon price that is consistent with the non-linear, peaked, and fat-tailed characteristics of the carbon price. This paper extends the higher-order moment CAPM described above to multiple factors and brings into the carbon pricing framework the higher-order moment terms that capture the shock of market asymmetry and extreme events originating from the carbon price's influential markets. Thus, based on arbitrage pricing theory (APT), an extended higher-order moment multi-factor framework (EHM-APT) that conforms to the special driving mechanism of the carbon price is formed. This research design is intended to compensate for the inaccurate pricing caused by neglecting the pricing factors of extreme events and market asymmetry. The proposed framework rests on the premise that the carbon price can be theoretically explained by the higher-order moment terms of its influential markets. Further, this article uses a machine learning method to empirically evaluate the accuracy and performance of the framework. Specifically, it uses a Long Short-Term Memory network with a designed multi-layer and multi-variable structure (Multi-LSTM) to predict the carbon price and capture the non-linear mapping between the carbon price and its pricing factors.
This article concludes that the proposed EHM-APT framework can significantly improve the prediction of the carbon price. That is, co-skewness and co-kurtosis, which represent market asymmetry and extreme events stemming from the pricing factor markets, are indispensable factors for predicting and fitting the carbon price. Carbon investors can therefore benefit from treating market extreme factors and market asymmetric information as indispensable pricing factors when allocating funds. The conclusions of this article and the proposed pricing framework will help investors predict and analyze the carbon price effectively and accurately.
The remaining structure of this paper is as follows: Section 2 reviews the related literature. Section 3 introduces the methodology of this paper. Section 4 presents the experiment analysis, contains the descriptive statistics and the design of the Multi-LSTM model, and offers the results and the discussion. Section 5 concludes this paper and puts forward ideas for further research.

Literature Review
The growing academic literature on carbon price forecasting has focused on two major families of models: traditional econometric models and artificial intelligence models.
Among traditional econometric models, the GARCH model has potential advantages in capturing the non-linear characteristics of the carbon price. Chevallier [8] showed that the asymmetric threshold GARCH model can effectively capture the characteristics of the carbon futures price and improve prediction accuracy. Byun et al. [9] found that GARCH-type models are superior to the k-nearest neighbor model for predicting the carbon price. Koop and Tole [10] applied the dynamic model averaging (DMA) method to forecast the carbon price, and their results showed higher prediction accuracy than a Bayesian model and a time-varying parameter regression model. Sanin et al. [11] demonstrated that combining an autoregressive moving average model with exogenous variables (ARMAX) and a GARCH model with a time-varying jump process predicts the carbon price more accurately than a standard ARMAX-GARCH framework. Benz et al. [12] concluded that the Markov regime-switching GARCH model outperforms other Markov regime-switching and simple GARCH models in forecasting the carbon price. In addition, Chevallier [13] maintained that a nonparametric method of predicting the carbon price can reduce the prediction error by almost 15% compared with linear autoregression models. Studies integrating the threshold dynamic conditional correlation (DCC) GARCH model and the full Baba, Engle, Kraft, and Kroner (BEKK) GARCH model found that volatility spillovers from falling energy prices may have a stronger effect on the carbon price, and that GARCH-based models can better reveal the price volatility mechanism of carbon assets through volatility spillover analysis [14,15].
However, traditional econometric models usually rely on strict assumptions about returns, such as normally distributed tails of carbon returns [16]. In fact, carbon returns have significant non-normal and non-linear characteristics, with a peaked and fat-tailed distribution [8], which causes traditional pricing methods to fail to predict the carbon price accurately.
Artificial intelligence methods, including the artificial neural network (ANN), support vector machine (SVM), least squares support vector machine (LSSVM), multilayer perceptron (MLP), and hybrid fuzzy neural network (HFNN), can map non-linear functions without assumptions about the tail distribution of return series, and are therefore superior to traditional econometric models in solving forecasting problems [17,18]. To determine the parameters of the LSSVM model, Zhu et al. [19] proposed an integrated approach combining the group method of data handling (GMDH), particle swarm optimization (PSO), and LSSVM for carbon price prediction, which achieved higher prediction accuracy than the ANN and the autoregressive integrated moving average (ARIMA) models.
Fan et al. [20] established an MLP-ANN prediction model and showed that it performs well compared with ARIMA, ANN, and LSSVM models. Recent research has found that an integrated model combining empirical mode decomposition (EMD) with ANN and LSSVM forecasts the carbon price better than the EMD method alone [21]. Additionally, Atsalakis [22] proposed a computational intelligence-based model with a novel hybrid neuro-fuzzy controller for forecasting the carbon price, which achieved higher accuracy. Zhu et al. [23] combined variational mode decomposition (VMD) and spiking neural networks (SNNs) to improve forecasting accuracy and reliability.
The above research has two defects. Firstly, these studies may ignore the shock of the higher-order moment attributes (skewness and kurtosis) of the pricing factors on the carbon price. The common theoretical basis of the above models is a linear or non-linear regression relationship between the carbon price and its pricing factors under a multi-factor framework, which captures only the low-order moments of returns. Correspondingly, these studies implicitly assume that shocks of extreme events or information asymmetry stemming from the pricing factor markets do not affect the carbon price. Secondly, the forecasting models used in previous research are limited in their ability to predict the carbon price accurately: traditional statistical and econometric models require returns to satisfy strict assumptions, while artificial intelligence methods easily fall into local minima and have difficulty reaching a global optimum. To remedy these shortcomings, this paper makes the following contributions. Firstly, it relaxes the framework of the bivariate higher-order moment CAPM to multivariate factors, bringing the higher-order moment terms that indicate the shock of market asymmetry and extreme events into the carbon pricing framework. Because this framework involves a complex non-linear and non-structural relationship between the carbon price and its pricing factors, the second contribution is a Multi-LSTM model with a designed multi-layer and multi-variable structure that captures the non-linear mapping between the carbon price and its pricing factors.
The Multi-LSTM network was selected for two reasons. On the one hand, it has an advantage in handling time series data with non-linear and complex relationships, which is consistent with the time-lag characteristic of the carbon price sequence. On the other hand, during training the model adjusts adaptively to obtain optimal parameters and structure and can avoid falling into local optima [24], unlike other optimization approaches used in neural network training, such as the genetic algorithm and the backpropagation algorithm [25,26].

Methodology
In this section, we theoretically explain the construction process of the novel extended higher-order moment multi-factor pricing framework. Based on this, the Multi-LSTM network for fitting the pricing framework is introduced.

A Novel Extended Higher-Order Moment Multi-Factor Framework (EHM-APT)
According to the higher-order moment CAPM model proposed by Hwang et al. [27], the return of financial assets is affected not only by systemic risk, but also by irrational behavior and extreme external shock represented by co-skewness and co-kurtosis. Therefore, the higher-order moment CAPM model can reveal more price characteristics than the traditional CAPM model.
The higher-order moment CAPM model, according to Hwang et al. [27], is as follows:

$$E(R_i) - R_f = a_1 \beta_{im} + a_2 \gamma_{im} + a_3 \delta_{im}$$

and

$$\beta_{im} = \frac{E[(R_i - E(R_i))(R_m - E(R_m))]}{E[(R_m - E(R_m))^2]}, \quad \gamma_{im} = \frac{E[(R_i - E(R_i))(R_m - E(R_m))^2]}{E[(R_m - E(R_m))^3]}, \quad \delta_{im} = \frac{E[(R_i - E(R_i))(R_m - E(R_m))^3]}{E[(R_m - E(R_m))^4]},$$

where $E(R_i) - R_f$ is the excess return of the asset portfolio and $a_1, a_2, a_3$ represent the risk premium coefficients of $\beta_{im}$, $\gamma_{im}$, and $\delta_{im}$, respectively. $\beta_{im}$ is the coefficient of covariance and represents the shock of the first-order centered moment (return) of asset portfolio $m$ on the first-order centered moment (return) of asset $i$; $\gamma_{im}$ is the coefficient of co-skewness and represents the shock of the second-order centered moment (variance) of asset portfolio $m$ on the first-order centered moment (return) of asset $i$; and $\delta_{im}$ is the coefficient of co-kurtosis and represents the shock of the third-order centered moment (skewness) of asset portfolio $m$ on the first-order centered moment (return) of asset $i$. Co-skewness reflects the information asymmetry of market investment, and negative skewness makes the probability of a return decline exceed the probability of a rise, thus increasing investors' losses [6]. Correspondingly, higher co-kurtosis increases the probability of extreme events, which reflects the impact of external events on market investment. To examine the shock of higher-order moment terms on portfolio returns, Fry et al. [6] developed an extended higher-order moment CAPM model by introducing co-skewness and co-kurtosis into the traditional CAPM framework.
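As a rough illustration of the three coefficients, the sketch below computes sample versions of $\beta_{im}$, $\gamma_{im}$, and $\delta_{im}$ from two return series. It assumes the usual four-moment CAPM scaling associated with Hwang et al. [27] (the co-moment of asset $i$ with portfolio $m$ divided by the matching second, third, or fourth central moment of $m$), uses plain sample averages, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _deviations(x: Sequence[float]) -> List[float]:
    """Deviations of each observation from the sample mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def _avg(values: List[float]) -> float:
    return sum(values) / len(values)


def higher_moment_betas(r_i: Sequence[float], r_m: Sequence[float]) -> Tuple[float, float, float]:
    """Sample beta, co-skewness (gamma) and co-kurtosis (delta) of asset i on portfolio m.

    Each coefficient is a co-moment of (i, m) scaled by the matching central
    moment of m: order 2 for beta, 3 for gamma, 4 for delta.
    """
    if len(r_i) != len(r_m) or len(r_i) < 2:
        raise ValueError("need two aligned series with at least two observations")
    d_i, d_m = _deviations(r_i), _deviations(r_m)
    coefficients = []
    for k in (2, 3, 4):
        denominator = _avg([b ** k for b in d_m])
        if denominator == 0:
            # For k = 3 this happens when m's sample distribution is exactly symmetric.
            raise ValueError(f"central moment of order {k} of r_m is zero")
        numerator = _avg([a * b ** (k - 1) for a, b in zip(d_i, d_m)])
        coefficients.append(numerator / denominator)
    beta, gamma, delta = coefficients
    return beta, gamma, delta


if __name__ == "__main__":
    print(higher_moment_betas([2.0, 4.0, 8.0], [1.0, 2.0, 4.0]))
```

Because $\gamma_{im}$ divides by the third central moment of $m$, it becomes numerically unstable when the portfolio's return distribution is close to symmetric; that is a property of this scaling rather than of the code.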
The suggested model incorporates a bivariate normal distribution augmented by the shock of co-skewness and co-kurtosis. In this model, $f_{\text{coskewness}}(r_{1,t}, r_{2,t})$ and $f_{\text{cokurtosis}}(r_{1,t}, r_{2,t})$ represent the portfolio return considering the impact of co-skewness and co-kurtosis, respectively; $\theta_{12}$ represents the impact coefficient of co-skewness, that is, the shock of the second-order centered moment (variance) of asset 2 on the first-order centered moment (return) of asset 1; $\theta_{13}$ denotes the impact coefficient of co-kurtosis, that is, the shock of the third-order centered moment (skewness) of asset 2 on the first-order centered moment (return) of asset 1; $r_{1,t}, \mu_1, \sigma_1$ and $r_{2,t}, \mu_2, \sigma_2$ represent the return, mean, and standard deviation of assets 1 and 2, respectively; $\rho$ is the correlation coefficient between assets 1 and 2; and $\eta$ indicates the residual. However, this extended higher-order moment CAPM model applies only to the pricing of two assets and requires the assumption of a bivariate normal distribution.

To overcome this limitation, this paper proposes an extended higher-order moment multi-factor framework by introducing the co-skewness and co-kurtosis of multivariate pricing factors into the framework. In the resulting pricing framework, which considers the shock of market asymmetry and extreme events, $f(r_{0,t})$ represents the carbon return; $\alpha_1, \alpha_2, \ldots, \alpha_n$ represent the impact coefficients of the pricing factors; $r_{0,t}, \mu_0, \sigma_0$ and $r_{i,t}, \mu_i, \sigma_i$ represent the return, mean, and standard deviation of the carbon market and its pricing factors, respectively; $\theta_i$ denotes the impact coefficient of the second-order centered moment (variance) of the carbon pricing factors on the first-order centered moment (return) of the carbon market, that is, the shock of market asymmetry; and $\delta_i$ denotes the impact coefficient of the third-order centered moment (skewness) of the carbon pricing factors on the first-order centered moment (return) of the carbon market, that is, the shock of extreme events.
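One plausible way to turn the quantities above into model inputs, assumed here rather than taken from the paper, is for each pricing factor $i$ to contribute its return $r_{i,t}$, a variance-type term $z_{i,t}^2$ (the channel weighted by $\theta_i$), and a skewness-type term $z_{i,t}^3$ (the channel weighted by $\delta_i$), where $z_{i,t} = (r_{i,t} - \mu_i)/\sigma_i$. With the six pricing factors used in this paper, that gives 18 inputs per time step, which would match the 18 input nodes reported for the Multi-LSTM. The function name is hypothetical, and estimating $\mu_i$ and $\sigma_i$ on the training window only (to avoid look-ahead) is a design choice of this sketch.

```python
import math
from typing import Dict, List, Sequence


def higher_moment_features(factors: Dict[str, Sequence[float]], n_train: int) -> List[List[float]]:
    """Build one input row per time step.

    For each factor (in sorted name order) the row gets [r_it, z_it**2, z_it**3],
    where z is the return standardized by training-window mean and std.
    """
    names = sorted(factors)
    length = len(factors[names[0]])
    if any(len(factors[n]) != length for n in names):
        raise ValueError("all factor series must have the same length")
    if not 1 <= n_train <= length:
        raise ValueError("n_train must be between 1 and the series length")
    stats = {}
    for name in names:
        window = factors[name][:n_train]
        mu = sum(window) / n_train
        sd = math.sqrt(sum((v - mu) ** 2 for v in window) / n_train)
        if sd == 0:
            raise ValueError(f"factor {name!r} is constant on the training window")
        stats[name] = (mu, sd)
    rows = []
    for t in range(length):
        row: List[float] = []
        for name in names:
            mu, sd = stats[name]
            r = factors[name][t]
            z = (r - mu) / sd
            row.extend([r, z ** 2, z ** 3])  # return, variance-type term, skewness-type term
        rows.append(row)
    return rows


if __name__ == "__main__":
    demo = {"EUAS": [1.0, -1.0, 1.0, -1.0], "DJIA": [0.5, 0.0, -0.5, 0.0]}
    for row in higher_moment_features(demo, n_train=4):
        print(row)
```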

A Multi-Layer and Multi-Variable LSTM (Multi-LSTM) Model for Predicting the Carbon Price
This paper constructs a multi-layer and multi-variable LSTM model based on the research of Hochreiter et al. [24] for predicting the carbon price and investigating the performance of the proposed EHM-APT framework.
The LSTM network consists of an input layer, an output layer, and a hidden layer. Unlike the traditional recurrent neural network (RNN), whose hidden layer is made of ordinary neurons, the hidden layer of the LSTM is made of LSTM cells with a special memory function. As the core of the LSTM, the cell removes or adds information to its state through three designed gates: the forget gate, the input gate, and the output gate. The Multi-LSTM model extends the single-layer structure with more cells and more hidden layers. With this multi-layer chain structure and recurrent connections, the Multi-LSTM can extract features from the input data, which significantly improves its learning and training capability.
During training, the three gates of the LSTM receive two kinds of external information at time $t$: $x_t$, the input of the current state, and $h_{t-1}$, the output of the hidden layer at the previous state. In addition, the cell carries internal information, namely the memory state $C_{t-1}$ from the previous state. Among the three gates, the forget gate determines which information is discarded from the cell, which is equivalent to decaying the information in each dimension [24]. The input gate determines which new information is updated and written into the cell: a sigmoid function decides which values to update, and a tanh function creates a new candidate vector $\tilde{C}_t$ for the current cell state. The output gate determines the output of the current cell through a sigmoid function, while a tanh function maps the current cell state to values between −1 and 1; the final output of the LSTM is obtained by multiplying the outputs of these two functions. The training structure of the Long Short-Term Memory network is shown in Figure 1.

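The information filtering of the forget gate is shown as

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f).$$

The information updating of the input gate is shown as

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \quad C_t = f_t \otimes C_{t-1} \oplus i_t \otimes \tilde{C}_t.$$

The information screening of the output gate is shown as

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \quad h_t = o_t \otimes \tanh(C_t),$$

where $\otimes$ denotes the element-wise (Hadamard) product, $\oplus$ denotes element-wise addition, and $[h_{t-1}, x_t]$ is the concatenation of the previous hidden output and the current input. The information updating vector, the candidate vector, and the updated cell state at the current cell are represented by $i_t$, $\tilde{C}_t$, and $C_t$; $f_t$ and $o_t$ are the forget-gate and output-gate activations; and $h_t$ refers to the final output of the LSTM network. $W_f$, $W_i$, $W_C$, and $W_o$ represent the weight matrices, $b_f$, $b_i$, $b_C$, and $b_o$ the biases of the training process, and $\sigma$ is the sigmoid function.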
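The gate computations described above can be traced in a minimal forward-pass sketch. It uses plain Python lists, random untrained weights, and hypothetical class and function names; it omits training and the dense output layer, and mimics the stacked Multi-LSTM by feeding each layer's sequence of hidden outputs $h_t$ into the next layer.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


class LSTMLayer:
    """One LSTM layer with forget (f), input (i), candidate (C) and output (o) blocks."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        n_cat = n_hidden + n_in  # length of the concatenation [h_{t-1}, x_t]
        self.n_hidden = n_hidden
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)] for _ in range(n_hidden)]
                  for g in "fiCo"}
        self.b = {g: [0.0] * n_hidden for g in "fiCo"}

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
        hx = h_prev + x  # list concatenation, i.e. [h_{t-1}, x_t]
        f = [sigmoid(v) for v in affine(self.W["f"], self.b["f"], hx)]          # forget gate
        i = [sigmoid(v) for v in affine(self.W["i"], self.b["i"], hx)]          # input gate
        c_tilde = [math.tanh(v) for v in affine(self.W["C"], self.b["C"], hx)]  # candidate
        c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]  # new cell state
        o = [sigmoid(v) for v in affine(self.W["o"], self.b["o"], hx)]          # output gate
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                       # hidden output
        return h, c


def stacked_forward(layers: List[LSTMLayer], sequence: Sequence[Vector]) -> Vector:
    """Run a sequence through stacked layers; layer k's outputs h_1..h_T are layer k+1's inputs.

    Returns the last hidden output of the top layer.
    """
    seq = list(sequence)
    for layer in layers:
        h = [0.0] * layer.n_hidden
        c = [0.0] * layer.n_hidden
        outputs = []
        for x in seq:
            h, c = layer.step(x, h, c)
            outputs.append(h)
        seq = outputs
    return seq[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [LSTMLayer(3, 4, rng), LSTMLayer(4, 4, rng)]  # toy 3-4-4 stack
    print(stacked_forward(layers, [[0.1, 0.2, 0.3]] * 5))
```

Because $h_t$ is an output-gate value in (0, 1) times a tanh value in (−1, 1), every hidden output stays strictly inside (−1, 1), whatever the weights.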

Evaluation Criteria of Multi-LSTM for Determining the Performance of the Proposed Framework
This paper uses the following five criteria to evaluate the parameters and performance of the Multi-LSTM model: root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), direction accuracy (DA), and the correlation coefficient (CORR). The first three and the last are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(y_t - \hat{y}_t)^2}, \quad \mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\lvert y_t - \hat{y}_t\rvert, \quad \mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left\lvert\frac{y_t - \hat{y}_t}{y_t}\right\rvert,$$

$$\mathrm{CORR} = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(\hat{y}_t - \bar{\hat{y}})}{\sqrt{\sum_{t=1}^{T}(y_t - \bar{y})^2}\,\sqrt{\sum_{t=1}^{T}(\hat{y}_t - \bar{\hat{y}})^2}},$$

and DA is the proportion of periods in which the predicted direction of the carbon return matches the actual direction. Here, $Y = \{y_1, y_2, \ldots, y_T\}$ represents the time series of actual carbon returns, $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_T\}$ the series of predicted carbon returns, $\bar{y}$ and $\bar{\hat{y}}$ their sample means, and $T$ the number of observations. We chose RMSE, MAE, and MAPE because they are widely used to judge model performance in the existing literature. These indicators are non-negative real numbers; the smaller their values, the closer the predicted values are to the true values, and thus the better the prediction performance of the model. For investors, DA measures the probability that the predicted direction of the market is consistent with its actual trend. Investors generally pay close attention to whether the predicted direction deviates from the actual one, because the right investment direction helps them make more valuable decisions. A larger DA value means that the predicted carbon return is closer to investors' expectations. CORR is the Pearson correlation coefficient between the true and predicted carbon returns. Its value lies between −1 and 1; the closer it is to 1, the stronger the positive correlation.
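The five criteria can be computed directly from the actual and predicted return series. In this sketch, DA is implemented as the share of periods in which the predicted carbon return has the same sign as the actual return; that is an assumed operationalization of "correctly predicted direction" for a return series, and other studies instead measure direction relative to the previous actual value. MAPE is returned as a fraction, and inputs with an actual return of exactly zero are rejected because MAPE is undefined there. Function names are hypothetical.

```python
import math
from typing import Sequence


def _check(y: Sequence[float], y_hat: Sequence[float]) -> int:
    """Validate that actual and predicted series are non-empty and aligned."""
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    n = _check(y, y_hat)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    n = _check(y, y_hat)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / n


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (not multiplied by 100)."""
    n = _check(y, y_hat)
    if any(a == 0 for a in y):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / n


def direction_accuracy(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Share of periods whose predicted return has the same sign as the actual return.

    Assumed definition; a zero return is treated as 'not up'.
    """
    n = _check(y, y_hat)
    return sum((a > 0) == (b > 0) for a, b in zip(y, y_hat)) / n


def corr(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Pearson correlation coefficient, in [-1, 1]."""
    n = _check(y, y_hat)
    my, mh = sum(y) / n, sum(y_hat) / n
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    sxx = sum((a - my) ** 2 for a in y)
    shh = sum((b - mh) ** 2 for b in y_hat)
    return sxy / math.sqrt(sxx * shh)
```

Note that RMSE is always at least as large as MAE for the same series, which is a quick consistency check when reading tables of results.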

The Data and Basic Statistical Analysis
The pricing factors of the carbon market include the prices of carbon homogeneous products, capital market products, and energy market products [13]. This paper therefore selects the EUAF (European Union Allowance Future) as the representative product of the carbon market because of its large trading volume in the EU ETS (European Union Emissions Trading System). Additionally, we chose the EUAS (European Union Allowance Spot) as the carbon homogeneous product, the DJIA (Dow Jones Industrial Average) and EURUSD as typical capital market products, and coal, crude oil, and natural gas as representatives of the energy market because of their large consumption. The EUAF and EUAS data were sourced from the Intercontinental Exchange (ICE) database, and the other data from the Wind database. The sample runs from 2 June 2009 to 8 May 2019, giving a total of 2521 observations after eliminating samples with missing information or time inconsistency. The return is defined as $r_t = 100 \times (\ln P_t - \ln P_{t-1})$, where $P_t$ denotes the product's settlement price.
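The return definition translates directly into code. This minimal sketch (hypothetical function name) assumes a chronologically ordered list of positive settlement prices; the first price has no predecessor, so the output is one element shorter than the input.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Percentage log returns r_t = 100 * (ln P_t - ln P_{t-1})."""
    if any(p <= 0 for p in prices):
        raise ValueError("settlement prices must be positive")
    return [100.0 * (math.log(p1) - math.log(p0)) for p0, p1 in zip(prices, prices[1:])]


if __name__ == "__main__":
    # A rise from 10 to 11 followed by a fall back to 10 gives returns of equal size and opposite sign.
    print(log_returns([10.0, 11.0, 10.0]))
```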
We can draw the following conclusions from the descriptive statistics shown in Table 1. Firstly, all of the return series are stationary, with test statistics significant at the 1% level. Secondly, the mean of each return series is small, whereas the standard deviations are comparatively large; for example, EUAF, EUAS, and gas show larger variance, while EURUSD has a smaller variance. Thirdly, all of the return series show negative skewness, and the kurtosis values of DJIA and gas are the largest. This result indicates that the probability of a return decline is greater than that of a rise, and that these markets are more vulnerable to the shock of extreme events. Furthermore, this paper also reports the co-skewness and co-kurtosis statistics of each sample market. Negative co-skewness indicates information asymmetry from the pricing factor markets, which can easily increase the probability of a decline in carbon return. Positive co-kurtosis, especially the larger co-kurtosis of EUAS and DJIA, indicates that external events affecting these two markets are more likely to spread to the carbon market. Note: *** denotes statistical significance at the 1% level. Coskewness12 and Cokurtosis13 denote the co-skewness and co-kurtosis coefficients between EUAF and the other market products, calculated as $E[(r_{x,t} - E(r_{x,t}))(r_{y,t} - E(r_{y,t}))^2]/(\sigma_x \sigma_y^2)$ and $E[(r_{x,t} - E(r_{x,t}))(r_{y,t} - E(r_{y,t}))^3]/(\sigma_x \sigma_y^3)$, respectively, where $r_{x,t}$ and $r_{y,t}$ denote the returns of EUAF and the other market products.
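The co-skewness and co-kurtosis statistics in the note to Table 1 can be sketched as follows, assuming population (divide-by-$T$) moments and each series centered on its own mean. The names are hypothetical. Setting the order to 1 recovers the Pearson correlation, which makes a handy sanity check.

```python
import math
from typing import Sequence


def co_moment(r_x: Sequence[float], r_y: Sequence[float], order: int) -> float:
    """E[(r_x - E r_x)(r_y - E r_y)**order] / (sigma_x * sigma_y**order), population moments."""
    n = len(r_x)
    if n != len(r_y) or n < 2:
        raise ValueError("need two aligned series with at least two observations")
    mean_x, mean_y = sum(r_x) / n, sum(r_y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in r_x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in r_y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("co-moments are undefined for a constant series")
    num = sum((a - mean_x) * (b - mean_y) ** order for a, b in zip(r_x, r_y)) / n
    return num / (sd_x * sd_y ** order)


def coskewness(r_x: Sequence[float], r_y: Sequence[float]) -> float:
    """Coskewness12-style statistic: order-2 co-moment of r_y with r_x."""
    return co_moment(r_x, r_y, 2)


def cokurtosis(r_x: Sequence[float], r_y: Sequence[float]) -> float:
    """Cokurtosis13-style statistic: order-3 co-moment of r_y with r_x."""
    return co_moment(r_x, r_y, 3)
```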
Furthermore, the joint distributions of EUAF with DJIA, EURUSD, and coal exhibit large kurtosis, as shown in Figure 2. This indicates that these portfolios are more vulnerable to the shock of extreme events and provides further strong evidence for studying the shock of pricing factor markets on the carbon price from the perspective of higher-order moments.


Designing the Structure of Multi-LSTM
The number of hidden layers is a critical parameter for designing the Multi-LSTM network. However, there is no mature method to determine the number of hidden layers for a deep network, although extensive research by Le and Bengio [28] suggests that several hidden layers are better than one. Correspondingly, this paper applied an experimental method to determine the number of hidden layers.
In this experiment, six levels of the number of hidden layers (from 1 to 6) and hidden node counts of 4, 8, 16, 32, 64, and 128 were tested, following the research experience of Shen et al. [29]. In carbon price forecasting, the number of input nodes corresponds to the number of input series, that is, the return series of the carbon pricing factors together with their co-skewness and co-kurtosis terms. Meanwhile, the number of output nodes is set to 1, because the carbon return series is the prediction target. The sample data were divided into two parts: 80% of the samples were used to train the model parameters, and the remaining 20% were used to test the model's performance in predicting carbon returns. The empirical results in Table 2 suggest that the Multi-LSTM model with two hidden layers has the lowest RMSE and MAPE values and can effectively fit the carbon pricing model under the shock of higher-order moments. Note: Bold numbers are the minimum root-mean-square error (RMSE) and mean absolute percentage error (MAPE), respectively. MAE, mean absolute error.
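The experimental design can be sketched as a time-ordered 80/20 split plus a two-stage search: first the number of hidden layers, then the number of hidden nodes with the layer count fixed. The paper does not state whether the split preserved time order, so a chronological split (usual for forecasting) is assumed here. `evaluate(layers, nodes)` is a hypothetical callback that would train a Multi-LSTM on the training split and return its test error, and `default_nodes` (the node count used while choosing the layer count) is also an assumption.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(data: Sequence[T], train_frac: float = 0.8) -> Tuple[Sequence[T], Sequence[T]]:
    """Split a time series without shuffling: the first train_frac of observations
    for training, the remainder for out-of-sample testing."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]


def two_stage_search(
    evaluate: Callable[[int, int], float],
    layer_options: Sequence[int],
    node_options: Sequence[int],
    default_nodes: int,
) -> Tuple[int, int]:
    """Pick the layer count with the lowest error at default_nodes, then, with
    that layer count fixed, pick the node count with the lowest error."""
    best_layers = min(layer_options, key=lambda layers: evaluate(layers, default_nodes))
    best_nodes = min(node_options, key=lambda nodes: evaluate(best_layers, nodes))
    return best_layers, best_nodes


if __name__ == "__main__":
    train, test = chronological_split(list(range(2521)))
    print(len(train), len(test))
```

With the 2521 observations described in the data section, this split yields 2016 training and 505 test observations.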
The number of hidden layers in the Multi-LSTM network reflects its non-linear processing ability to fit the input data. With only a few hidden layers, the LSTM network may have difficulty fitting the input data; designing too many hidden layers may also be unreasonable, as this wastes training time and can easily lead to overfitting. The results in Table 2 show that the best performance in terms of RMSE and MAPE occurs with two hidden layers: the RMSE and MAPE values of 0.13435 and 1.91017, respectively, are the lowest among all candidate numbers of hidden layers. Further analysis shows that RMSE and MAPE reach their smallest values, 0.1075 and 1.55172, respectively, when the number of hidden nodes is 64. Therefore, the optimal number of hidden nodes in the Multi-LSTM under the shock of higher-order moments is 64, and the appropriate architecture of the Multi-LSTM is 18-64-64-1, where 18 is the number of input nodes and 1 is the number of output nodes.
For comparison, we also trained the network structure of the pricing model without considering the shock of higher-order moment, and the optimal architecture is 18-128-128-1, which is different from the structure of Multi-LSTM under the shock of higher-order moment.
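To see how much the two selected architectures differ in size, the parameter count of a stacked LSTM can be worked out from the gate equations: each of the four blocks (forget gate, input gate, candidate, output gate) has a weight matrix acting on the concatenation $[h_{t-1}, x_t]$ plus a bias vector. This sketch assumes one bias vector per block (the Keras convention; PyTorch's `nn.LSTM` keeps two, adding $4h$ parameters per layer), reads "18-64-64-1" as input size, two LSTM layers, and a dense output layer fed by the last hidden state, and uses hypothetical function names.

```python
from typing import Sequence


def lstm_layer_params(n_in: int, n_hidden: int) -> int:
    """Parameters of one LSTM layer: four blocks (forget, input, candidate, output),
    each with an n_hidden x (n_hidden + n_in) weight matrix and one bias vector."""
    return 4 * (n_hidden * (n_hidden + n_in) + n_hidden)


def multi_lstm_params(architecture: Sequence[int]) -> int:
    """Total parameters of an 'input-hidden-...-hidden-output' architecture such as
    18-64-64-1: stacked LSTM layers plus a dense output layer on the last hidden state."""
    if len(architecture) < 3:
        raise ValueError("expected input size, at least one hidden size, and output size")
    n_in, *hidden, n_out = architecture
    total, prev = 0, n_in
    for n_hidden in hidden:
        total += lstm_layer_params(prev, n_hidden)
        prev = n_hidden
    return total + prev * n_out + n_out  # dense layer weights + biases


if __name__ == "__main__":
    for arch in ([18, 64, 64, 1], [18, 128, 128, 1]):
        print("-".join(map(str, arch)), multi_lstm_params(arch))
```

Under these assumptions the 18-64-64-1 network has 54,337 parameters and the 18-128-128-1 network has 206,977, roughly 3.8 times as many.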

Discussing the Performance of the Proposed Pricing Model
To evaluate the performance of the Multi-LSTM in forecasting the carbon price under the shock of the higher-order moments of pricing factors, this section uses the following models as benchmarks for comparison.
The first is the BP (back propagation) model, a neural network trained with the error backpropagation algorithm that has a non-linear mapping ability; we take it as a baseline for fitting the non-linear relationship between the carbon price and its pricing factors. The second is the GARCH model, which has an advantage in modeling the volatility of the carbon price series and has performed well in previous research. The third is the multilayer perceptron (MLP), a deep neural network with many hidden layers, which can improve non-linear mapping ability and prediction accuracy in predicting the carbon price. The fourth is the recurrent neural network (RNN), a neural network with a special memory ability and a remarkable advantage in processing time series data. The fifth is the gated recurrent unit (GRU), which, like the Multi-LSTM, is an improved variant of the RNN. Compared with the LSTM structure, the GRU combines the forget gate and the input gate into a single update gate.
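For contrast with the LSTM, a single GRU step can be sketched in the same style. This follows one common formulation (update gate $z$, reset gate $r$, a candidate state, and $h_t = (1 - z) \otimes h_{t-1} \oplus z \otimes \tilde{h}_t$); other references and some libraries swap the roles of $z$ and $1 - z$ or apply the reset gate after the matrix product. Weights are random and untrained, and all names are hypothetical. The point is structural: one update gate plays the combined role of the LSTM's forget and input gates, and there is no separate cell state or output gate.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W: List[Vector], b: Vector, v: Vector) -> Vector:
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


class GRULayer:
    """One GRU layer with an update gate (z), a reset gate (r) and a candidate state (h)."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        n_cat = n_hidden + n_in  # length of [h_{t-1}, x_t]
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)] for _ in range(n_hidden)]
                  for g in ("z", "r", "h")}
        self.b = {g: [0.0] * n_hidden for g in ("z", "r", "h")}

    def step(self, x: Vector, h_prev: Vector) -> Vector:
        hx = h_prev + x  # [h_{t-1}, x_t]
        z = [sigmoid(v) for v in affine(self.W["z"], self.b["z"], hx)]  # update gate
        r = [sigmoid(v) for v in affine(self.W["r"], self.b["r"], hx)]  # reset gate
        gated = [rj * hj for rj, hj in zip(r, h_prev)] + x              # [r * h_{t-1}, x_t]
        h_tilde = [math.tanh(v) for v in affine(self.W["h"], self.b["h"], gated)]
        # (1 - z) keeps the old state (forget-gate role); z admits the candidate (input-gate role).
        return [(1.0 - zj) * hp + zj * ht for zj, hp, ht in zip(z, h_prev, h_tilde)]


if __name__ == "__main__":
    rng = random.Random(1)
    layer = GRULayer(2, 3, rng)
    h = [0.0] * 3
    for x in [[0.5, -0.5]] * 4:
        h = layer.step(x, h)
    print(h)
```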
In the experiment, we applied the benchmark models (BP, GARCH, MLP, RNN, and Multi-GRU) to the same carbon price series and compared their out-of-sample forecasting performance with that of the proposed model. The neural-network benchmarks used the same numbers of input nodes, hidden layers, hidden nodes, and output nodes as the Multi-LSTM model. Figures 3 and 4 show the actual returns and the returns forecast by the different models.
The results presented in Table 3 show that, when the shock of higher-order moments is considered, the Multi-LSTM model outperforms the other benchmark models on all five criteria for predicting the carbon price. Specifically, RMSE, MAE, and MAPE measure the magnitude of the forecast errors. The values of these three criteria for the Multi-LSTM model (1.7347, 0.9884, and 0.0575, respectively) are markedly lower than those of the other benchmark models. Therefore, we conclude that the Multi-LSTM model is more accurate and reliable for forecasting the carbon price. As for DA, the directional accuracy of the Multi-LSTM model is 0.9189, markedly higher than that of the other models, which indicates that the Multi-LSTM model provides more accurate directional forecasts to support investor decisions. The correlation coefficient for the Multi-LSTM model is 0.954, higher than that of Multi-GRU (0.9083), RNN (-0.068), MLP (0.0357), GARCH (0.034), and BP (-0.0332).

Note (Table 3): DA (directional accuracy) is the proportion of forecasts whose predicted direction matches the actual market direction. CORR is the Pearson correlation coefficient between the actual carbon return and the predicted return.
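The five criteria reported in Table 3 can be computed as in the minimal sketch below, which assumes that `y_true` and `y_pred` are aligned arrays of actual and forecast carbon returns. The function name `evaluate` is hypothetical, MAPE is expressed as a fraction rather than a percentage, and DA is computed here by comparing the signs of actual and forecast returns; the paper may define DA on price changes instead.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute RMSE, MAE, MAPE, DA, and CORR for aligned actual/forecast returns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE as a fraction; undefined (division by zero) if any actual value is 0.
        "MAPE": float(np.mean(np.abs(err / y_true))),
        # DA: share of periods in which forecast and actual returns have the same sign.
        "DA": float(np.mean(np.sign(y_true) == np.sign(y_pred))),
        # CORR: Pearson correlation between actual and forecast returns.
        "CORR": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }

# Usage with hypothetical numbers:
metrics = evaluate([0.012, -0.004, 0.007, -0.010], [0.010, -0.002, 0.005, 0.001])
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE for the same forecasts and penalizes large errors more heavily.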
The performance of the carbon pricing models that do not consider the shock of higher-order moments is also listed in Table 3 for comparison, and it confirms the superiority of the Multi-LSTM model on all five criteria. Figure 5 reports the prediction error curves of the proposed model and the benchmark models with and without the shock of higher-order moments. Notably, the RMSE, MAE, and MAPE of the Multi-LSTM model are markedly lower when the shock of higher-order moments is considered than when it is not, while its DA and CORR are markedly higher. All five criteria therefore point to the same conclusion: the Multi-LSTM model based on the proposed EHM-APT framework predicts the carbon price considerably more accurately when the shock of higher-order moments is considered than pricing models that ignore it.

Figure 5 presents the performance of the different models visually and highlights the superiority of the Multi-LSTM model in fitting the proposed EHM-APT framework.

Conclusions and Prospects
In this paper, we developed an extended higher-order moment multi-factor framework (EHM-APT) for predicting the carbon price by extending the bivariate higher-order moment CAPM developed by Fry et al. [6] to multiple factors. The proposed EHM-APT framework not only helps to promote the construction of an effective carbon pricing mechanism but also provides a more effective market-oriented means for carbon emission reduction and the sustainability of the low-carbon economy. Compared with the traditional APT model, the EHM-APT framework accounts for the impact on the carbon price of market asymmetry and extreme events originating in its pricing factor markets. Furthermore, a multi-layer and multi-variable LSTM (Multi-LSTM) model, whose parameters and structure were determined experimentally, was constructed to test the performance of the proposed EHM-APT framework in predicting the carbon price. The main conclusions are summarized as follows.
Firstly, the proposed EHM-APT framework, which considers the shock of higher-order moments, markedly improves the prediction accuracy of the carbon price compared with the framework that does not consider this shock. This finding indicates that the shock of market asymmetry and extreme events stemming from the pricing factors, which previous research has ignored, is an indispensable factor for predicting and fitting the carbon price. Indeed, many studies have used co-skewness and co-kurtosis, which capture market asymmetry and extreme events, as pricing factors for financial assets [4]. Building on this idea, this paper shows that co-skewness and co-kurtosis are also important and indispensable factors for explaining the pricing mechanism of carbon assets.
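For reference, the sketch below computes sample co-skewness and co-kurtosis between an asset's returns and each pricing factor using one common standardization, E[(r_i - mu_i)(r_f - mu_f)^k] / (sigma_i * sigma_f^k) with k = 2 and k = 3. It is illustrative only: the EHM-APT framework may enter these higher-order terms differently (for example, as regressors in the pricing equation), and the function names `co_moments` and `factor_co_moments` are hypothetical.

```python
import numpy as np

def co_moments(r_i, r_f):
    """Return (co-skewness, co-kurtosis) of asset returns r_i with one factor r_f.

    Assumed standardization: E[d_i * d_f**k] / (sigma_i * sigma_f**k), where d is
    the deviation from the sample mean and sigma is the population (ddof=0) std.
    """
    r_i = np.asarray(r_i, dtype=float)
    r_f = np.asarray(r_f, dtype=float)
    d_i = r_i - r_i.mean()
    d_f = r_f - r_f.mean()
    s_i, s_f = r_i.std(), r_f.std()
    co_skew = np.mean(d_i * d_f ** 2) / (s_i * s_f ** 2)  # covariation with squared factor deviations
    co_kurt = np.mean(d_i * d_f ** 3) / (s_i * s_f ** 3)  # covariation with cubed (extreme) factor deviations
    return float(co_skew), float(co_kurt)

def factor_co_moments(r_i, factors):
    """Apply co_moments to every column of a (T, K) factor matrix."""
    factors = np.asarray(factors, dtype=float)
    pairs = [co_moments(r_i, factors[:, k]) for k in range(factors.shape[1])]
    co_skews, co_kurts = zip(*pairs)
    return np.array(co_skews), np.array(co_kurts)

# Usage with hypothetical data: 250 days, 3 pricing factors.
F = np.random.default_rng(0).normal(size=(250, 3))
carbon = 0.5 * F[:, 0] + 0.1 * np.random.default_rng(1).normal(size=250)
skews, kurts = factor_co_moments(carbon, F)
```

When the asset and the factor coincide, these quantities reduce to the ordinary skewness and kurtosis of the series.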
Secondly, the Multi-LSTM model whose parameters and structure were determined experimentally in this article (an 18-64-64-1 architecture) is well suited to predicting the carbon price, and it outperforms the other benchmark models (the Multi-GRU, RNN, MLP, GARCH, and BP models) on all of the evaluation criteria. This result suggests that the Multi-LSTM model is competitive for predicting the carbon price under the proposed EHM-APT framework. We therefore conclude that the model effectively captures and maps the complex non-linear relationship between the carbon price and its pricing factors.
This study predicts the carbon price based on structured pricing factors whose data are sourced from specific financial markets. As this article shows, the proposed EHM-APT framework has good applicability and can provide a valuable reference for solving pricing problems in other financial markets as well. Beyond higher-order moment terms, pricing factors that measure market asymmetry and external shocks could also be extracted by mining large volumes of text. Future research could apply the proposed multi-factor pricing framework to such unstructured pricing factors obtained through text mining, such as investor sentiment and policy events. This leads to another promising direction: as text mining and other machine learning techniques expand the available data, how to fuse and share data features between structured and unstructured pricing factors within the established pricing framework is a key open question.
Author Contributions: Conceptualization, P.Y. and C.Z.; Methodology, P.Y. and Y.W.; Model code analysis, P.Y. and X.Y.; Formal Analysis, Y.W. and X.Y.; Writing-Original Draft Preparation, P.Y. and Z.A.W.; Writing-Review and Editing, P.Y. and C.Z.; Supervision, C.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This paper is supported by the National Natural Science Foundation of China (Nos. 71971071 and 71373065).