A Hybrid Model for Carbon Price Forecasting Based on Improved Feature Extraction and Non-Linear Integration

Abstract: Accurately predicting the price of carbon is an effective way of ensuring the stability of the carbon trading market and reducing carbon emissions. Aiming at the non-smooth and non-linear characteristics of the carbon price, this paper proposes a novel hybrid prediction model based on improved feature extraction and non-linear integration, which is built on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), fuzzy entropy (FuzzyEn), improved random forest using particle swarm optimisation (PSORF), extreme learning machine (ELM), long short-term memory (LSTM), non-linear integration based on multiple linear regression (MLR) and random forest (MLRRF), and error correction with the autoregressive integrated moving average model (ARIMA), named CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA. Firstly, CEEMDAN is combined with FuzzyEn in the feature selection process to improve extraction efficiency and reliability. Secondly, at the critical prediction stage, PSORF, ELM, and LSTM are selected to predict high-, medium-, and low-complexity sequences, respectively. Thirdly, the reconstructed sequences are assembled by applying MLRRF, which can effectively improve the prediction accuracy and generalisation ability. Finally, error correction is conducted using ARIMA to obtain the final forecasting results, and the Diebold–Mariano test (DM test) is introduced for a comprehensive evaluation of the models. With respect to carbon prices in the pilot regions of Shenzhen and Hubei, the results indicate that the proposed model has higher prediction accuracy and robustness. The main contributions of this paper are the improved feature extraction and the innovative combination of multiple linear regression and random forests into a non-linear integrated framework for carbon price forecasting. Further optimisation remains a subject of ongoing work.


Introduction
The concentration of carbon dioxide in the atmosphere has risen rapidly as a result of industrialisation and the increase in all types of waste incineration. Emissions of carbon dioxide and other greenhouse gases constitute the main cause of the greenhouse effect [1]. The intensifying greenhouse effect has resulted in global warming, with serious negative impacts on the balance of ecosystems. In response to the challenge of climate change, countries have introduced carbon emissions trading markets.
The carbon market, as a key instrument used by governments to address energy transition and low-carbon development, has developed considerably over the past 20 years [2]. In 2005, Europe established the EU Emissions Trading System (EU ETS), the first greenhouse gas emissions trading system in the world. To meet its carbon peaking and carbon neutrality targets, China has selected eight regions, including Shenzhen, Hubei, Beijing, Guangdong, and Tianjin, as pilot regions for the establishment of a carbon emissions trading market. Furthermore, in 2017, the National Development and Reform Commission (NDRC) formally announced that China would launch a pilot carbon trading market and gave the project a prominent place in the 13th Five-Year Plan, demonstrating China's firm confidence in the development of a green economy [3]. Through the marketisation of carbon allowances, governments are incentivising companies to switch to cleaner energy or less fossil fuel-intensive production to reduce carbon emissions [4]. The carbon price, as a core indicator of the carbon market, is one of the most effective levers to encourage reductions in carbon emissions and limit the increase in the global average temperature [5]. However, as an emerging market-based instrument, the carbon price is determined by a combination of internal market mechanisms and external influences. The volatility of the carbon price challenges the stability of the market and further affects the efficiency of emission reductions [6]. The core issue of the carbon market is the formation and prediction of the carbon price. Accurately predicting the carbon price will help establish a carbon pricing mechanism, which will facilitate the pricing of other carbon financial products, such as carbon futures and carbon options, and will also provide practical guidance for production, operation, and investment decisions, ultimately achieving green, low-carbon, and high-quality development [7,8].
Due to the complexity of the influencing factors, the carbon price tends to be characterised by non-linearity, non-stationarity, and high noise, posing major challenges to carbon price forecasting. Carbon price forecasting is inherently a time-series modelling task [9]. Existing prediction models can be divided into three main categories: statistical models, artificial intelligence (AI) models, and hybrid models. Statistical models mainly include the autoregressive integrated moving average model (ARIMA) [10,11] and the generalised autoregressive conditional heteroskedasticity model (GARCH) [12][13][14]. Statistical models are based on certain economic theories and apply a combination of mathematical and statistical strategies to capture the information embedded in the data. However, statistical models require complex feature engineering and are limited in dealing with non-linear, non-smooth, and non-Gaussian time series. In addition, they do not adequately capture the complex dynamic features in the data [15]. Therefore, more flexible and accurate forecasting methods need to be introduced. Machine learning models predominantly consist of the extreme learning machine (ELM) [16][17][18][19], random forest (RF) [3,20,21], and support vector machine (SVM) [22][23][24]. Machine learning models have the advantage of being interpretable and transparent, but their ability to deal with non-linear and non-stationary time series is still inadequate [13]. With the development of artificial intelligence technology and big data, the technical background for predicting carbon prices with deep learning models is maturing. Deep learning models primarily include artificial neural networks (ANNs) [25,26], convolutional neural networks (CNNs) [20,27,28], long short-term memory (LSTM) networks [28][29][30][31], and gated recurrent unit (GRU) networks [9,28,32]. The above research explores the application of AI models in carbon price series forecasting, expanding the field of AI modelling research and achieving significant advancements.
However, given the high degree of uncertainty and non-linearity of carbon price series, a single model is no longer sufficient for accurate forecasting. In response, hybrid models have been studied by scholars to further explore the deeper relationships underlying irregular carbon price volatility. More specifically, hybrid models are typically a combination of signal decomposition strategies and the prediction algorithms described above. One of the most effective ways to reduce the complexity of carbon price series is to implement the decomposition-integration method. The first step is to decompose the original non-stationary time series into a number of relatively regular sub-models. Then, prediction models, including statistical and AI models, are applied to each sub-model of the decomposition so that feature information at different scales can be extracted individually. Finally, the prediction results of the sub-models are reconstructed to obtain the overall prediction results [33]. Currently, the major decomposition methods include empirical mode decomposition (EMD) and its variants [17,29,31,[34][35][36][37], wavelet transform (WT) [38][39][40], and variational mode decomposition (VMD) and its variants [33,[41][42][43]. Although the above-mentioned decomposition methods have produced better prediction results, they also have limitations. For example, EMD suffers from modal aliasing and endpoint distortions; WT faces difficulties in choosing wavelet basis functions, the high computational complexity of the discrete wavelet transform, and boundary effects; and VMD encounters problems such as difficulty in parameter selection and sensitivity to noise. Despite the limitations of the decomposition methods, hybrid models constructed on the decomposition-integration principle generally outperform single statistical or AI models. Therefore, an in-depth study of the application of decomposition methods in the field of carbon price forecasting is needed to better address the challenges of carbon price forecasting. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), as an efficient signal processing technique, reduces the reconstruction error by adding adaptive white noise to the original signal, where each mode is noise-enhanced on a randomly generated white noise background. The advantage of CEEMDAN is that it can effectively prevent modal aliasing and reduce the interference of white noise, thus improving the accuracy and stability of the signal decomposition. For this reason, CEEMDAN is adopted as the sequence decomposition method in this paper.
It has been shown that AI models with feature extraction not only provide effective preprocessing of the data but also have high computational efficiency, enabling the construction of suitable prediction models for time series. However, several significant challenges remain. Firstly, after decomposition of the carbon price series, each subsequence is typically fed into a forecasting model without taking into account the different complexities of, and correlations between, the sub-sequences, which reduces forecasting efficiency and accuracy. Secondly, the same prediction model is usually applied to every sub-sequence, without considering that the sub-sequences differ in their unique characteristics and frequencies; it is therefore important to build separately and appropriately parameterised models. Thirdly, existing integration methods not only fail to focus on the intrinsic relationship between the original sequence and the reconstructed sequences but are also mainly limited to linear patterns, e.g., summing the predicted values of all reconstructed sequences to obtain the final prediction results. Such linear integration methods can degrade the accuracy of predictions, as linear patterns are not applicable in all cases. Finally, error correction, which can significantly improve the accuracy of a model, is rarely considered in carbon price forecasting models.
To address the above barriers, a carbon price prediction model based on improved feature extraction and a non-linear integration method, named CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA, is proposed. Methodologically, it improves feature extraction and deep learning algorithms, develops an innovative form of non-linear integration based on MLRRF, and improves the accuracy of forecasting the non-smooth and non-linear carbon price. The first step of the method is the decomposition of the original carbon price series into a number of simple, smooth modes using CEEMDAN. Subsequently, simple modes with similar complexity are reorganised according to FuzzyEn, and feature extraction is carried out by combining CEEMDAN with FuzzyEn. This boosts computational efficiency, improves prediction accuracy, and reduces the complexity of the sequence. Then, considering that different modes have unique frequencies and characteristics, PSORF, ELM, and LSTM are applied as prediction models for components of varying complexities to better capture their fluctuating characteristics. Immediately following this, the initial integration of the high-, medium-, and low-complexity sub-sequences is performed using MLR to further explore the relationship between the original sequence and each modality. Meanwhile, combining the MLR aggregation of sub-sequences with non-linear integration learning further clarifies this relationship and improves the accuracy of aggregation, as non-linear integration learning can better adapt to non-linear data. RF is a typical non-linear bootstrap aggregating (bagging) ensemble learning method. It makes predictions by constructing combinations of multiple decision trees, each of which has a strong ability to generalise over the training data and serves to mitigate over-fitting during integration. Therefore, in this paper, the non-linear integration method based on multiple linear regression and random forest (MLRRF) is adopted to combine the carbon price forecasting results. Finally, ARIMA is applied to correct the error, further boosting the accuracy of the forecast.
The innovations and contributions of this paper are as follows: (1) A novel prediction method that combines improved feature extraction, hybrid modelling, non-linear integrated learning, and error correction is introduced to provide highly accurate carbon price forecasts. The results demonstrate that the prediction method proposed in this paper remarkably improves carbon price prediction accuracy and has greater anti-interference ability and general applicability. (2) By considering the different complexities and correlations among the decomposition modes, an improved feature extraction method that combines CEEMDAN and FuzzyEn is implemented to efficiently screen out different features from the original carbon price sequence, which increases extraction efficiency and precision. (3) As components of different complexity have their own characteristics and frequencies, PSORF, ELM, and LSTM are applied as prediction models for the high-, medium-, and low-complexity sequences, respectively, better capturing the characteristics of each component. (4) Because non-linear integration has a smaller error and a wider range of applications than linear integration, RF is introduced as a non-linear learning method for non-linear integration based on MLR, and the MLRRF non-linear integration framework is thus innovatively established. (5) Error correction is performed on the results of the MLRRF integration to further explore the application of error correction in carbon price forecasting.
The remaining sections of this paper are structured as follows. Section 2 outlines the theoretical basis of the relevant methods. Section 3 presents the decomposition and integration hybrid forecasting model. Section 4 applies the proposed model to Shenzhen and Hubei and discusses the calculation results. Section 5 presents the conclusions and discussion.

Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)
EMD is an adaptive signal decomposition method proposed by Huang et al. [44] that does not require any assumptions about the data and can decompose complex non-linear and non-smooth signals into a set of intrinsic mode functions (IMFs) and a residual. However, EMD suffers from modal aliasing and an excessive mode count. In response to the problems with EMD, Colominas et al. [45] introduced complementary EMD. It decomposes the signal into forward and backward IMFs using two complementary EMD procedures. Then, it determines the reliability of each IMF using adaptive noise estimation to select the most reliable IMFs as components of the signal. Finally, by inverse-transforming the selected IMFs, the original signal is reconstructed. The detailed process is given below.
Step 1: Standard normally distributed white noise $w^{i}(n)$ with different amplitudes is added to the given target signal $x(n)$ to produce $M$ different new series. The $i$th experimental signal sequence is constructed as follows:

$x^{i}(n) = x(n) + \gamma_0 w^{i}(n), \quad i = 1, 2, \ldots, M$

where $\gamma_0$ is the standard deviation of the noise.

Step 2: The first IMF $C_1(n)$ of the CEEMDAN decomposition is obtained by averaging the $M$ modal components obtained from EMD:

$C_1(n) = \frac{1}{M} \sum_{i=1}^{M} E_1\big(x^{i}(n)\big)$

where an IMF is a function that satisfies the following two conditions: (1) the number of extrema and the number of zero crossings are equal or differ by at most one and (2) the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero. The first residual $r_1(n)$ can be expressed as

$r_1(n) = x(n) - C_1(n)$

Step 3: Decompose the sequence $r_1(n) + \gamma_1 E_1\big(w^{i}(n)\big)$. The second IMF can be expressed as

$C_2(n) = \frac{1}{M} \sum_{i=1}^{M} E_1\Big(r_1(n) + \gamma_1 E_1\big(w^{i}(n)\big)\Big)$

where $k$ denotes the index of the IMFs and $E_k(\cdot)$ is defined as the $k$th IMF obtained from EMD. The second residual $r_2(n)$ can be represented as

$r_2(n) = r_1(n) - C_2(n)$

Step 4: Similarly, the $k$th residual $r_k(n)$ can be written as

$r_k(n) = r_{k-1}(n) - C_k(n)$

and the $(k+1)$th IMF is

$C_{k+1}(n) = \frac{1}{M} \sum_{i=1}^{M} E_1\Big(r_k(n) + \gamma_k E_k\big(w^{i}(n)\big)\Big)$

Step 5: Repeat step 4 until the remaining component can no longer satisfy the EMD decomposition condition. Finally, all $K$ IMFs of CEEMDAN are obtained, and the residual is

$R(n) = x(n) - \sum_{k=1}^{K} C_k(n)$

so that the target sequence is decomposed as

$x(n) = \sum_{k=1}^{K} C_k(n) + R(n)$
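As a rough illustration of the noise-assisted averaging loop in steps 1-5, the sketch below substitutes a crude first-IMF extractor (the signal minus a moving average) for EMD's full sifting procedure; a faithful implementation would use a real EMD routine (e.g. the PyEMD package). The function names `first_imf` and `ceemdan_like` and all parameter values are illustrative, not the paper's settings.

```python
import numpy as np

def first_imf(x, w=5):
    # Crude stand-in for EMD's first IMF: the signal minus a moving average.
    # A real implementation would use EMD sifting (e.g. PyEMD's EMD class).
    pad = np.pad(x, (w // 2, w - 1 - w // 2), mode="edge")
    trend = np.convolve(pad, np.ones(w) / w, mode="valid")
    return x - trend

def ceemdan_like(x, n_modes=3, M=50, gamma=0.2, seed=0):
    """Noise-assisted decomposition following the CEEMDAN outer loop."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((M, x.size))
    modes, residual = [], x.copy()
    for k in range(n_modes):
        # Average the first IMF over M noise-perturbed copies of the residual.
        c_k = np.mean([first_imf(residual + gamma * noise[i]) for i in range(M)],
                      axis=0)
        modes.append(c_k)
        residual = residual - c_k          # r_k = r_{k-1} - C_k
    return np.array(modes), residual       # x = sum(modes) + residual
```

By construction the modes and the final residual sum back to the original signal, mirroring the last equation above.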

Fuzzy Entropy (FuzzyEn)
FuzzyEn is an entropy measure that evaluates the complexity and irregularity of a time series by taking into account the uncertainty and ambiguity in the sequence. Compared with traditional entropy measures, FuzzyEn is better at capturing the non-linear, irregular, and chaotic features in time series. The details of FuzzyEn are given below.
For a normalised time series $\{x(i): 1 \le i \le N\}$ with $N$ sample points, the following sequence of vectors can be formed:

$X_i^m = \{x(i), x(i+1), \ldots, x(i+m-1)\} - \bar{x}(i), \quad i = 1, 2, \ldots, N-m+1$

where $X_i^m$ represents a sequence of $m$ consecutive values of $x$ beginning at point $i$, with its baseline $\bar{x}(i) = \frac{1}{m}\sum_{k=0}^{m-1} x(i+k)$ removed, and $m$ is the embedding dimension.

The distance $d_{ij}^m$ between the vectors $X_i^m$ and $X_j^m$ is defined as

$d_{ij}^m = \max_{k \in \{0, \ldots, m-1\}} \big| X_i^m(k) - X_j^m(k) \big|$

where $X_i^m(k)$ and $X_j^m(k)$ are the $k$th elements of $X_i^m$ and $X_j^m$, respectively.

The fuzzy similarity $S_{ij}^m$ between $X_i^m$ and $X_j^m$ is determined by a fuzzy membership function:

$S_{ij}^m = \exp\!\big(-(d_{ij}^m)^{n} / r\big)$

where $r$ represents the tolerance and indicates the width of the curve, and $n$ determines its gradient.

To obtain the average value of fuzzy similarity $D_i^m(n, r)$, average all fuzzy similarities between the vector $X_i^m$ and its neighbouring vectors $X_j^m$:

$D_i^m(n, r) = \frac{1}{N-m-1} \sum_{j=1,\, j \ne i}^{N-m} S_{ij}^m$

The fuzzy probability $\phi^m(n, r)$ that two vector sequences match for all $m$-dimensional points within tolerance $r$ is

$\phi^m(n, r) = \frac{1}{N-m} \sum_{i=1}^{N-m} D_i^m(n, r)$

By adding a dimension, $\phi^{m+1}(n, r)$ is obtained analogously. The probability that two vectors matching for $m$ points will continue to match for another point is estimated as $\phi^{m+1}(n, r) / \phi^m(n, r)$. FuzzyEn$(m, n, r)$ is defined as the negative natural logarithm of this conditional fuzzy probability:

$\mathrm{FuzzyEn}(m, n, r) = \lim_{N \to \infty} \big[ \ln \phi^m(n, r) - \ln \phi^{m+1}(n, r) \big]$

For a time series of finite length $N$, the statistic

$\mathrm{FuzzyEn}(m, n, r, N) = \ln \phi^m(n, r) - \ln \phi^{m+1}(n, r)$

measures its fuzzy entropy.

Particle Swarm Optimisation (PSO)
PSO is a bionic intelligent computing method that simulates flock foraging behaviour. It exploits the information-sharing mechanism among individuals in a flock of birds so that the movement of the whole flock evolves from disorder to order in the problem-solution space, thereby finding the optimal solution. PSO has several advantages, including its simple concept, easy implementation, and small number of parameters to adjust. The details of PSO are given below.
Step 1: Initialise the particle swarm. Determine the velocity interval $(V_{min,d}, V_{max,d})$ and the search space $(X_{min,d}, X_{max,d})$, and randomly initialise the velocities and positions of the particles in the $D$-dimensional space, where $i \in [1, n]$ and $n$ represents the number of particles in the swarm.
Step 2: Determine the individual extreme values. Calculate the fitness value $F_i$ of each particle, then compare the fitness value $F_i$ of each particle with its individual extreme value $P_i$. If $F_i > P_i$, then replace $P_i$ with $F_i$.
Step 3: Determine the global extreme value. Compare each particle's fitness value $F_i$ with the global extreme value $G$, and update $G$ if a better value is found.
Step 4: Update the particle velocity and position. Based on steps 2 and 3, the particle velocity and position are updated according to the following equations:

$v_{i,d}^{t+1} = T v_{i,d}^{t} + a_1 r_1 \big(p_{i,d} - x_{i,d}^{t}\big) + a_2 r_2 \big(g_d - x_{i,d}^{t}\big)$

$x_{i,d}^{t+1} = x_{i,d}^{t} + v_{i,d}^{t+1}$

where $T$ is the inertia weight, which is non-negative; $a_1$ and $a_2$ are acceleration constants that regulate the maximum learning step; $r_1$ and $r_2$ denote random numbers in the range (0, 1) that increase the randomness of the search; $p_{i,d}$ is the individual extreme value; and $g_d$ is the global extreme value.
Step 5: Determine whether the algorithm has terminated. If the end conditions are met, end the algorithm and output the result; if not, return to step 2.
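The five steps above can be condensed into a minimal PSO loop. The sketch below minimises an objective (the description above compares fitness as $F_i > P_i$, i.e. maximisation, which is equivalent up to a sign); all parameter values and the function name `pso` are illustrative.

```python
import numpy as np

def pso(f, bounds, n_particles=30, iters=100, T=0.7, a1=1.5, a2=1.5, seed=0):
    """Minimal PSO minimiser following steps 1-5: initialise, track personal
    and global bests, and update velocities/positions each iteration."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, (n_particles, lo.size))     # positions
    V = np.zeros_like(X)                                # velocities
    P, Pf = X.copy(), np.array([f(x) for x in X])       # personal bests
    g = P[Pf.argmin()].copy()                           # global best
    for _ in range(iters):
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = T * V + a1 * r1 * (P - X) + a2 * r2 * (g - X)   # velocity update
        X = np.clip(X + V, lo, hi)                          # position update
        F = np.array([f(x) for x in X])
        better = F < Pf
        P[better], Pf[better] = X[better], F[better]
        g = P[Pf.argmin()].copy()
    return g, Pf.min()
```

For example, minimising the sphere function `lambda z: np.sum((z - 1.0) ** 2)` over `[(-5, 5), (-5, 5)]` converges near the optimum at (1, 1).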

Random Forest (RF)
RF is an ensemble learning algorithm first proposed by Breiman [46] in 2001. It extracts multiple samples from the original data using the bootstrap resampling method and constructs a decision tree for each of the samples. Finally, the predictions of the trees are combined, by group voting in classification or by averaging in regression, to obtain the final result. The specific modelling steps of random forest regression are described below.
Suppose the input data are represented by F (n samples, m features, 1 label), and a random forest containing h trees is constructed.
(1) Construct the sample sets. Perform h rounds of random sampling with replacement on the original sample set using the bootstrap method to obtain h subsample sets. A random subset of the features is then selected for each of the h subsample sets, so the subsample sets may contain different features.
(2) Training. Train $h$ regression trees using the $h$ subsample sets. The node partition of the regression tree adopts the minimum mean square deviation: for each candidate partition feature $T$ and partition point $s$, which split the data into the left dataset $D_l$ and the right dataset $D_r$,

$\min_{T,\,s} \Big[ \sum_{x_i \in D_l(T,s)} (y_i - c_1)^2 + \sum_{x_i \in D_r(T,s)} (y_i - c_2)^2 \Big]$

where $y_i$ is the label corresponding to $x_i$, $c_1$ is the average of all labels in $D_l$, and $c_2$ is the average of all labels in $D_r$.
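The node-partition criterion can be illustrated by an exhaustive search over features and thresholds; this is a didactic sketch, not an efficient implementation, and the helper name `best_split` is hypothetical.

```python
import numpy as np

def best_split(X, y):
    """Find the (feature, threshold) pair minimising the summed squared
    deviation of each child from its mean, per the criterion above."""
    best = (None, None, np.inf)        # (feature index, threshold, cost)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            cost = (((left - left.mean()) ** 2).sum()
                    + ((right - right.mean()) ** 2).sum())
            if cost < best[2]:
                best = (j, s, cost)
    return best
```

On a step-shaped label (e.g. y jumps from 0 to 1 at x = 5), the search recovers the jump point with zero residual cost.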
(3) Prediction. A random forest is constructed with $h$ regression trees, and its predicted value $\hat{y}$ for an input $x$ is

$\hat{y}(x) = \frac{1}{h} \sum_{i=1}^{h} k_i(x)$

where $k_i(x)$ represents the prediction of the $i$th regression tree.
The prediction accuracy and speed of RF are strongly influenced by parameters such as the number of decision trees m and the maximum depth of the decision trees d. If m is too low, the model tends to be underfitted, whereas if it is too high, the model is not significantly enhanced while the computational time increases. Similarly, as d increases, the level of fit improves but the computational complexity of the model rises. Therefore, it is vital to set appropriate values of m and d. In this paper, we utilise PSO to optimise these parameters of RF and establish a PSORF prediction model. The flowchart of the model is shown in Figure 1.

Extreme Learning Machine (ELM)
ELM is a fast and efficient machine learning algorithm that maps input data into a high-dimensional feature space by randomly generating hidden-layer weights and biases and then computes the output-layer weights analytically. The network structure of ELM is shown in Figure 2, and the details of ELM are given below. For $N$ arbitrary distinct samples $(x_j, t_j)$, a single hidden-layer neural network containing $L$ hidden nodes can be represented as

$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N$

where $g(\cdot)$ is the activation function, $w_i$ is the weight vector connecting the $i$th hidden node and the input nodes, $b_i$ is the threshold of the $i$th hidden node, and $\beta_i$ is the weight vector connecting the $i$th hidden node and the output nodes.
The aim of ELM is to minimise the error between the target vector $t_j$ and the output vector $o_j$, which can be expressed as

$\sum_{j=1}^{N} \| o_j - t_j \| = 0$

That is, there exist corresponding $w_i$, $\beta_i$, and $b_i$ such that

$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N$

which can be expressed in matrix form as

$H\beta = T$

where $H$ is the output matrix of the hidden-layer nodes, $\beta$ is the output weight matrix, and $T$ is the desired output matrix.
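A minimal sketch of the ELM training rule: random hidden weights, a ReLU activation (as in the parameter settings later in the paper), and output weights solved analytically via the Moore-Penrose pseudo-inverse, $\beta = H^{+}T$. The function names are illustrative.

```python
import numpy as np

def elm_fit(X, T, L=64, seed=0):
    """Train a single hidden-layer ELM: hidden weights are random,
    output weights come from a least-squares solution of H @ beta = T."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))   # random input-to-hidden weights w_i
    b = rng.standard_normal(L)                 # random hidden thresholds b_i
    H = np.maximum(X @ W + b, 0.0)             # hidden output matrix H (ReLU)
    beta = np.linalg.pinv(H) @ T               # beta = H^+ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.maximum(X @ W + b, 0.0) @ beta
```

Because only the output weights are trained, and in closed form, fitting is a single linear solve rather than an iterative optimisation.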

Long Short-Term Memory (LSTM)
LSTM is a kind of recurrent neural network (RNN) that is effective at processing and predicting time series. LSTM alleviates the two major problems of gradient vanishing and gradient explosion in RNNs, making it more suitable for long-series prediction. The structure of LSTM is shown in Figure 3. LSTM transmits information from $c_{t-1}$ to $c_t$, and the specific computation is divided into the following three gate structures: (1) Forget gate: State information is screened and then selectively discarded using the following formula:

$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big)$

where $f_t$ is the forget gate; $\sigma(\cdot)$ is the activation function; $W_f$ is the weight matrix of the forget gate; $h_{t-1}$ and $x_t$ denote the output and input matrices of the state unit at moments $t-1$ and $t$, respectively; and $b_f$ is the threshold of the forget gate.
(2) Input gate: The input of new information is determined, including the update of the information and the content of the alternative update, which is calculated by the following formulas:

$i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big)$

$\tilde{C}_t = \tanh\big(W_c \cdot [h_{t-1}, x_t] + b_c\big)$

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

where $i_t$ and $\tilde{C}_t$ are the input gate and input node, $W_i$ and $W_c$ are the weight matrices of the input gate and input node, $b_i$ and $b_c$ are the thresholds of the input gate and input node, $\tanh$ is the activation function, and $C_t$ and $C_{t-1}$ denote the states at moments $t$ and $t-1$, respectively.
(3) Output gate: The output gate is first determined through the sigmoid layer, and its result is then multiplied by the tanh-processed cell state to obtain the output:

$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big)$

$h_t = o_t \odot \tanh(C_t)$

where $o_t$ and $h_t$ are the output gate and intermediate output, respectively; $W_o$ is the weight matrix of the output gate; and $b_o$ is the threshold of the output gate.
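The three gate computations can be collected into a single NumPy time step; this is a didactic sketch of one LSTM cell update, with illustrative weight shapes and names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step implementing the forget, input, and output gates."""
    Wf, bf, Wi, bi, Wc, bc, Wo, bo = params
    z = np.concatenate([h_prev, x_t])      # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)             # forget gate
    i_t = sigmoid(Wi @ z + bi)             # input gate
    c_tilde = np.tanh(Wc @ z + bc)         # candidate state
    c_t = f_t * c_prev + i_t * c_tilde     # cell state update
    o_t = sigmoid(Wo @ z + bo)             # output gate
    h_t = o_t * np.tanh(c_t)               # hidden output
    return h_t, c_t
```

Since the output gate lies in (0, 1) and tanh is bounded by 1, each component of h_t is strictly inside (-1, 1).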

The Framework of the Proposed Model
To address the non-stationary and non-linear characteristics of the carbon price, a new hybrid model, the CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA model, is proposed. The model combines four core components, namely improved feature extraction, a hybrid prediction model, non-linear integration, and error correction. The framework of the proposed model is shown in Figure 4.
In the first part, the carbon price series is decomposed using CEEMDAN to extract several IMFs with smooth volatility and a residual. Subsequently, to reduce computational costs and improve prediction accuracy, components with similar complexity are reconstructed into three sub-sequences based on FuzzyEn. In the second part, PSORF, ELM, and LSTM are selected at the crucial prediction stage to capture the unique features of each component and to construct an appropriate model for each complexity level. In the third part, the predicted outcomes of the hybrid model are integrated utilising MLRRF to obtain the non-linear integration results. In the fourth part, the non-linear integration results are error-corrected by applying ARIMA to obtain the final forecasting results.

Parameter Setting
In this paper, RF, ELM, and LSTM are selected as prediction models. Among the three models, RF has two hyperparameters, namely the number of decision trees and the maximum depth. Both ELM and LSTM are single hidden-layer structures with 64 units, and the optimiser utilised is adaptive moment estimation (Adam). The difference is that the activation function of ELM is the Rectified Linear Unit (ReLU), while that of LSTM is the sigmoid. The hyperparameter settings of model training are shown in Table 1.

Comparative Models
To illustrate the necessity and superiority of the proposed model, the model comparison is divided into four parts: (1) Validate the need for sequence reconstruction. Here, the performance of the original carbon price sequence is compared with that of the reconstructed sequence based on FuzzyEn to highlight the importance of sequence reconstruction. (2) Verify the demand for hybrid models. Evaluation metrics of single and hybrid prediction models are compared on the high-, medium-, and low-complexity sequences to illustrate the ability of hybrid models to handle diverse data. (3) Confirm the necessity of MLRRF integration. This is done by comparing the results obtained from simple summation integration with those subjected to MLRRF non-linear integration to emphasise the requirement for non-linear integration in the reconstruction of extremely complex and non-stationary sequences. (4) Verify the need for error correction. The results without error correction are compared with the results of introducing an ARIMA model with error correction to highlight the criticality of error correction for the prediction results.
It is hoped that the above comparisons can more clearly demonstrate the necessity and superiority of the proposed hybrid model in carbon price forecasting.

Model Accuracy Evaluation
To evaluate the predictive effectiveness of the model, the following five indicators are applied in this paper: mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²):

$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i |$

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$

$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

where $\hat{y}$ represents the predicted value, $y$ denotes the original data, $\bar{y}$ signifies the mean value of the original data, and $n$ means the length of the test set data.
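The five indicators are straightforward to compute; a compact sketch (the function name is illustrative):

```python
import numpy as np

def metrics(y, y_hat):
    """The five evaluation indicators: MSE, MAE, RMSE, MAPE (in %), and R^2."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mse = np.mean((y - y_hat) ** 2)
    mae = np.mean(np.abs(y - y_hat))
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs((y - y_hat) / y)) * 100.0   # assumes y has no zeros
    r2 = 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

A perfect forecast yields zero for the four error measures and R² = 1.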

Model Significance Evaluation
The Diebold-Mariano test (DM test) was proposed by Diebold and Mariano [47] in 1995. The DM test is used to determine whether the differences between the prediction results of different models are statistically significant. Its main ideas are described below.
Suppose there is a true time series $y_t = \{y_1, y_2, \ldots, y_t\}$ with $t$ sample points, and the forecasts of the true series produced by models A and B are $\{\hat{y}_t^{(a)}\}$ and $\{\hat{y}_t^{(b)}\}$, respectively. The prediction errors of the two models are

$e_i^{(j)} = y_i - \hat{y}_i^{(j)}, \quad j = a, b$

MAE is chosen as the loss function $F(e_i^{(j)})$, which can be expressed as

$F(e_i^{(j)}) = |e_i^{(j)}|, \quad j = a, b$

The statistic of the DM test is expressed as

$\mathrm{DM} = \frac{\bar{d}}{s / \sqrt{t}}$

where $d_i = F(e_i^{(a)}) - F(e_i^{(b)})$, $\bar{d}$ is the mean of $d_i$, and $s$ is the standard deviation of $d_i$. The null hypothesis of the DM test, shown in Equation (42), indicates that the two models have the same predictive performance, whereas the alternative hypothesis, given in Equation (43), states that the two models have different predictive performances:

$H_0: E(d_i) = 0 \quad (42)$

$H_1: E(d_i) \ne 0 \quad (43)$

Since the DM test statistic asymptotically follows a standard normal distribution, the absolute value of the DM statistic can be compared with $z_{\alpha/2}$. If the DM statistic satisfies Equation (44), the null hypothesis of equal predictive performance is rejected, and the predictions of the two models are considered significantly different; otherwise, the null hypothesis is accepted and the two models are considered not significantly different:

$|\mathrm{DM}| > z_{\alpha/2} \quad (44)$

where $z_{\alpha/2}$ denotes the z-value in the table of the standard normal distribution at a significance level of $\alpha$.
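The DM statistic with the MAE loss can be computed in a few lines; this sketch returns only the statistic, which a full test would then compare with $z_{\alpha/2}$. The function name is illustrative.

```python
import numpy as np

def dm_statistic(y, yhat_a, yhat_b):
    """DM statistic with MAE loss: negative values favour model A."""
    y, yhat_a, yhat_b = (np.asarray(v, float) for v in (y, yhat_a, yhat_b))
    d = np.abs(y - yhat_a) - np.abs(y - yhat_b)   # loss differential d_i
    t = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(t))
```

When model A's errors are systematically smaller than model B's, the loss differential is mostly negative and the statistic is strongly negative.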

Empirical Study
Accurate carbon price forecasts can effectively reduce carbon price volatility and thus facilitate the steady development of the carbon market. For example, if the carbon price is expected to rise sharply, management can reduce the volatility of the carbon price by increasing the number of allowances. In addition, a smooth carbon price also helps companies plan their resources, manage carbon market risks, and control their costs over the long term, rather than having to react to drastic changes in the carbon price. Therefore, it is important to develop highly accurate carbon price forecasting models.

Descriptive Statistics
Due to the Shenzhen and Hubei markets having larger sample sizes than the other markets, this paper chooses these two markets as the research samples, which advantageously supports thorough training and guarantees the precision and generalisability of the model. Data on carbon prices for both the Shenzhen emission allowance (SZEA) and the Hubei emission allowance (HBEA) markets were obtained from iFinD (https://ft.10jqka.com.cn/, accessed on 16 January 2024). The time span of the SZEA is from 5 August 2013 to 8 January 2024, with a sample size of 2201. The time span of the HBEA is from 2 April 2014 to 15 January 2024, with a sample size of 2267. The main data characteristics of the study sample are detailed in Table 2. Prior to the model forecast, the carbon price series is expanded using the time-lag method, and the flowchart of the expansion is shown in Figure 5. In this paper, the time step is set to 28, and the prediction type is one-step-ahead, which means that the first 28 observations are used to predict the 29th observation. From the daily carbon price trend graphs in Figure 6a,c and the frequency distribution histograms in Figure 6b,d, it is clear that the carbon price is non-linear and highly complex. Therefore, accurate direct prediction is not feasible, so decomposition must be performed before further processing.
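The time-lag expansion with a step of 28 and one-step-ahead prediction amounts to a sliding-window construction over the price series; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def make_windows(series, step=28):
    """One-step-ahead samples: each window of `step` observations
    predicts the observation immediately after it."""
    series = np.asarray(series, float)
    X = np.array([series[i:i + step] for i in range(series.size - step)])
    y = series[step:]
    return X, y
```

A series of length N yields N - 28 (input, target) pairs: the first sample uses observations 1-28 to predict observation 29.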

Validation of the Necessity of Sequence Reconstruction
In order to understand the complexity of the IMFs generated by CEEMDAN more intuitively and accurately, we adopt FuzzyEn to describe their complexity. The original series of the SZEA and HBEA are decomposed using CEEMDAN, as depicted in Figure 7. The results in Figure 7 suggest that both the SZEA and HBEA are decomposed into nine IMFs, and the degree of volatility of the decomposed series exhibits a decreasing trend. As mentioned above, FuzzyEn can measure the complexity of a time series. To improve prediction accuracy and computational efficiency, sub-sequences with similar FuzzyEn results are reorganised according to fuzzy entropy theory. Comparisons of the FuzzyEn results of the original and reconstructed sequences are shown in Table 3.
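For illustration, the fuzzy entropy measure used for this complexity screening can be sketched in a few lines. This is a minimal sketch assuming common parameter choices (m = 2, n = 2, tolerance r scaled by 0.2 times the series' standard deviation); the paper's exact settings may differ.

```python
import numpy as np

def fuzzy_entropy(x, m=2, n=2, r=0.2):
    """FuzzyEn of a 1-D series: larger values indicate a more irregular series."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()                        # tolerance scaled by the series' std

    def phi(mm):
        N = x.size - m                       # same vector count for both dimensions
        # Baseline-removed template vectors X_i^m.
        X = np.array([x[i:i + mm] - x[i:i + mm].mean() for i in range(N)])
        # Chebyshev distances between all pairs of templates.
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        S = np.exp(-(d ** n) / tol)          # fuzzy similarity
        np.fill_diagonal(S, 0.0)             # exclude self-matches
        return S.sum() / (N * (N - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

Consistent with the screening above, an irregular series (white noise) scores markedly higher than a regular one (a sine wave).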
According to the results in Table 3, the FuzzyEn results of the SZEA and HBEA are 0.8341 and 0.7463, respectively, so the SZEA has a higher degree of complexity than the HBEA. The high-complexity components are sorted according to the FuzzyEn results of the original carbon price series. For the SZEA, IMF1 (1.3239) and IMF2 (1.0932) are classified as high-complexity components. For the HBEA, IMF1 (0.7463), IMF2 (0.5486), and IMF3 (0.3294) are categorised as high-complexity components. The remaining components are ranked as medium- and low-complexity components based on the FuzzyEn results. For the SZEA, IMF3 (0.5987), IMF4 (0.3055), and IMF5 (0.1209) are considered medium-complexity components and reconstructed as Rec-sub3, while IMF6 (0.0359), IMF7 (0.0053), IMF8 (0.0021), and IMF9 (0.0006) are classed as low-complexity components and reconstructed as Rec-sub4. For the HBEA, IMF4 (0.1296), IMF5 (0.0548), and IMF6 (0.0178) are considered medium-complexity components and reconstructed as Rec-sub4 (0.0917), while IMF7 (0.0032), IMF8 (0.0015), and IMF9 (0.0001) are classed as low-complexity components and reconstructed as Rec-sub5 (0.0013). After the reconstruction of the sequences, the mean value of the FuzzyEn results is lowered from 0.8341 to 0.3486 for the SZEA and from 0.2305 to 0.1831 for the HBEA. Therefore, the complexity of the sequences is significantly decreased after reconstruction, which offers a sound basis for improving the accuracy of carbon price prediction. Comparisons of the FuzzyEn results of the original and reconstructed sequences are displayed in Figure 8, and the trends of the components of the SZEA and HBEA after reconstruction are illustrated in Figure 9. It is clear that the complexity of the reconstructed sequences is significantly lower than that of the original sequences, making it easier for the model to capture the movement patterns of the series. However, a reduction in complexity does not necessarily improve prediction accuracy. Thus, to verify whether the reduction in complexity improves prediction accuracy, the evaluation metrics of the original and reconstructed sequences are compared across three models, and the comparison results are shown in Table 4. Table 4 demonstrates that, for both the SZEA and HBEA across the three models, the MSE, MAE, RMSE, and MAPE all decrease after reconstruction. This indicates that sequence reconstruction increases prediction accuracy while effectively reducing the complexity of the sequence.

Validation of the Need for Hybrid Models
PSORF, ELM, and LSTM are applied to predict the reconstructed sequences. To reduce computational cost, an early stopping mechanism is added to PSO: if the model's iteration accuracy does not improve after 25 rounds, the optimisation algorithm is stopped and the optimal result of that run is retained. The parameter settings of PSO are listed in Table 5, and the convergence flow of PSORF training for the HBEA is illustrated in Figure 10. The MAE values of the three models for the SZEA and HBEA are shown in Table 6, which presents the performance of PSORF, ELM, and LSTM in predicting sequences of different complexity. Overall, PSORF performs best on the high-complexity sequences (SZEA-Rec-sub1, SZEA-Rec-sub2, HBEA-Rec-sub1, and HBEA-Rec-sub2), with the exception of HBEA-Rec-sub3, but does not perform well on the low-complexity sequences (SZEA-Rec-sub4, HBEA-Rec-sub4, and HBEA-Rec-sub5). ELM predicts better on sequences of medium complexity (SZEA-Rec-sub3 and HBEA-Rec-sub4). LSTM is more suitable for predicting low-complexity sequences such as SZEA-Rec-sub4 and HBEA-Rec-sub5, on which the MAE values of PSORF are 2.9656 and 0.2642 and those of ELM are 1.9009 and 0.1182, respectively. Therefore, PSORF, ELM, and LSTM are selected for high-, medium-, and low-complexity sequence predictions, respectively, and the prediction results are summed to obtain the hybrid model's predicted results. The error comparison results for the single and hybrid models are presented in Table 7. As shown in Table 7, the evaluation indicators of both the SZEA and HBEA hybrid models are smaller than those of the single prediction models. Specifically, the MSE, MAE, RMSE, and MAPE of the SZEA are 2.5579, 1.0693, 1.5994, and 0.0439, respectively; for the HBEA, these values are 0.2294, 0.3635, 0.4790, and 0.0079. This suggests that the hybrid model has a greater advantage in terms of forecasting error, which enables better fitting and prediction of highly volatile time series.
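The early-stopping rule added to PSO can be sketched generically as follows (illustrative names; the actual PSO velocity/position updates for the RF hyper-parameters are omitted and represented by a single `step` callback):

```python
def optimise_with_early_stopping(step, n_iter=200, patience=25):
    """Run an iterative optimiser (e.g. a PSO round tuning RF
    hyper-parameters), stopping early when the best fitness has not
    improved for `patience` consecutive rounds. `step()` performs one
    iteration and returns the best fitness found in that iteration."""
    best, stall = float("inf"), 0
    for it in range(n_iter):
        fitness = step()
        if fitness < best - 1e-12:   # genuine improvement
            best, stall = fitness, 0
        else:
            stall += 1               # no improvement this round
        if stall >= patience:
            break                    # retain the best result so far
    return best, it + 1
```

With `patience=25`, the loop runs at most 25 wasted iterations past the last improvement, which mirrors the 25-round rule described above.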

Validation of the Necessity of MLRRF Integration
This paper examines the intrinsic link between the original carbon price series and the reconstructed carbon price series and applies MLR to quantify the relationship between these series, followed by non-linear integration with RF. The summarised prediction results are presented in Table 8. For the SZEA, using MLRRF integration, the MSE, MAE, RMSE, and MAPE are 1.8411, 0.8675, 1.3569, and 0.0326, respectively, representing significant reductions of 45.91%, 25.94%, 26.46%, and 34.23%. The results for the HBEA are similar and are not presented here. Hence, the prediction error of MLRRF integration is substantially reduced, making it more revealing of the relationship between the original and reconstructed sequences. A comparison of the error results for simple additive integration and MLRRF integration is displayed in Figure 11.
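One plausible reading of the MLRRF integration step is sketched below. The paper does not spell out the exact wiring, so treat this as an assumption: an MLR first maps the sub-sequence predictions to the target, and a random forest is then fitted on the sub-predictions augmented with the MLR output to capture the remaining non-linear structure. All names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

def mlrrf_fit(sub_preds, target):
    """Fit the two-stage MLR + RF integrator on stacked sub-sequence
    predictions (shape: n_samples x n_subsequences)."""
    mlr = LinearRegression().fit(sub_preds, target)
    linear_part = mlr.predict(sub_preds).reshape(-1, 1)
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(np.hstack([sub_preds, linear_part]), target)
    return mlr, rf

def mlrrf_predict(mlr, rf, sub_preds):
    """Integrate new sub-sequence predictions into a final forecast."""
    linear_part = mlr.predict(sub_preds).reshape(-1, 1)
    return rf.predict(np.hstack([sub_preds, linear_part]))
```

Compared with simple additive integration (summing the sub-predictions), this lets the ensemble learn unequal, non-linear weights for each reconstructed component.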

Verification of the Need for Error Correction
To further explore the application of error correction in carbon price forecasting, this paper uses ARIMA for error correction, and the confirmation criterion employed for the parameters of ARIMA is the Bayesian information criterion (BIC). According to the BIC, the parameters for both ARIMA models are set to (1, 1, 0). Error correction is performed using ARIMA on the non-linear integration results, and the error-corrected evaluation indicators are reported in Table 8, with an R² value of 0.9921. These outcomes show that error correction can be effective in improving the prediction accuracy of the model. For simplicity, each model is referred to as M1, M2, M3, etc., as shown in Table 9. The results of the overall model comparison between the SZEA and HBEA are illustrated in Figure 12, and the fitting results of each model are presented in Figure 13. As can be seen in Figures 12 and 13, for all indicators and fitting levels, the M9 model is optimal for both regions.
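The mechanics of an ARIMA(1, 1, 0) error correction can be illustrated by hand. In practice a library such as statsmodels would estimate the model and the BIC would select the order; the helper below is only a sketch: the first difference of the in-sample errors is modelled as an AR(1) process, the next error is forecast, and that forecast is added back to the raw prediction.

```python
import numpy as np

def arima_110_correct(errors, pred_next):
    """One-step error correction under an ARIMA(1, 1, 0) error model."""
    e = np.asarray(errors, dtype=float)
    d = np.diff(e)                               # differencing (d = 1)
    phi = (d[:-1] @ d[1:]) / (d[:-1] @ d[:-1])   # AR(1) coefficient, least squares
    next_diff = phi * d[-1]                      # one-step forecast of the difference
    next_error = e[-1] + next_diff               # undo the differencing
    return pred_next + next_error                # corrected forecast
```

The geometric error sequence 0, 1, 1.5, 1.75, 1.875 has differences 1, 0.5, 0.25, 0.125, for which least squares recovers phi = 0.5 exactly, so the forecast error is 1.875 + 0.0625 = 1.9375.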

Validation of the Proposed Model
In order to validate the proposed model, the results presented in this paper are compared with those of similar high-quality studies on carbon price forecasting, as shown in Table 10. From the table, it is clear that the proposed model is superior to the other models. Here, the increase in prediction accuracy is due to the improvements in feature extraction and non-linear integration.
To avoid the erroneous conclusion that the proposed model's high predictive accuracy is merely due to random error, the model is statistically evaluated by applying the DM test, with the MAE used as the loss function. The DM test results for each model are detailed in Table 11. It can be seen that the p-value for each comparison model is less than 0.05, indicating that the difference in predictive accuracy between the proposed model and each comparison model is statistically significant. Therefore, the model proposed in this paper is robust and can maintain good prediction performance in different scenarios.
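A minimal version of the DM test with absolute-error loss can be sketched as follows (one-step-ahead case, so no autocovariance correction of the loss-differential variance is applied; a full implementation would add it for longer horizons):

```python
import numpy as np
from math import erf, sqrt

def dm_test(actual, pred1, pred2):
    """Diebold-Mariano test with MAE loss (h = 1). Negative statistics
    with small p-values favour the first model."""
    actual, pred1, pred2 = map(np.asarray, (actual, pred1, pred2))
    d = np.abs(actual - pred1) - np.abs(actual - pred2)  # loss differential
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=0) / n)           # DM statistic
    p = 2 * (1 - 0.5 * (1 + erf(abs(dm) / sqrt(2))))     # two-sided normal p-value
    return dm, p
```

A p-value below 0.05 rejects the null hypothesis of equal predictive accuracy, which is the criterion applied in Table 11.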

Conclusions
This paper establishes a forecasting framework based on improved feature extraction and non-linear integration to predict carbon prices. The contributions of this paper can be summarised as follows: (1) A feature extraction method that combines CEEMDAN and FuzzyEn is proposed, which can effectively extract features from the original carbon price series while optimising computational efficiency. The method makes the feature extraction process both accurate and efficient, promising to deliver more reliable predictions. (2) Based on the characteristics and complexity of the reconstructed sequences, the components of each complexity level are predicted using targeted models. This design helps improve prediction quality because components of different complexity exhibit different characteristics, and a targeted model can better adapt to these characteristics to achieve higher precision.
(3) This paper uses the non-linear integration learning method of MLRRF to reconstruct the sub-sequences, which can better reveal the relationship between the original and reconstructed sequences while effectively reducing prediction error. (4) By introducing error correction, the performance of the model achieves significant improvements in all evaluation indicators. Therefore, in practical applications, error correction has great potential to contribute to the improvement of the predictive performance of models.
Carbon price forecasts can validly reflect the operating rules of the carbon market and provide a reference for the development of operational programmes and investor decision-making. Meanwhile, the results of the empirical analyses of the SZEA and HBEA also provide a theoretical basis for carbon pricing. Based on the results of the carbon price forecast, appropriate market operation and management strategies can be formulated to guide the investment direction of the carbon market.

Discussion
Focusing on the non-smooth and non-linear characteristics of the carbon price, this paper proposes a novel hybrid prediction model named CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA. It combines four core components: improved feature extraction, hybrid models, non-linear integration, and error correction. The experimental results show that the hybrid model is superior to traditional prediction methods in terms of prediction accuracy and robustness.
However, there are some limitations to this study.For example, it relies only on data-driven modelling and does not take into account external influences such as energy prices, investor sentiment, and climate change.In future work, effectively addressing the above shortcomings will contribute to improving the accuracy of non-stationary, non-linear time-series forecasting.

Figure 2 .
Figure 2. Topology of an extreme learning machine.

Figure 4 .
Figure 4. Framework of the proposed model.

Figure 5 .
Figure 5. The expansion flowchart. Before prediction, each dataset is divided into a training set and a test set. The training set is used to train the model, and the test set is used to validate the model's performance. In this paper, the training and test sets comprise 80% and 20%, respectively, of the total sample size. The trends of the SZEA and HBEA, as well as the frequency distributions of the average transaction price, are shown in Figure 6.

Figure 6 .
Figure 6. Trends and frequency distributions for the two regions. (a) Trend of SZEA; (b) Frequency distribution of SZEA; (c) Trend of HBEA; (d) Frequency distribution of HBEA.

Figure 9 .
Figure 9. Results of the reconstruction of the two regions. (a) Results of the reconstruction of SZEA; (b) Results of the reconstruction of HBEA.

Figure 11 .
Figure 11. Results of the MLRRF integration error comparison.

Table 1 .
Parameters of forecasting models.

Table 2 .
The main numerical characteristics of the research data.

Table 3 .
The FuzzyEn results of each sub-sequence and reconstruction consequence.

Table 4 .
Error comparison between original and reconstructed sequences.

Table 5 .
Parameter setting of PSO.

Table 6 .
The MAE values for the three models for the two regions.

Table 7 .
Results of error comparison between single and hybrid models.

Table 9 .
Code name of each model.

Table 10 .
Comparison results with similar studies.

Table 11 .
DM test results of each model.