1. Introduction
Addressing climate change is a crucial issue concerning the survival and development of humanity, and actions to reduce greenhouse gas emissions have become a focus for all countries. Under a carbon market that imposes total quantity control on emission allowances, enterprises can profit by reducing emissions and selling surplus allowances as commodities, or purchase allowances to offset excess emissions. Meanwhile, this mechanism promotes the low-carbon transformation of enterprises and enables effective control of carbon emissions. The carbon price is a core element of the carbon market, directly reflecting the scarcity of carbon resources and enterprises’ emission-reduction costs. It is an important basis for enterprises to make production decisions and to allocate and trade carbon resources, and it serves as a key reference for the government to evaluate the operation of the carbon market and flexibly adjust emission-reduction policies.
Since the launch of pilot carbon emission trading in China in 2011, a unified national carbon emission trading market has taken shape, becoming an important measure for advancing the “dual carbon” goals. These pilot policies have driven regional emission reductions [1], strengthened enterprises’ environmental and social responsibility performance and self-governance [2], and accelerated technological innovation and green development [3]. However, the carbon price is affected by multiple complex factors, such as quota policies, industrial production, energy prices, and macroeconomic conditions, giving rise to non-stationary, high-noise, and nonlinear characteristics. Consequently, developing accurate prediction models for carbon prices is worth researching.
There has been considerable research on predicting carbon emission trading prices. One strand applies traditional time-series regression methods. Some studies have established combined MIDAS regression models for prediction, but these require manual determination of feature correlations between sequences and manual setting of weights and parameters, and they struggle to capture the extreme fluctuations that may occur in carbon price sequences. Other studies have constructed hybrid ARIMA-based regression models, but their parameter selection relies heavily on manual choice and empirical screening. Meanwhile, operations such as differencing in the ARIMA method may discard key information in the original sequence, which can degrade prediction results.
Another strand uses AI methods, such as machine learning, for prediction. Some studies have constructed CNN-LSTM models, but carbon price sequences usually mix linear and nonlinear features with high noise, making it difficult for such models to capture the features completely. Moreover, performance on test sets degrades markedly compared with training sets. The complex characteristics of the sequence arise from the superposition of signals at different frequencies, each carrying its own trend, periodic, and noise information, which calls for targeted decomposition and fitting to accurately extract the underlying patterns.
The ICEEMDAN method can automatically decompose data into several relatively stationary subsequences based on the characteristics of the signal itself, controlling the influence of noise while retaining information across different dimensions. The CNN-LSTM method can effectively extract local features from each subsequence and capture its long- and short-term dependencies. Consequently, this research proposes a hybrid prediction model integrating the ICEEMDAN decomposition method and the CNN-LSTM fitting method: it decomposes and denoises the original carbon price sequence, then extracts features and fits the components to improve prediction accuracy.
This research proceeds from two core expectations. First, integrating ICEEMDAN signal decomposition into a neural network pipeline should improve the accuracy of prediction by producing more reliable IMF components for subsequent modeling. Second, the combined CNN-LSTM method should better extract local features and dependencies across the components.
The core work of this research focuses on the proposed ICEEMDAN-CNN-LSTM prediction model. Firstly, to address the pain points in predicting carbon prices, a prediction framework comprising decomposition, fitting, and recombination is designed by leveraging the advantages of ICEEMDAN for signal decomposition and the feature-extraction capabilities of CNN-LSTM. Secondly, the closing price of the Hubei carbon emission trading market is selected as the research object, and three indicators, namely carbon emission futures [4], the S&P 500 index [5], and the US dollar index [6], are selected as influencing factors and used as input data after preprocessing such as interpolation and normalization. Thirdly, the ICEEMDAN-CNN-LSTM model is applied to obtain prediction results, which are then compared with those of the benchmark models LSTM and CNN-LSTM. In addition, this research selects four evaluation indicators, namely MAE, MSE, RMSE, and MAPE, and comprehensively verifies the superiority of the model through robustness tests under different division ratios of training and test sets.
From the experimental results, compared with two benchmark models, the proposed model reduces MAE, MSE, RMSE, and MAPE by more than 47%, 78%, 53%, and 42%, respectively, in the training set, and by more than 59%, 81%, 57%, and 57%, respectively, in the test set. This shows that the constructed ICEEMDAN-CNN-LSTM model can effectively handle the frequency, trend, and noise in carbon prices and significantly improve feature capture, resulting in higher prediction accuracy.
The rest of the paper consists of four sections. Section 2 systematically reviews research progress on carbon price prediction through a literature review. Section 3 introduces the model framework and the research methods adopted in this study in detail. Section 4 presents an empirical analysis, covering basic information on the selected data as well as the prediction results and indicator performance of the proposed model. Finally, Section 5 presents conclusions and prospects.
2. Literature Review
In the early years, carbon price prediction mostly drew on traditional methods for electricity price prediction, such as regression and econometric models. For example, Contreras et al. [7] used the ARIMA model to predict next-day electricity prices in Spain and other regions. Zareipour et al. [8] used transfer function and dynamic regression models, incorporating relevant market data to improve the accuracy of energy price predictions in Ontario, Canada. Similarly, some studies treat carbon prices as time series and use traditional regression methods to predict them [9]. Byun & Cho [10] used GARCH-family models, implied volatility (IV), and other methods for prediction. However, such traditional methods usually require linear and stationary sequences, while actual carbon prices are affected by multiple factors such as policies, markets, and energy prices, often showing significant nonlinear, non-stationary, and high-noise characteristics, making it difficult for traditional models to fully exploit the effective information in the sequences [11].
With the development of artificial intelligence, machine learning and deep learning methods have been increasingly applied to carbon price prediction in recent years, thanks to their strong nonlinear fitting and feature-learning capabilities. Compared with single models, hybrid models that integrate multiple methods usually achieve better predictive performance. For example, Dong et al. [12] designed an Lp-CNN-LSTM model to effectively deal with high- and low-frequency components of carbon prices. Wang & Zhuang [13] combined XGBoost with BiLSTM and BiGRU methods to improve prediction accuracy. In addition, some studies focus on selecting influencing factors. For example, Yao et al. [14] used the Pearson correlation coefficient method to screen variables and constructed a BP-LSTM hybrid model. Wei et al. [15] selected multi-level indicators, such as the CSI 300 index, to establish a Transformer-LSTM model. Wang & He [4] adopted the APVMD-LightGBM-TCN method to construct a complete framework for feature extraction, factor screening, and prediction. The above studies provide useful references for feature selection and model construction, but problems remain, such as a rough depiction of extreme situations and rise-and-fall trends, and the low representativeness of selected features.
Considering the non-stationary and high-noise characteristics of carbon prices, introducing signal decomposition methods into the prediction model can decompose the original sequence into several relatively stationary subsequences, effectively reducing noise interference and capturing hidden information, and thus improving prediction accuracy. For example, He et al. [16] used the VMD-SWD secondary decomposition method to process the carbon price sequence and extract effective information. Nadirgil [17] used CEEMDAN-VMD dual decomposition combined with a neural network for prediction, thereby verifying the decomposition method’s effectiveness. Wang et al. [18] constructed a CEEMDAN-SE-LSTM-RF model and made innovations in feature extraction. Duan et al. [5] used the CEEMDAN method for multiple decompositions, further improving prediction accuracy. The experimental results of these studies show that signal decomposition methods can considerably improve a prediction model’s ability to capture the trends and details of the sequence.
Some studies focus on feature selection and model parameter optimization. For feature selection, Li & Ren [6] effectively improved the model’s predictive performance through feature selection algorithms, while Liu et al. [19] screened out key factors through regularization models. Wei & Ouyang [20] used the s-PCA method to reduce the dimensionality of influencing factors and improve prediction accuracy. For model parameter optimization, Shi et al. [21] constructed a CNN-LSTM model and selected the optimal parameter combination. Yu & Shi [22] introduced an attention mechanism into CNN-LSTM to enable the model to focus on key features. However, existing methods still rely heavily on personal experience in parameter setting; in addition, there is still room for improvement in the handling of noise interference.
In addition, some studies focus on the explainability and comparative analysis of models. For model explainability, Sayed et al. [23] applied interpretable artificial intelligence techniques to analyze prediction results. For comparative analysis, Hong et al. [24] evaluated the performance of various classification methods for predicting carbon price trends, and Kumar et al. [25] systematically compared the predictive performance of traditional time-series models and machine learning models. These studies provide a diversified perspective for model construction and evaluation. From the existing research, it can be concluded that various hybrid models have made progress in carbon price prediction, but deficiencies remain, including insufficient handling of high-noise and non-stationary features in the sequence, excessive dependence on individual experience in setting complex model parameters, and room for improvement in feature extraction and data fitting.
Among the various signal decomposition methods used in carbon price prediction, ICEEMDAN offers several advantages that distinguish it from alternatives. Unlike traditional EMD or EEMD, which suffer from mode aliasing or residual noise, ICEEMDAN employs an adaptive noise control mechanism that dynamically adjusts the noise intensity based on the local characteristics of the residual signal. This dynamic noise elimination strategy is particularly well suited to the carbon price series, which exhibits time-varying volatility due to periodic policy adjustments and external shocks. Compared with VMD, which requires pre-specification of the number of modes and a penalty factor, ICEEMDAN is fully data-driven and can autonomously determine the appropriate decomposition level. These features make ICEEMDAN a more robust and adaptive decomposition tool for non-stationary and nonlinear carbon price sequences, providing higher-quality IMF components for subsequent CNN-LSTM modeling.
In view of these considerations, an ICEEMDAN-CNN-LSTM hybrid prediction model is proposed with the following core advantages. Firstly, the ICEEMDAN method adaptively decomposes the original sequence, efficiently suppressing mode aliasing, and determines the number of decomposition modes based on the sequence’s own characteristics, thereby avoiding deviations caused by manual settings and improving the stability and reliability of subsequent feature extraction. Secondly, the CNN method can effectively capture local key features of the subsequences, and the LSTM method can model long-term dependencies within them. By integrating the CNN and LSTM methods, the model’s ability to capture the complicated dynamics of the sequences is significantly improved. Finally, by applying the ICEEMDAN method, the model performs multi-scale decomposition and reduces the noise interference of the original sequence, enabling the CNN-LSTM method to focus on subsequences with clearer features. This effectively addresses the pain points of insufficient noise suppression and incomplete feature capture in existing prediction methods, reduces prediction errors, and yields stronger robustness.
3. Model Construction
3.1. Methods
Carbon emission trading prices exhibit significant non-stationarity, nonlinearity, and high noise, making it difficult for a single model to simultaneously filter noise, extract multi-scale features, and capture long- and short-term dependencies. Based on the core logic of “Decomposition–Extraction–Fitting”, a hybrid model framework of “ICEEMDAN signal decomposition–CNN feature extraction–LSTM time-series fitting” is used to predict carbon prices. The ICEEMDAN method adaptively decomposes the original sequence, effectively handles noise in carbon prices, and yields relatively stationary subsequences. The CNN method extracts local key features from nonlinear sequences and effectively mines their hidden information. The LSTM method captures long-term dependencies in sequences and effectively models time-series features. By organically combining these methods, the model is able to reduce noise, capture features, and fit the time series, finally realizing accurate prediction of carbon prices. The model’s framework is shown in Figure 1, and a detailed introduction to each method adopted in the model is provided below.
3.1.1. ICEEMDAN
Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) [26] is a signal decomposition method based on Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD). By introducing adaptive noise and a complete ensemble strategy, it can effectively suppress the mode aliasing that may occur during signal decomposition and control noise to a certain extent, thereby yielding more reliable decomposition results. ICEEMDAN further optimizes the CEEMDAN method: it automatically determines the most appropriate number of decomposition modes according to the characteristics of the input signal itself, rather than relying on manual settings that may introduce errors. At the same time, by constructing multiple groups of white noise and performing ensemble averaging, ICEEMDAN suppresses noise interference in the original sequence, effectively preserves signal information, and improves the stability of the decomposition results. Its main steps are as follows:
Firstly, generate $M$ groups of independent Gaussian white noise $n^{(m)}(t)$ ($m = 1, 2, \dots, M$) with mean 0 and variance 1, and set the initial residual component $r_0(t)$ equal to the original signal $x(t)$.
Then, add noise to the original signal to obtain a series of noisy signals $r_0(t) + \varepsilon_0 n^{(m)}(t)$, and decompose each of them using the EMD method to obtain the first intrinsic mode components $\mathrm{IMF}_1^{(m)}(t)$. Perform ensemble averaging over the $M$ groups to obtain the first IMF of the ICEEMDAN method, namely

$$\mathrm{IMF}_1(t) = \frac{1}{M} \sum_{m=1}^{M} \mathrm{IMF}_1^{(m)}(t),$$

and calculate the first residual component $r_1(t) = r_0(t) - \mathrm{IMF}_1(t)$.
Next, calculate the standard deviation of the first residual component $r_1(t)$:

$$\sigma_1 = \operatorname{std}\big(r_1(t)\big),$$

and adaptively determine the corresponding noise intensity $\varepsilon_1 = \varepsilon_0 \cdot \sigma_1$. Construct the noisy residual signals $r_1(t) + \varepsilon_1 n^{(m)}(t)$, decompose them with the EMD method to obtain the second intrinsic mode components $\mathrm{IMF}_2^{(m)}(t)$, obtain $\mathrm{IMF}_2(t)$ by ensemble averaging, and calculate the second residual component $r_2(t) = r_1(t) - \mathrm{IMF}_2(t)$.
After that, repeat the above steps until the residual component is monotonic or has only one extreme point, at which point the decomposition terminates. At this time, the $k$-th IMF is

$$\mathrm{IMF}_k(t) = \frac{1}{M} \sum_{m=1}^{M} \mathrm{IMF}_k^{(m)}(t),$$

and the final residual component $r_k(t)$ is obtained.
Finally, the original signal can be reconstructed from all IMFs and the final residual component $r_k(t)$, namely

$$x(t) = \sum_{i=1}^{k} \mathrm{IMF}_i(t) + r_k(t).$$
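To make the procedure concrete, the following is a minimal Python sketch of this decomposition loop, built on the EMD implementation from the PyEMD package (pip name EMD-signal). It is an illustration of the steps above under simplified assumptions (extracting only the first EMD mode of each noisy copy and using a plain monotonicity stopping rule), not the exact implementation used in this research; the function and parameter names are ours.

```python
import numpy as np
from PyEMD import EMD  # pip install EMD-signal

def iceemdan_sketch(x, n_ensembles=100, eps0=0.4, max_imfs=9, seed=0):
    """Simplified ICEEMDAN-style loop: ensemble-average the first EMD mode of
    noise-assisted residuals, adapting the noise intensity to each residual."""
    rng = np.random.default_rng(seed)
    emd = EMD()
    residual = np.asarray(x, dtype=float).copy()
    eps = eps0                                   # initial noise intensity (epsilon_0)
    imfs = []
    for _ in range(max_imfs):
        mode = np.zeros_like(residual)
        for _ in range(n_ensembles):
            noise = rng.standard_normal(residual.size)
            mode += emd(residual + eps * noise, max_imf=1)[0]  # first EMD mode
        mode /= n_ensembles                      # ensemble average -> current IMF
        imfs.append(mode)
        residual = residual - mode
        eps = eps0 * np.std(residual)            # epsilon_k = epsilon_0 * sigma_k
        # stop once the residual is monotonic (no interior extrema left)
        if np.all(np.diff(residual) >= 0) or np.all(np.diff(residual) <= 0):
            break
    return np.array(imfs), residual              # x ~ imfs.sum(axis=0) + residual
```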
3.1.2. CNN
The Convolutional Neural Network (CNN) is a classic deep learning method. It was initially used in signal processing and in image recognition and classification, and was later extended to other tasks such as feature extraction. Through its local receptive field and weight-sharing mechanism, a CNN can efficiently capture local correlation features in data. It generally includes an input layer, convolutional layers, pooling layers, and fully connected layers. The input layer receives the original data and converts it into a feature matrix with a specified batch size, step size, and feature dimension. The convolutional layer performs cross-correlation operations on the feature matrix by sliding convolution kernels to extract local key features. The pooling layer adopts a specific pooling strategy to compress feature dimensions, retaining core features while reducing computation and improving the generalization ability of the model. The fully connected layer maps the pooled feature vectors to the output space. The CNN process is shown in Figure 2.
Firstly, capture local features of the data through the convolution kernel:

$$Y = X \star W + b,$$

where $X$ is the input feature map, $W$ is the convolution kernel, $b$ is the bias, and “$\star$” denotes the cross-correlation operation, that is, sliding the convolution kernel as a window over the input feature map and multiplying and summing the corresponding entries. Here $k$ is the side length of the convolution kernel, $c_{\mathrm{in}}$ is the number of channels of the input feature map, and $c_{\mathrm{out}}$ is the number of feature types to be detected, that is, the number of channels of the output feature map, so that $W \in \mathbb{R}^{k \times k \times c_{\mathrm{in}} \times c_{\mathrm{out}}}$.
Then, compress the data via a pooling operation to retain its features while reducing the output dimension and the computation. For max pooling:

$$P_{i,j,c} = \max_{0 \le u, v < k} X_{i \cdot s + u,\, j \cdot s + v,\, c},$$

where $P$ is the output feature map, $i$ and $j$ are the row and column coordinates of the output feature map, $c$ is the channel index, $X$ is the input feature map, $k$ is the side length of the pooling window, $u$ and $v$ are the local offsets within the window, and $s$ is the step size.
Finally, integrate the features through the fully connected layer and output the final result:

$$z = W x + b,$$

where $z$ is the output vector, $W$ is the weight matrix, $x$ is the input vector, and $b$ is the bias.
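To ground these formulas, the following NumPy sketch implements their one-dimensional, single-channel analogues, which is the form in which convolution and pooling act on time-series subsequences (the function names and the 1-D simplification are ours):

```python
import numpy as np

def conv1d_valid(x, w, b):
    """1-D cross-correlation: slide kernel w over x, multiply and sum, add bias."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])

def max_pool1d(x, k, s):
    """Max pooling with window side length k and step size s."""
    return np.array([x[i:i + k].max() for i in range(0, len(x) - k + 1, s)])

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = conv1d_valid(x, np.array([0.5, 0.5]), 0.0)  # pairwise averages: [2. 2.5 3.5 4.5]
p = max_pool1d(y, k=2, s=2)                     # [2.5 4.5]
```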
3.1.3. LSTM Network
The Long Short-Term Memory (LSTM) network is a neural network architecture based on the Recurrent Neural Network (RNN). Using three gating structures (input gate, forget gate, and output gate) together with a cell state, it addresses the gradient explosion and gradient vanishing problems that occur in traditional RNNs during training. The cell state can store time-series information over longer horizons, enabling better capture of long- and short-term dependencies. The gating structures flexibly adjust the forgetting and updating of information to screen out the core time-series features.
As shown in Figure 3, an LSTM structure usually consists of an input layer, one or more hidden layers, and an output layer. Data enter the model through the input layer and are transformed in the hidden layers to extract information; finally, the output layer generates the processed result. An LSTM structure can have one or more hidden layers, depending on specific task requirements, to effectively extract information from the data.
The hidden layer of the LSTM consists of several hidden units, and the internal structure of a single hidden unit is shown in Figure 4. In addition to receiving the input at the current time step, the hidden unit also receives the output from the previous time step, and it produces the output at the current time step after processing through the input gate, forget gate, output gate, and other structures.
Let the current time step be $t$, the input be $x_t$, the hidden state at the previous time step be $h_{t-1}$, and the cell state be $C_{t-1}$. In the LSTM hidden unit:
Firstly, the forget gate determines which information in the previous cell state $C_{t-1}$ is retained or discarded:

$$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big),$$

where $\sigma$ is the Sigmoid activation function, $W_f$ is the weight matrix, and $b_f$ is the bias.
Then, the input gate determines the new information to be stored at the current time step and generates the candidate cell state:

$$i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big), \qquad \tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big),$$

where $W_i$ and $W_C$ are weight matrices, and $b_i$ and $b_C$ are biases.
Next, update the cell state at the current time step according to the results of the forget gate and input gate:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$
Finally, the output gate determines which information in the cell state is output to the hidden state, and generates the hidden state at the current time step:

$$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big), \qquad h_t = o_t \odot \tanh(C_t),$$

where $W_o$ is the weight matrix, and $b_o$ is the bias.
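Collecting the gate equations, one hidden-unit update can be written as a single function; the NumPy sketch below mirrors the formulas directly (a didactic re-implementation with our own naming, not the library code used in the experiments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM update; W['f'], W['i'], W['C'], W['o'] act on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])            # forget gate f_t
    i = sigmoid(W["i"] @ z + b["i"])            # input gate i_t
    C_tilde = np.tanh(W["C"] @ z + b["C"])      # candidate cell state
    C_t = f * C_prev + i * C_tilde              # cell-state update
    o = sigmoid(W["o"] @ z + b["o"])            # output gate o_t
    h_t = o * np.tanh(C_t)                      # hidden state
    return h_t, C_t
```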
3.2. Evaluation Indicators
In addition, to comprehensively evaluate the model’s predictive performance, Mean Absolute Error (MAE), Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) are selected as performance metrics. The calculation formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert,$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
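In code, the four indicators are one-liners; a minimal NumPy helper (ours):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, MSE, RMSE, and MAPE (in %) for a pair of series."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err / y_true))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```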
4. Empirical Analysis
4.1. Data Collection and Processing
This research collects carbon emission trading price data from the historical records of the Hubei Carbon Emission Trading Center from 24 April 2017 to 20 February 2025. Referring to the studies of Wang & He [4], Duan et al. [5], and Li & Ren [6], and considering the economic representativeness of the indicators, this research selects three indicators, namely carbon emission futures, the S&P 500 index, and the US dollar index, as influencing factors; all indicator data are from the Investing.com database. The indicators are shown in Table 1.
The Pearson correlation matrix of the carbon price and the selected additional features is presented in Figure 5. Each selected feature exhibits a statistically significant correlation with the carbon price, indicating that these features may have predictive power for it.
To ensure the continuity of each time series and eliminate the impact of missing data on the model, this research uses linear interpolation to fill in missing values in each dataset, resulting in 1837 valid data points. Linear interpolation reasonably estimates intermediate missing values: unlike forward or backward filling, which can introduce artificial plateaus, it preserves the continuity of price dynamics and reflects the gradual price change between adjacent trading days. The descriptive statistics of the processed carbon price and each indicator are shown in Table 2.
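This gap-filling rule is, for instance, what pandas’ interpolate performs; a toy check (the use of pandas here is our illustration, not a detail reported in the study):

```python
import numpy as np
import pandas as pd

s = pd.Series([34.0, np.nan, np.nan, 35.5])     # toy price series with a two-day gap
print(s.interpolate(method="linear").tolist())  # [34.0, 34.5, 35.0, 35.5]
```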
Table 2 shows that the maximum closing price of Hubei carbon emission trading is 61.48 yuan, the minimum is 11.56 yuan, and the average is 34.00 yuan; the overall price level is within a reasonable range compared to China’s carbon emission trading pilots and the national market. At the same time, the difference between the maximum and minimum is about 50 yuan and the standard deviation is 11.29, reflecting large fluctuations in carbon prices. This indicates the non-stationary, high-noise character of carbon prices under multiple influencing factors, and underscores the need for accurate prediction. Among the indicators, the average carbon emission futures price is 47.11 yuan with a standard deviation of 28.52, notably higher than that of the carbon price, indicating that the futures market is more sensitive to changes in policy, demand, and other factors. The S&P 500 index is larger in scale than the carbon price, reflects stock market fluctuations, and may be correlated with carbon prices. The standard deviation of the US dollar index is 5.53, showing relatively stable changes; its short-term impact on carbon prices is small, but it may affect them indirectly through other factors.
Next, this research uses the first 80% of the data in the sequence as the training set and the last 20% as the test set. The data and division of carbon prices and the various indicators are shown in Figure 6 and Figure 7.
To eliminate the impact of differences in indicator dimensions on model construction and prediction, this research uses min-max normalization to scale the data to the [0, 1] interval. The formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the corresponding series.
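One common implementation fits the scaling bounds on the training portion only, so that no test-set information leaks into preprocessing (the paper does not state where its bounds are fitted, so this is our assumption):

```python
import numpy as np

def min_max_scale(train, test):
    """Scale both splits to [0, 1] using bounds fitted on the training data only."""
    x_min, x_max = train.min(axis=0), train.max(axis=0)
    span = x_max - x_min
    return (train - x_min) / span, (test - x_min) / span
```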
4.2. Model Training and Parameter Setting
The research uses a rolling window with a length of 20 days to perform one-step rolling prediction: the carbon prices and the other indicators over the previous 20 days serve as input to predict the carbon price at the next time step, after which the window rolls forward one time step and the process repeats.
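A minimal sketch of this windowing step, assuming the carbon price occupies column 0 of a (days × features) array (the layout is our convention):

```python
import numpy as np

def make_windows(data, window=20):
    """Build (samples, window, features) inputs and next-day price targets."""
    X, y = [], []
    for t in range(window, len(data)):
        X.append(data[t - window:t])  # features for days t-20 .. t-1
        y.append(data[t, 0])          # next day's closing carbon price
    return np.array(X), np.array(y)
```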
The model’s parameter settings mainly concern the CNN-LSTM part: the time window length is set to 20, the number of hidden state units is 128, the dropout ratio is 0.001, the activation function is ReLU, and the Adam optimizer is used. In addition, low-frequency signal components achieve a good fit with relatively few iterations, whereas high-frequency components require more. Therefore, the number of iterations is set to 200 for high-frequency signals and 60 for low-frequency signals. For ICEEMDAN, this research sets the number of ensemble members to 100 and the initial noise level to 0.4. Detailed parameter settings are shown in Table 3.
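Under these settings, a per-component CNN-LSTM might be assembled as below. The framework (tensorflow.keras) and the convolution hyperparameters (64 filters, kernel size 3) are our assumptions; the paper fixes only the window length, hidden units, dropout ratio, activation, and optimizer:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_lstm(window=20, n_features=4):
    """CNN-LSTM for one IMF component, following the Table 3-style settings."""
    model = tf.keras.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),  # assumed
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(128),                 # hidden state units
        layers.Dropout(0.001),            # dropout ratio
        layers.Dense(1),                  # next-step price of this component
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# High-frequency components: model.fit(X, y, epochs=200); low-frequency: epochs=60.
```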
4.3. Experimental Result Analysis
4.3.1. Signal Decomposition Results
The decomposition results from the ICEEMDAN method are shown in Figure 8.
Figure 8 shows that ICEEMDAN automatically decomposes the original carbon price sequence into several components, effectively separating the noise and trend information. The subsequences are more stationary than the original sequence, and mode aliasing is avoided. Among them, IMF1~IMF3 exhibit the most violent fluctuations and the shortest cycles, reflecting short-term random noise and small fluctuations in the original sequence. IMF4~IMF6 exhibit relatively small fluctuations and longer cycles, which may reflect the impact of medium-term factors, such as short-term changes in energy prices. IMF7~IMF8 exhibit gentler fluctuations and still longer cycles, which may correspond to long-term trend changes, such as policy modifications. IMF9 shows a stable trend with almost no high-frequency fluctuations, representing the long-term core trend of the original sequence.
4.3.2. Benchmark Model Comparison
To verify the effectiveness of the ICEEMDAN-CNN-LSTM model, the LSTM and CNN-LSTM models are selected as benchmarks for comparing predictive performance. The proposed model is constructed, and actual data are used to complete the training and prediction steps described above. The final experimental results of each model are shown in Figure 9.
The experimental results show that during training, the ICEEMDAN-CNN-LSTM and CNN-LSTM models generally outperform the LSTM model. Given the characteristics of the methods used in each model, this can be explained by the CNN layers effectively extracting feature information from each influencing factor and from the carbon price itself during training, thereby achieving better training results. Toward the end of the training set, roughly from March 2022 to July 2023, the performance of the CNN-LSTM model declines. A possible reason is that the data at the end of the training set contain patterns different from those in earlier periods, and the LSTM component fails to capture them in time, instead continuing to apply the earlier pattern, resulting in deviations from the real values. In contrast, the ICEEMDAN-CNN-LSTM model provides the best fit on the training set and effectively captures the rise-and-fall trend of the original sequence.
In the test part, the LSTM and CNN-LSTM models can only roughly depict the trend and fail to capture detailed rises and falls. In comparison, the ICEEMDAN-CNN-LSTM model is more accurate and better reproduces detailed changes in the data. This benefit comes from the ICEEMDAN decomposition, which makes the features in the component signals easier for the model to learn and fit, enabling more accurate results when the CNN-LSTM method completes the prediction and recombination in the subsequent steps.
The performance comparison of each model under the various evaluation indicators is shown in Table 4.
Table 4 shows that the ICEEMDAN-CNN-LSTM model proposed in this research achieves better performance in predicting carbon prices than other models. On the test set, it achieves an MAE of 1.140 yuan, which is 59.1% lower than LSTM and 65.2% lower than CNN-LSTM. At the same time, the RMSE of the proposed model is reduced by 57.2% and 62.9% compared to LSTM and CNN-LSTM, respectively. The decomposition of data using ICEEMDAN significantly improves prediction accuracy, thereby verifying the effectiveness of the proposed model. Also, by comparing performance between the training and test sets of each model, the proposed model maintains consistently lower errors than benchmark models, demonstrating strong generalization ability without significant overfitting. In terms of MAPE, the proposed model achieves 2.469% on the test set, compared to 6.245% for LSTM and 7.173% for CNN-LSTM, representing reductions of 57.6% and 63.1%, respectively. These improvements confirm that the “decomposition–extraction–fitting” framework effectively handles the non-stationary and high-noise characteristics of carbon price series.
4.3.3. Robustness Analysis Under Different Data Splits
In addition, the model’s robustness is verified by adjusting the splitting ratios of the training and test sets. The comparison of different splitting ratios and the corresponding evaluation indicators is shown in Table 5.
Table 5 shows that the proposed model performs significantly better than the benchmark models across different training/test splitting ratios, and the error values remain within an acceptable range, indicating that the ICEEMDAN-CNN-LSTM model is robust. When the training proportion decreases from 80% to 50%, all models exhibit increasing test-set errors, with the LSTM showing the most dramatic deterioration, followed by the CNN-LSTM. The proposed model also shows error growth, yet its absolute error under the most challenging 50%/50% split remains lower than that of the LSTM under the favorable 80%/20% split.
4.3.4. Statistical Significance Test
To further verify the statistical significance of the proposed model’s prediction accuracy, the Diebold–Mariano (DM) test is used to compare the test-set prediction results in pairs, with squared error as the loss function and the prediction step set to 1. The Diebold–Mariano test results are shown in Table 6.
Table 6 shows that the DM statistics for ICEEMDAN-CNN-LSTM against LSTM and CNN-LSTM are 11.454 and 15.892, respectively (both p < 0.001), rejecting the null hypothesis at the 1% level. This confirms that the predictive advantage of the proposed model is statistically significant.
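For reference, with squared-error loss and one-step forecasts the DM statistic reduces to a t-type test on the mean loss differential; a compact sketch (our implementation):

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, h=1):
    """DM test on forecast errors e1, e2 under squared-error loss."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2   # loss differential
    n = len(d)
    dbar = d.mean()
    var = np.sum((d - dbar) ** 2) / n               # lag-0 autocovariance
    for k in range(1, h):                           # HAC terms for h-step forecasts
        var += 2 * np.sum((d[k:] - dbar) * (d[:-k] - dbar)) / n
    dm = dbar / np.sqrt(var / n)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_value
```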
4.3.5. Ablation Experiment for IMF Component Contributions
To further validate the contribution of each IMF component of the ICEEMDAN decomposition to carbon price prediction, this research conducts an ablation experiment. Specifically, each of the nine IMF components is predicted independently, and the impact of sequentially removing each component on overall prediction accuracy is analyzed. The prediction errors of the individual IMF components and the results of the ablation experiment are shown in Table 7 and Table 8.
Table 7 shows that the prediction errors of individual IMF components exhibit significant variation. IMF8 achieves the largest prediction error, with an MAE of 1.054 yuan, contributing 37.19% of the total prediction error. Meanwhile, IMF1 ranks second, with an MAE of 0.548 yuan, accounting for 19.34% of the total error. IMF2, IMF3, and IMF9 have MAE values between 0.22 and 0.25 yuan, with a combined contribution of approximately 24.69%. In contrast, IMF4 through IMF7 exhibit relatively small prediction errors, with MAE values ranging from 0.12 to 0.18 yuan, collectively contributing only 18.78% to the total error.
This distribution reflects the differing contributions of the frequency components to the carbon price series. The mid-frequency components IMF4–IMF7 demonstrate good predictability with relatively small prediction errors. In contrast, IMF8, a large-amplitude low-frequency component close to the trend, and IMF1, the highest-frequency component with strong randomness, both exhibit larger prediction errors.
Table 8 shows that after removing the trend component IMF9, the MAE greatly increases from the baseline value of 1.247 yuan to 45.080 yuan, representing an increase of 3515.56%, making the prediction almost completely ineffective, while removing IMF8 also increases the MAE by 248.06% to 4.340 yuan. The results demonstrate that the low-frequency trend components, particularly IMF9 and IMF8, are the core drivers of carbon price prediction, and the long-term trend information they contain plays an important role in prediction accuracy.
At the same time, removing IMF1 reduces the MAE by 20.56%, indicating that IMF1 contributes negatively to the prediction and that the current equal-weighted linear combination may not be optimal. The underlying reason is that high-frequency components are much more susceptible to short-term random noise, which makes them less predictable, so their prediction errors propagate into the final result when included with equal weight. Future research could explore adaptive weighting strategies or selective combination to address this limitation. Removing IMF2 or IMF4 leads to slight MAE increases, and removing the remaining IMFs likewise produces moderate MAE increases.
4.3.6. Comparison with an Alternative Decomposition Method
In addition, the research verifies the improvement brought by ICEEMDAN decomposition by replacing ICEEMDAN with VMD, turning the model into VMD-CNN-LSTM. The performance of this model under the various evaluation indicators is shown in Table 9.
Table 9 shows that the performance of the VMD-based model under the indicators is weaker than that of the ICEEMDAN-based model, suggesting that using ICEEMDAN as the decomposition method contributes to improving the performance of the model.
4.3.7. Comparison with a Traditional Statistical Method
To highlight the advantages of deep learning approaches over traditional statistical methods, the SARIMAX method is used to complete the same prediction task and is judged under the same evaluation indicators, as shown in Table 10.
Table 10 shows that SARIMAX has limited prediction performance and generalization ability, confirming the advantage of deep learning approaches over traditional statistical ones.
5. Conclusions
Aiming at the research topic of carbon emission trading price prediction, this research proposed the ICEEMDAN-CNN-LSTM model. After the necessary data preprocessing, the original data were decomposed into signal components of different frequencies using the ICEEMDAN method, which retains the characteristic information of the original data. Then, the CNN-LSTM method was used to capture the long- and short-term dependencies of each component and the impact of relevant factors on fluctuations in the research subject’s data, and to obtain predictions for each component. Finally, the prediction results for the original data were obtained by linear addition. Empirical analysis of the closing price of Hubei carbon emission trading compared the performance of multiple prediction methods across a variety of evaluation indicators, and the effectiveness of the proposed model was verified.
At the same time, the robustness of the proposed model was verified by re-dividing the training and test sets. In addition, the proposed model achieves significantly lower prediction errors than the benchmark models, and the improvements are confirmed by the Diebold–Mariano test. Furthermore, the ablation experiment reveals that the low-frequency trend components are the core drivers of carbon price prediction. The advantages over the alternative decomposition method VMD and the traditional statistical method SARIMAX are likewise demonstrated by the corresponding experiments, confirming the effectiveness of the proposed model in capturing the nonlinear dynamics of carbon prices. Through the “decomposition–extraction–fitting” framework, the proposed model successfully separates noise from trends and captures multi-scale features more effectively, thereby predicting carbon prices more accurately and providing a new research path for the field of carbon price prediction.
The proposed model offers several practical implications for carbon market participants and policymakers. First, from the perspective of trading practice, the prediction results of the proposed model can serve as a reference for enterprises and investors to formulate carbon asset allocation strategies. However, it is crucial to acknowledge that the prediction errors, even when minimized, may lead to missing the timing of trades or misjudgments of price trends. Therefore, market participants should regard the outputs of the model as one of multiple decision-making tools with risk management mechanisms. Second, regarding policy dynamics, carbon prices could greatly change when policy is adjusted. Therefore, real-time policy monitoring should be considered to maintain predictive validity. Finally, concerning computational feasibility, the proposed hybrid framework has higher computational cost requirements compared to simpler or single-model approaches. Quantitatively, based on the actual experimental environment, training the full ICEEMDAN-CNN-LSTM model with nine IMF components requires about 55 min on a laptop equipped with an Intel Core i5-10210U CPU (1.60 GHz, 8 Cores) and 16 GB RAM, with no GPU acceleration. The decomposition and multi-component training processes increase both computing time and memory requirements. For practical deployment, trade-offs between prediction accuracy and computational efficiency should be evaluated, and model compression or parallel computing techniques may be explored.
Based on this research, future work can further explore issues related to the carbon emission trading price. First, extending the analysis to other carbon markets, such as the EU ETS, China’s national carbon market, and other regional pilots, would enhance generalizability. Second, the mechanism underlying carbon price formation could be examined, quantifying the impact of policy-making, market supply and demand, energy structure, and other factors on carbon prices. Third, the behavior patterns of the parties involved in carbon emission trading deserve attention, analyzing the mechanisms by which micro-entities, such as enterprises and investors, respond to carbon price fluctuations. Finally, developing reasonable participation strategies based on carbon price prediction results is also worthy of consideration.