To accurately capture the complex temporal characteristics and fluctuation patterns of carbon emission allowance prices and to provide a scientific medium- to long-term price reference for subsequent option hedging strategies, this paper applies an LSTM model to the medium- to long-term forecasting of carbon emission allowance closing prices.
5.2.1. Model Selection and Construction
In selecting a method for allowance price prediction, the Long Short-Term Memory (LSTM) network is well suited to forecasting non-stationary, non-linear time series such as carbon market trading prices. Traditional machine learning models, including linear regression, decision trees, and support vector machines, perform well on many problems but are often limited when applied to time series with long-term dependencies, variable length, and complex temporal dynamics. Their main limitations are difficulty in handling long-term dependencies, fixed-length input requirements, the absence of temporal dynamics modeling, constraints in feature extraction, and challenges in multi-step prediction. Recurrent neural network models such as LSTM address these challenges more effectively through their specialized structural design.
Long Short-Term Memory (LSTM) is essentially a specific form of Recurrent Neural Network that addresses the short-term memory problem arising from vanishing gradients in RNNs by incorporating gates. This enables the recurrent neural network to effectively utilize long-range temporal information. It includes three logical control units—the input gate, output gate, and forget gate—each connected to a multiplicative element. By setting the weights at the connections between the network’s memory unit and other components, it controls the input and output of information flow as well as the state of the memory cell. These gating mechanisms help the network determine how to process input data, update internal states, and output prediction results, thereby better capturing long-term dependencies in time series data. Its specific working mechanism is as follows:
Forget Gate: Determines whether to forget the previous memory state. This gating mechanism can, to some extent, mitigate the vanishing gradient problem, thereby better handling long-term dependencies.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
Here, $f_t$ is the output of the forget gate, $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the input of the current time step, and $W_f$ and $b_f$ are the weight matrix and bias vector of the forget gate. $\sigma$ is the sigmoid activation function.
Input Gate: Determines which information from the current input data needs to be updated into the memory state. This gating mechanism controls the update of the memory state.
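$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
Here, $i_t$ is the output of the input gate, $\tilde{C}_t$ is the candidate information for updating, and $W_i$, $W_C$ and $b_i$, $b_C$ are the weight matrices and bias vectors of the input gate and candidate information, respectively. $\tanh$ is the hyperbolic tangent function.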
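Memory Cell: Under the control of the forget gate and the input gate, a new memory cell is calculated to replace the old memory cell, adapting to the input data of the current time step.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Here, $C_{t-1}$ and $C_t$ are the previous and updated memory cell states, $f_t$ and $i_t$ are the outputs of the forget gate and input gate, $\tilde{C}_t$ is the candidate information, and $\odot$ denotes element-wise multiplication.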
Output Gate: Determines the output result of the current time step based on the current memory state and input data.
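$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh\left(C_t\right)$$
Here, $o_t$ is the output of the output gate, $h_t$ is the hidden state of the current time step, and $W_o$ and $b_o$ are the weight matrix and bias vector of the output gate.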
Through the computation and updates of the aforementioned gating mechanisms, the LSTM network can effectively capture long-term dependencies in sequential data and better handle tasks such as time series prediction and natural language processing. The specific structure of the LSTM network is illustrated in Figure 3.
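The gate computations described above can be sketched in a few lines of plain Python. This is only an illustration under the common formulation in which every gate acts on the concatenation of the previous hidden state and the current input; the function names (`sigmoid`, `affine`, `lstm_step`) and the `params` layout are hypothetical and are not taken from the paper's code.

```python
import math

def sigmoid(z):
    """Logistic function used by the forget, input, and output gates."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W and list vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps "f", "i", "c", "o" to a (W, b) pair acting on [h_prev, x_t]
    (an assumed layout chosen for this sketch).
    """
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["f"][0], z, params["f"][1])]        # forget gate
    i_t = [sigmoid(a) for a in affine(params["i"][0], z, params["i"][1])]        # input gate
    c_tilde = [math.tanh(a) for a in affine(params["c"][0], z, params["c"][1])]  # candidate
    o_t = [sigmoid(a) for a in affine(params["o"][0], z, params["o"][1])]        # output gate
    # New cell state: keep part of the old memory, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # New hidden state: gated, squashed view of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy example: hidden size 2, input size 1, all weights and biases zero.
zero_gate = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
toy_params = {name: zero_gate for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], toy_params)
```

With all-zero parameters every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved. This makes the role of the forget gate easy to see in isolation.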
This paper uses the PyTorch 2.10.0 deep learning framework in Python 3.12 to construct the LSTM model. All code development and experiments were conducted in the PyCharm 2024.1.7 IDE. The LSTM model consists of three parts: an input layer, a hidden layer, and an output layer. The model is trained using the error backpropagation algorithm (refer to [33,34,35,36]). The valuation adopts the open market assumption and takes market value as the value type. The assessment method is to construct an LSTM model and evaluate prices from a time series perspective. As a tradable commodity, carbon allowance prices fluctuate daily, and historical transaction prices form time series data. Therefore, this paper does not set a specific valuation base date; instead, it evaluates carbon asset prices over a continuous period and measures the feasibility of the LSTM time series prediction model by its overall assessment performance during that period.
5.2.3. Model Training and Validation
This paper divides the aforementioned carbon trading prices into a training set and a test set in chronological order at an 8:2 ratio. The training set is used to train the model, while the test set is used for prediction, with the prediction results serving to evaluate the model’s feasibility.
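A chronological 8:2 split of this kind can be sketched as follows. The helper name `chronological_split` and the sample prices are hypothetical; the only point illustrated is that the series is cut once, in time order, and never shuffled.

```python
def chronological_split(series, train_ratio=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

# Hypothetical closing prices, oldest observation first.
prices = [50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0]
train, test = chronological_split(prices)
```

Every test observation is later in time than every training observation, so no future information reaches the training set.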
To achieve optimal prediction performance, this paper conducts multiple rounds of training and testing on the constructed model, selecting the parameters with the smallest error as the model parameters. Specifically, the LSTM network consists of three layers in total: one input layer, one output layer, and one hidden layer. The selection of key hyperparameters follows the principle of balancing temporal feature extraction, predictive accuracy, and model stability. The time step was tested over several candidate values, and 6 was adopted because it produced the best overall validation performance while preserving sufficient short-term trading information without introducing excessive noise accumulation. The number of hidden neurons was also tuned through repeated training under alternative settings. A size of 256 provided a comparatively better fit to the nonlinear dynamics of carbon prices than smaller configurations, while avoiding the instability and overfitting tendency observed in larger settings. Therefore, the final parameter combination was determined empirically on the basis of validation loss, prediction accuracy, and generalization performance rather than by arbitrary choice.
To prevent model overfitting, a dropout layer is connected between each LSTM layer. After the number of network layers is determined, the time step is set. By varying the time step and comparing the training results, the model time step is ultimately set to 6, meaning that the 6 preceding historical data points are used to predict the next data point (refer to [37,38,39] for alternative approaches for similar situations). Accordingly, the number of nodes in the input layer of the neural network is 6, and the number of nodes in the output layer is 1. The number of nodes in each hidden layer is set to 256, and the optimizer is Adam. Training is continued until the model loss stabilizes around 0.05, which occurs after approximately 60 iterations. The maximum number of epochs is set to 60, and the initial learning rate is set to 0.01. After the model structure and parameters are determined, the training set samples are fed into the model for training, and the test set samples are then used for prediction. Partial comparison results and the prediction performance are presented in Table 4 and Figure 4, respectively (complete results can be found in Appendix A).
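The windowing implied by a time step of 6 (six consecutive observations as input, the following observation as target) can be sketched as below. The function name `make_windows` and the toy series are assumptions for illustration only.

```python
def make_windows(series, time_step=6):
    """Build (inputs, targets) pairs for one-step-ahead prediction.

    Each input holds `time_step` consecutive values; the target is the
    value that immediately follows them in the series.
    """
    if time_step < 1:
        raise ValueError("time_step must be at least 1")
    inputs, targets = [], []
    for start in range(len(series) - time_step):
        inputs.append(series[start:start + time_step])
        targets.append(series[start + time_step])
    return inputs, targets

# Hypothetical price series with 8 observations gives 2 training samples.
toy_series = [50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0]
X, y = make_windows(toy_series, time_step=6)
```

A series of length N yields N - 6 samples, which is why very long windows reduce the amount of usable training data.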
In Figure 4, the horizontal axis represents the number of prediction days, and the vertical axis represents the carbon allowance price (RMB). The blue line indicates the actual carbon price values, while the orange line represents the predicted values from the test set. It can be observed that the predicted values closely fit the actual values.
Figure 5 shows the training/validation loss curve of the LSTM carbon price prediction model, which reflects the model’s training performance: the training set loss (red) rapidly converges to near zero, while the validation set loss (green) stabilizes around 0.05 without rebounding. This indicates that the model has learned the main temporal patterns of carbon prices and shows no sign of overfitting, which suggests good generalization to new data. It also supports the early stopping training strategy and serves as a preliminary experimental basis for the model’s prediction accuracy (MAPE 4.23%).
Furthermore, this paper evaluates the model’s learning performance using the following metrics:
(1) Mean Squared Error (MSE): Square the difference between the predicted value and the actual value for each sample, then average over all samples. The formula is:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples. The closer the MSE is to 0, the more reliable the model.
(2) Root Mean Squared Error (RMSE): Take the square root of the MSE. The formula is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
The closer the RMSE (refer to [40,41,42,43,44]) is to 0, the more reliable the model.
(3) Mean Absolute Error (MAE): Take the absolute difference between the predicted value and the actual value for each sample, then average over all samples. The formula is:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
The closer the MAE (refer to [45,46,47,48]) is to 0, the more reliable the model.
(4) Mean Absolute Percentage Error (MAPE): Divide the difference between the predicted value and the actual value by the actual value for each sample, take the absolute value, and then average. The formula is:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
The closer the MAPE is to 0, the more reliable the model.
(5) R-squared ($R^2$): Calculate the total sum of squares, regression sum of squares, and residual sum of squares using the actual and predicted values, then compute $R^2$. The formula is:
$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$
where $SS_{\mathrm{res}} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$ is the residual sum of squares and $SS_{\mathrm{tot}} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$ is the total sum of squares, with $\bar{y}$ the mean of the actual values. The closer the $R^2$ is to 1, the more reliable the model.
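As a worked illustration of the five metrics, the sketch below computes them with the Python standard library only. The function name `regression_metrics` and the four sample values are hypothetical; they are not results from this study.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE (in percent), and R2 for two equal-length lists.

    Assumes no actual value is zero (MAPE) and that the actual values
    are not all identical (R2).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MAPE": mape,
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical prices (RMB/ton) used only to exercise the formulas.
toy_actual = [50.0, 60.0, 70.0, 80.0]
toy_predicted = [52.0, 58.0, 71.0, 79.0]
metrics = regression_metrics(toy_actual, toy_predicted)
```

For these toy values the errors are -2, 2, -1, and 1, so the MSE is 2.5, the MAE is 1.5, and $R^2$ is 1 - 10/500 = 0.98.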
The evaluation metrics for the prediction performance of the model are presented in Table 5.
Although the model achieves acceptable prediction accuracy, the test result of $R^2$ = 0.8028 indicates that the explanatory power remains moderate rather than exceptionally high. This suggests that the model captures the main temporal pattern of carbon prices, but still has limited ability to reflect abrupt shocks and some unobserved external influences. In addition, the present validation is based on a chronological train-test split within the same sample period rather than a fully independent out-of-sample test. Therefore, the reported results support the practical usefulness of the model, but they should be interpreted with appropriate caution.
From the calculation results of the above metrics, it can be seen that the model’s evaluation accuracy is relatively high. This supports the reliability and applicability of the model in the valuation of carbon allowance assets, subject to the limitations noted above.
5.2.4. Medium- and Long-Term Price Simulation of Carbon Allowances Based on the LSTM Model
To accurately capture the complex temporal characteristics and fluctuation patterns of carbon allowance prices, and to provide scientific medium- and long-term price references for market participants, this paper takes the daily trading data of the national carbon trading market from 2021 to 2026 as the research object (complete data can be found in Appendix A). A deep learning model with dual LSTM layers plus Dropout regularization (see [49,50] for similar approaches) is constructed to conduct medium- and long-term prediction research on the closing prices of carbon allowances. Through data preprocessing, model training, hyperparameter optimization, and rolling prediction, price simulations for the next 7 days, 30 days, and 90 days are obtained. Sample data of the national carbon trading prices are presented in Table 6. Model performance is validated using metrics such as MAE, RMSE, and $R^2$.
To address the temporal dependency and nonlinear characteristics of carbon allowance prices, a dual-layer LSTM deep learning model is constructed. The specific structure is as follows: the input layer receives a 4-dimensional feature vector (opening price, closing price, trading volume, and daily turnover), with an input tensor of shape [batch_size, time_step, feature_num]. The first LSTM layer contains 64 neurons, with batch_first = True set to accommodate the time-series data format; its output sequence is followed by a 0.2 dropout layer to suppress overfitting. The second LSTM layer reduces the feature dimension to 32, further extracting higher-order temporal features, and is similarly followed by a dropout layer to enhance model generalization. The fully connected layer performs price regression through a 32 → 1 mapping, employing the ReLU activation function to strengthen nonlinear fitting capability.
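A minimal PyTorch sketch of an architecture matching this description is given below. It is not the authors' code: the class name `DualLSTMRegressor`, the use of the last time step as the sequence summary, the second dropout rate of 0.2, and the placement of ReLU before the final linear layer are all assumptions made for illustration.

```python
import torch
from torch import nn

class DualLSTMRegressor(nn.Module):
    """Two stacked LSTM layers (64 -> 32) with dropout and a 32 -> 1 head."""

    def __init__(self, feature_num=4, hidden1=64, hidden2=32, dropout=0.2):
        super().__init__()
        # batch_first=True means inputs are [batch_size, time_step, feature_num].
        self.lstm1 = nn.LSTM(feature_num, hidden1, batch_first=True)
        self.drop1 = nn.Dropout(dropout)
        self.lstm2 = nn.LSTM(hidden1, hidden2, batch_first=True)
        self.drop2 = nn.Dropout(dropout)
        self.act = nn.ReLU()  # assumed placement: before the linear head
        self.fc = nn.Linear(hidden2, 1)

    def forward(self, x):
        seq, _ = self.lstm1(x)            # [batch, time_step, 64]
        seq = self.drop1(seq)
        seq, _ = self.lstm2(seq)          # [batch, time_step, 32]
        last = self.drop2(seq[:, -1, :])  # keep the final time step only
        return self.fc(self.act(last))    # [batch, 1]

# Shape check with a dummy batch: 16 samples, 10 time steps, 4 features.
model = DualLSTMRegressor()
model.eval()  # disables dropout for a deterministic forward pass
with torch.no_grad():
    prediction = model(torch.zeros(16, 10, 4))
```

Using two separate `nn.LSTM` modules, rather than one module with `num_layers=2`, is what allows the hidden size to shrink from 64 to 32 between layers.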
Integrating the characteristics of daily time-series data with model training requirements, the optimal hyperparameter combination is determined through multiple adjustments: time step = 10, batch size = 16, number of epochs = 80, learning rate = 0.001, and early stopping patience = 4 (training stops if validation loss does not decrease for 4 consecutive rounds). The training set and test set are strictly divided in chronological order at an 8:2 ratio to prevent data leakage.
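The early stopping rule with a patience of 4 can be sketched as a small helper. The class name `EarlyStopping`, the `min_delta` parameter, and the validation losses below are hypothetical and serve only to show when the rule triggers.

```python
class EarlyStopping:
    """Stop training once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=4, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember this epoch (a real loop would save weights here).
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation losses; the best value occurs at epoch 3.
val_losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.15, 0.18, 0.14]
stopper = EarlyStopping(patience=4)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In this toy run, epochs 4 to 7 bring no strict improvement over epoch 3, so training stops at epoch 7 and the parameters from epoch 3 would be the ones restored.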
The model is trained using the Adam optimizer and the Mean Squared Error loss function, with real-time monitoring of changes in training loss and validation loss during the training process. In the early stages of training, the loss value decreases rapidly, indicating that the model is effectively learning data features. As the number of iterations increases, the training loss and validation loss gradually converge. When the validation loss shows no significant decrease for 4 consecutive rounds, the early stopping mechanism is triggered, and the optimal model parameters are saved to avoid overfitting. The training and validation loss curves are shown in Figure 6.
Figure 6 shows that during training the loss curves decline smoothly and converge, with a small gap between the training loss and validation loss, indicating that the model has good generalization ability and no significant overfitting.
Figure 7 shows that within the test set, the closing prices predicted by the model overlap closely with the actual closing price curve. During periods of price decline, fluctuation, and rebound alike, the predicted values follow the trends and fluctuation amplitudes of the actual values with relatively small deviations. This supports the model’s ability to capture the temporal characteristics of carbon prices.
In Figure 8, the closing prices predicted by the LSTM model and the actual closing price data points are highly concentrated around the ideal prediction line (y = x), exhibiting a significant linear positive correlation distribution. This indicates that the deviations between the model’s predicted values and the actual values are relatively small, further validating that the model’s prediction results for carbon allowance prices possess high accuracy and consistency.
The evaluation metrics for the model’s prediction performance are presented in Table 7.
The training set achieves an MAE of 1.5017 RMB/ton and an $R^2$ of 0.9873, while the test set achieves an MAE of 1.9006 RMB/ton, an RMSE of 2.4979 RMB/ton, and an $R^2$ of 0.9434. The relative error proportion is only 2.38–3.33%, and the model explains 94.34% of the variance in carbon prices on the test set. The performance difference between training and test sets is reasonable, with no significant overfitting. Its advantages stem from the dual LSTM layers’ ability to extract higher-order temporal features, the regularization effect of Dropout, and the price-volume linkage information provided by daily turnover, thereby offering reliable decision support for emission-controlled enterprises, investors, and policymakers. Recent methodological advances have introduced transformer-based architectures for carbon price forecasting, with self-decomposition mechanisms enabling adaptive preprocessing without reliance on external decomposition methods [51]. However, the model still has room for improvement in responding to external shocks and sudden price changes. Future enhancements could involve introducing multi-source external features, optimizing outlier handling (see [52,53,54,55] for alternative treatments), and integrating attention mechanisms.
To interpret the forecasting results more rigorously, the performance of the dual-layer LSTM should be understood relative to simpler benchmark structures rather than in isolation. In this study, the comparison with single-layer and reduced-feature models in the ablation analysis shows that deeper temporal extraction, regularization, and turnover information all contribute materially to predictive accuracy. This also supports the appropriateness of the selected features, time-step setting, and hyperparameter configuration for the present forecasting task, since these choices improve model fit and generalization simultaneously rather than only increasing technical complexity.
To verify the contribution of each core component to prediction performance and clarify the mechanism of key technical modules, an ablation experiment is designed. By removing the Dropout layer, the second LSTM layer, and the daily turnover feature, individually and in combination, four comparison models are constructed, trained, and evaluated under identical data and hyperparameter configurations to quantify each component’s contribution to prediction accuracy.
The Baseline model is the complete model constructed in this paper: dual LSTM layers (64 → 32 neurons) + Dropout (0.2) + 4-dimensional input features (opening price, closing price, trading volume, daily turnover). The ablation models are configured as follows: M1 removes the Dropout layer, retaining dual LSTM layers and 4-dimensional features; M2 removes the second LSTM layer, retaining a single LSTM layer, Dropout (0.2), and 4-dimensional features; M3 removes the daily turnover feature, retaining dual LSTM layers, Dropout (0.2), and 3-dimensional features (opening price, closing price, trading volume); M4 simultaneously removes the Dropout layer, the second LSTM layer, and the daily turnover feature, retaining a single LSTM layer and 3-dimensional features. Throughout the experiment, data preprocessing procedures, hyperparameter configurations, and evaluation standards are kept consistent to ensure comparability of experimental results.
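The five configurations can be written down compactly as data, which helps keep everything except the ablated component identical across runs. The `ModelConfig` dataclass, its field names, and the short feature labels are hypothetical; only the component choices follow the description above.

```python
from dataclasses import dataclass, replace

ALL_FEATURES = ("open", "close", "volume", "turnover")
NO_TURNOVER = tuple(f for f in ALL_FEATURES if f != "turnover")

@dataclass(frozen=True)
class ModelConfig:
    """Components that the ablation study switches on or off."""
    lstm_layers: int = 2
    dropout: float = 0.2
    features: tuple = ALL_FEATURES

baseline = ModelConfig()
variants = {
    "Baseline": baseline,
    "M1": replace(baseline, dropout=0.0),           # no Dropout
    "M2": replace(baseline, lstm_layers=1),         # single LSTM layer
    "M3": replace(baseline, features=NO_TURNOVER),  # no daily turnover
    "M4": ModelConfig(lstm_layers=1, dropout=0.0, features=NO_TURNOVER),  # all three removed
}
```

Deriving M1 to M3 from the baseline with `replace` changes exactly one field per variant, which mirrors the one-component-at-a-time design of the experiment.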
Figure 9 illustrates that in the ablation experiment, the baseline model exhibits the lowest training and validation losses with minimal fluctuation after convergence, while all ablation models (M1–M4) show higher losses overall. M1 exhibits significantly higher validation loss than training loss, indicating pronounced overfitting. M2 and M3 converge more slowly than the baseline model. M4 maintains the highest loss throughout. These findings indicate that the baseline architecture (dual LSTM layers, Dropout, and the complete feature set) is superior in convergence efficiency, fitting performance, and generalization stability, confirming the supporting role of each core component in training performance.
The ablation experiment results are presented in Table 8, with all model performance evaluated on test set metrics. The results show that ablation of individual components leads to varying degrees of performance degradation. Compared to the baseline model, M1 shows a 35.3% increase in MAE, a 30.9% increase in RMSE, and a 1.8% decrease in $R^2$, validating the regularization effect of the Dropout layer. M2 exhibits a 24.8% increase in MAE and a 20.6% increase in RMSE, indicating that the deep structure of dual LSTM layers is better suited to the multi-level temporal characteristics of carbon allowance prices. M3 experiences the largest performance decline among the single-component ablations, with a 41.8% increase in MAE and a 2.2% decrease in $R^2$, demonstrating that the price-volume linkage information provided by daily turnover is a core input for enhancing prediction accuracy. M4 shows substantial performance deterioration, with an 84.9% increase in MAE and an $R^2$ drop to 0.9347, reflecting the synergistic effect of the combined components.
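The percentage changes quoted for the ablation models are relative changes with respect to the baseline. A short sketch of that arithmetic is shown below, assuming the changes (including those for $R^2$) are relative rather than percentage-point differences. The function name and all numbers are toy values chosen for easy checking, not entries from Table 8.

```python
def relative_change_pct(baseline, ablated):
    """Percentage change of an ablated metric relative to the baseline metric."""
    if baseline == 0:
        raise ValueError("baseline metric must be non-zero")
    return 100.0 * (ablated - baseline) / baseline

# Toy metric values for illustration only.
toy_baseline = {"MAE": 2.0, "RMSE": 2.5, "R2": 0.95}
toy_ablated = {"MAE": 2.5, "RMSE": 3.0, "R2": 0.931}
changes = {
    name: relative_change_pct(toy_baseline[name], toy_ablated[name])
    for name in toy_baseline
}
```

Here the MAE rises by 25%, the RMSE by 20%, and $R^2$ falls by about 2%. Positive changes are worse for error metrics, while negative changes are worse for $R^2$.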
Figure 10 clearly shows that the baseline model achieves the lowest MAE and RMSE and the highest $R^2$. In contrast, after removing any component, all ablation models exhibit increased error metrics and decreased $R^2$, with M4 showing the poorest performance due to multi-component ablation. This intuitively verifies the synergistic advantage of the dual LSTM layers + Dropout + daily turnover feature architecture.
In Figure 11, with “prediction error (RMB/ton)” as the vertical axis and “ablation model” as the horizontal axis, the distribution characteristics of prediction errors for different models are intuitively presented through boxes (representing error concentration intervals), whiskers (representing error extreme ranges), and outlier points. The baseline model exhibits the narrowest box and smallest whisker range, indicating the highest error concentration and most stable fluctuation. In contrast, all ablation models show broader boxes and wider whisker ranges than the baseline model: M1 exhibits a significantly expanded box width, M3 shows further dispersed error distribution, and M4 demonstrates the widest box with multiple extreme outliers.
Based on the trained LSTM model validated by the ablation experiment, a rolling prediction method is employed to simulate carbon allowance prices for the next 7 days, 30 days, and 90 days. This method feeds each day’s predicted value back as an input feature for the following day and generates medium- and long-term price sequences iteratively, which preserves the temporal dependencies of prices (refer to [56,57,58,59] for similar discussions) and improves the coherence of long-term predictions.
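The rolling scheme can be sketched as a loop that appends each prediction to the input window before predicting the next day. For brevity the sketch is univariate and uses a window average as a stand-in predictor; the paper's model takes four input features, and how the non-price features are rolled forward is not specified here. The names `rolling_forecast` and `mean_of_window` are hypothetical.

```python
def rolling_forecast(predict_next, history, horizon, time_step=10):
    """Generate `horizon` forecasts by feeding each prediction back as input."""
    if len(history) < time_step:
        raise ValueError("history must contain at least time_step values")
    window = list(history[-time_step:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts

def mean_of_window(window):
    """Stand-in predictor used only for this sketch (not the LSTM)."""
    return sum(window) / len(window)

# Hypothetical recent closing prices and a 3-day rolling forecast.
path = rolling_forecast(mean_of_window, [70.0, 71.0, 72.0], horizon=3, time_step=3)
```

Because later inputs consist increasingly of the model's own predictions, errors can accumulate and the simulated path tends to become smoother as the horizon grows.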
The simulation results for medium- and long-term carbon allowance prices are as follows. In the short term (next 7 days), prices show a slight fluctuating upward trend, with a mean of 71.42 RMB/ton, a fluctuation range of 70.36–72.10 RMB/ton, and a standard deviation of only 0.63 RMB/ton. In the medium term (next 30 days), prices rise steadily with a mean of 72.24 RMB/ton (an increase of 0.81 RMB/ton from the short term) and a widened fluctuation range of 70.36–72.61 RMB/ton. In the long term (next 90 days), prices continue a moderate upward trend, with a mean of 72.50 RMB/ton, a maximum of 72.63 RMB/ton, and a minimum of 70.36 RMB/ton, while the standard deviation narrows to 0.37 RMB/ton, indicating gradually reduced volatility and enhanced stability in long-term prices.
Figure 12 illustrates that the medium- to long-term prediction results of carbon trading closing prices exhibit a pattern of “stable short-term volatility, expanded medium-term volatility, and narrowed long-term volatility.” The price fluctuation range is relatively concentrated for the next 7 days. For the next 30 days, the price center shifts upward and the fluctuation range expands. For the next 90 days, price volatility significantly narrows and tends to stabilize. This indicates that the medium- to long-term prices predicted by the LSTM model show a moderate upward trend overall, with enhanced stability in long-term prices, providing a reference basis with controllable fluctuations for the medium- to long-term management of carbon assets. The statistical indicators of the medium- and long-term predictions are presented in Table 9.
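Statistical indicators of this kind (mean, minimum, maximum, standard deviation per horizon) can be computed from a simulated price path as sketched below. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state whether the sample or population form is reported.

```python
import statistics

def summarize(path):
    """Mean, min, max, and sample standard deviation of a price path."""
    return {
        "mean": statistics.fmean(path),
        "min": min(path),
        "max": max(path),
        "std": statistics.stdev(path),  # sample form (n - 1); an assumption
    }

def horizon_summaries(path, horizons=(7, 30, 90)):
    """Summarize the first h days of the path for each horizon it covers."""
    return {h: summarize(path[:h]) for h in horizons if len(path) >= h}

# Hypothetical 30-day path: only the 7- and 30-day horizons can be summarized.
toy_path = [float(day) for day in range(30)]
summary = horizon_summaries(toy_path)
```

Each horizon is a prefix of the same rolling path, so all horizons share the same starting values and therefore, for example, can share the same minimum.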
The simulation results indicate that the medium- to long-term prices of carbon allowances in the national carbon trading market exhibit a pattern of “slight short-term increase, steady medium-term upward movement, and long-term rise with narrowing volatility.” The average prices for the next 7 days, 30 days, and 90 days are 71.42 RMB/ton, 72.24 RMB/ton, and 72.50 RMB/ton, respectively. Simulated prices remain within 70.36–72.63 RMB/ton over the full 90-day horizon, and the standard deviation narrows to 0.37 RMB/ton. Overall, the market operates steadily with a slowly rising price center.
As core entities subject to emission controls, thermal power enterprises face medium- to long-term risk exposure in carbon assets primarily manifested in fluctuations in allowance procurement costs and uncertainty in compliance pressure. The accurate price simulations provided by the LSTM model offer critical support in addressing these challenges: by capturing the temporal dependencies and fluctuation patterns of carbon prices, the model provides enterprises with scientific price references, facilitating advance planning of procurement schedules, locking in medium- to long-term compliance costs, and effectively avoiding the risk of cost overruns caused by price increases. Furthermore, stable and predictable price simulation results provide a quantitative basis for thermal power enterprises to optimize carbon asset management strategies, balance emission reduction investments with carbon asset returns, and significantly reduce operational uncertainty arising from medium- to long-term risk exposure.