Article

A Hybrid Model for Carbon Price Forecasting Based on Improved Feature Extraction and Non-Linear Integration

1 School of Mathematics and Statistics, Changchun University, Changchun 130022, China
2 Economics School, Jilin University, Changchun 130012, China
3 Graduate School, Changchun University, Changchun 130022, China
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1428; https://doi.org/10.3390/math12101428
Submission received: 16 April 2024 / Revised: 30 April 2024 / Accepted: 3 May 2024 / Published: 7 May 2024

Abstract:
Accurately predicting the price of carbon is an effective way of ensuring the stability of the carbon trading market and reducing carbon emissions. Aiming at the non-smooth and non-linear characteristics of carbon price, this paper proposes a novel hybrid prediction model based on improved feature extraction and non-linear integration, which is built on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), fuzzy entropy (FuzzyEn), improved random forest using particle swarm optimisation (PSORF), extreme learning machine (ELM), long short-term memory (LSTM), non-linear integration based on multiple linear regression (MLR) and random forest (MLRRF), and error correction with the autoregressive integrated moving average model (ARIMA), named CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA. Firstly, CEEMDAN is combined with FuzzyEn in the feature selection process to improve extraction efficiency and reliability. Secondly, at the critical prediction stage, PSORF, ELM, and LSTM are selected to predict high, medium, and low complexity sequences, respectively. Thirdly, the reconstructed sequences are assembled by applying MLRRF, which can effectively improve the prediction accuracy and generalisation ability. Finally, error correction is conducted using ARIMA to obtain the final forecasting results, and the Diebold–Mariano test (DM test) is introduced for a comprehensive evaluation of the models. With respect to carbon prices in the pilot regions of Shenzhen and Hubei, the results indicate that the proposed model has higher prediction accuracy and robustness. The main contributions of this paper are the improved feature extraction and the innovative combination of multiple linear regression and random forests into a non-linear integrated framework for carbon price forecasting. However, further optimisation is still a work in progress.

1. Introduction

The concentration of carbon dioxide in the atmosphere has risen rapidly as a result of industrialisation and the increase in all types of waste incineration. Emissions of carbon dioxide and other greenhouse gases constitute the main cause of the greenhouse effect [1]. The increasing greenhouse effect has resulted in global warming, with serious negative impacts on the balance of ecosystems. In response to the challenge of climate change, countries have introduced carbon emissions trading markets.
The carbon market, as a key instrument used by governments to address energy transition and low-carbon development, has performed increasingly well over the past 20 years [2]. In 2005, Europe established the EU Emissions Trading System (EU ETS), the first greenhouse gas emissions trading system in the world. To meet its carbon peaking and carbon neutrality targets, China has selected eight regions, including Shenzhen, Hubei, Beijing, Guangdong, and Tianjin, as pilot regions for the establishment of a carbon emissions trading market. Furthermore, in 2017, the National Development and Reform Commission (NDRC) formally announced that China would launch a pilot carbon trading market and gave the project a prominent place in the 13th Five-Year Plan, demonstrating the firm confidence of China in the development of a green economy [3]. Through the marketisation of carbon allowances, governments are incentivising companies to switch to cleaner energy or less fossil fuel-intensive production to reduce carbon emissions [4]. The carbon price, as a core indicator of the carbon market, is one of the most effective ways to encourage reductions in carbon emissions and limit the increase in the global average temperature [5]. However, as an emerging market-based instrument, the carbon price is determined by a combination of internal market mechanisms and external influences. The volatility of the carbon price challenges the stability of the market and further affects the efficiency of emission reductions [6]. The core issue of the carbon market is the formation and prediction of the carbon price. Accurately predicting the carbon price will help establish a carbon pricing mechanism, which will facilitate the pricing of other carbon financial products, such as carbon futures and carbon options, and will also be beneficial in providing practical guidance for production, operation, and investment decisions, ultimately achieving green, low-carbon, and high-quality development [7,8].
Due to the complexity of influencing factors, the carbon price tends to be characterised by non-linearity, non-stationarity, and high noise, posing major challenges to carbon price forecasting. Carbon price forecasting is inherently a time-series modelling task [9]. Existing prediction models can be divided into three main categories: statistical models, artificial intelligence (AI) models, and hybrid models. Statistical models mainly include the autoregressive integrated moving average (ARIMA) model [10,11] and the generalised autoregressive conditional heteroskedasticity (GARCH) model [12,13,14]. Statistical models are based on certain economic theories and apply a combination of mathematical and statistical strategies to build models that capture the information embedded in the data. However, statistical models require complex feature engineering and are limited in dealing with non-linear, non-smooth, and non-Gaussian time series. In addition, they do not adequately capture the complex dynamic features in the data [15]. Therefore, more flexible and accurate forecasting methods need to be introduced. Machine learning models predominantly consist of extreme learning machine (ELM) [16,17,18,19], random forest (RF) [3,20,21], and support vector machine (SVM) [22,23,24]. Machine learning models have the advantage of being interpretable and transparent, but their ability to deal with non-linear and non-stationary time series is still inadequate [13]. With the development of artificial intelligence technology and big data, the technical background for predicting carbon prices with deep learning models is maturing. Deep learning models primarily include artificial neural networks (ANNs) [25,26], convolutional neural networks (CNNs) [20,27,28], long short-term memory (LSTM) networks [28,29,30,31], and gated recurrent unit (GRU) networks [9,28,32].
The above research explores the application of AI models in carbon price series forecasting, expanding the field of AI modelling research and achieving significant advancements.
However, given the high degree of uncertainty and non-linearity of carbon price series, a single model is no longer sufficient for accurate forecasting. In response, hybrid models have been studied by scholars to further explore the deeper relationships underlying irregular carbon price volatility. More specifically, hybrid models are typically a combination of signal decomposition strategies and the prediction algorithms described above. One of the most effective ways to reduce the complexity of carbon price series is to implement the decomposition-integration method. The first step is to decompose the original non-stationary time series into a number of relatively regular sub-models. Then, prediction models, including statistical and AI models, are applied to predict the single sub-models of the decomposition so that feature information at different scales can be extracted individually. Finally, the prediction results of each sub-model are reconstructed to obtain the prediction results [33]. Currently, the major decomposition methods include empirical modal decomposition (EMD) and its variants [17,29,31,34,35,36,37], wavelet transform (WT) [38,39,40], and variational modal decomposition (VMD) and its variants [33,41,42,43]. Although the above-mentioned decomposition methods have produced better prediction results, they also have limitations. For example, EMD suffers from modal aliasing and endpoint distortions; WT faces difficulties in choosing wavelet basis functions, high computational complexity of the discrete wavelet transform, and boundary effects; and VMD encounters problems such as difficulty in parameter selection and sensitivity to noise. Despite the limitations of the decomposition methods, all the hybrid models constructed based on the decomposition-integration method outperform single statistical or AI models. 
Therefore, an in-depth study of the application of decomposition methods in the field of carbon price forecasting is needed to better address the challenges of carbon price forecasting. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), as an efficient signal processing technique, reduces the reconstruction error by adding adaptive white noise to the original signal, where each mode is noise-enhanced on a randomly generated white noise background. The advantage of CEEMDAN is that it can effectively prevent modal aliasing and reduce the interference of white noise, thus improving the accuracy and stability of the signal decomposition. For this reason, CEEMDAN will be adopted as the sequence decomposition method in this paper.
It has been shown that AI models with feature extraction not only provide effective pre-processing of data but also have high computational efficiency, enabling the construction of suitable prediction models for time series. However, several significant challenges remain. Firstly, after decomposition of the carbon price series, each sub-sequence is typically fed into a forecasting model without taking into account the different complexities of, and correlations between, the sub-sequences, which reduces forecasting efficiency and accuracy. Secondly, the same prediction model is usually applied to every sub-sequence, ignoring the fact that the sub-sequences differ in their unique characteristics and frequencies; it is therefore important to build separate models with appropriate parameters. Thirdly, existing integration methods not only fail to focus on the intrinsic relationship between the original sequence and the reconstructed sequences but are also mainly limited to linear patterns, e.g., summing the predicted values of all reconstructed sequences to obtain the final prediction results. Such linear integration can harm prediction accuracy, as linear patterns are not applicable in all cases. Finally, error correction, which can significantly improve the accuracy of a model, is rarely considered in carbon price forecasting models.
To address the above barriers, a carbon price prediction model based on improved feature extraction and a non-linear integration method, named CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA, is proposed. Methodologically, it improves feature extraction and deep learning algorithms, develops an innovative form of non-linear integration based on MLRRF, and improves the accuracy of forecasting the non-smooth and non-linear carbon price. The first step in the method is the decomposition of the original carbon price series into a number of simple, smooth modes using CEEMDAN. Subsequently, simple modes with similar complexity are reorganised according to FuzzyEn, and feature extraction is carried out by combining CEEMDAN with FuzzyEn. This boosts computational efficiency, improves prediction accuracy, and reduces the complexity of the sequence. Then, PSORF, ELM, and LSTM are applied as prediction models for components of varying complexities to better capture the fluctuating characteristics, considering that different modes have unique frequencies and characteristics. Immediately following this, the initial integration of high-, medium-, and low-complexity sub-sequences is performed using MLR to further explore the relationship between the original sequence and each modality. Meanwhile, the aggregation of sub-sequences by MLR with non-linear integration learning further clarifies the relationship and improves the accuracy of aggregation, as non-linear integration learning can better adapt to non-linear data. RF is a typical non-linear bootstrap aggregating (bagging) integration learning method. It makes predictions by constructing combinations of multiple decision trees, each of which has a strong ability to generalise over the training data and serves to mitigate over-fitting during integration. Therefore, in this paper, the non-linear integration method based on multiple linear regression and random forest (MLRRF) is adopted to combine carbon price forecasting results. 
Finally, ARIMA is applied to correct the error, further boosting the accuracy of the forecast.
The innovations and contributions of this paper are as follows:
(1)
A novel prediction method that combines improved feature extraction, hybrid modelling, non-linear integrated learning, and error correction is introduced to provide highly accurate carbon price forecasts. The results demonstrate that the prediction method proposed in this paper remarkably improves carbon price prediction accuracy and has greater anti-interference ability and general applicability.
(2)
By considering the different complexities and correlations among the decomposition modes, an improved feature extraction method that combines CEEMDAN and FuzzyEn is implemented to efficiently screen out different features from the original carbon price sequence, which increases extraction efficiency and precision.
(3)
As different complexity components have their own characteristics and frequencies, PSORF, ELM, and LSTM are applied as prediction models for high-, medium-, and low-complexity sequences, respectively, better capturing the characteristics of each component.
(4)
Because non-linear integration has a smaller error and a wider range of applications than linear integration, RF is introduced as a non-linear learning method for non-linear integration based on MLR, and, therefore, the MLRRF non-linear integration framework is innovatively established.
(5)
Error correction is performed on the results of MLRRF integration to further explore the application of error correction in carbon price forecasting.
The remaining sections of this paper are structured as follows. Section 2 outlines the theoretical basis of relevant methods. Section 3 presents the decomposition and integration hybrid forecasting model. Section 4 applies the proposed model to Shenzhen and Hubei and discusses the calculation results. Section 5 presents the conclusions and discussion.

2. Models and Methods

2.1. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

EMD is an adaptive signal decomposition method proposed by Huang et al. [44] that does not require any assumptions about the data and can decompose complex non-linear and non-smooth signals into a set of intrinsic mode functions (IMFs) and a residual. However, EMD suffers from modal aliasing and an excessive mode count. In response to these problems, Colominas et al. [45] introduced complementary EMD. It decomposes the signal into forward and backward IMFs using two complementary EMD passes, then determines the reliability of each IMF using adaptive noise estimation to select the most reliable IMFs as components of the signal. Finally, the original signal is reconstructed by inverse-transforming the selected IMFs. The detailed process is given below.
Step 1: Standard normally distributed white noise $w_i(n)$ with different amplitudes is added to the given target signal $x(n)$ to produce $M$ different new series. The $i$th experimental signal sequence is constructed as follows:
$$x_i(n) = x(n) + \gamma_0 w_i(n), \quad (i = 1, 2, \ldots, M)$$
where $\gamma_0$ is the standard deviation of the noise.
Step 2: The first IMF $C_1(n)$ of the CEEMDAN decomposition is obtained by averaging the $M$ first modal components obtained from EMD:
$$C_1(n) = \frac{1}{M} \sum_{i=1}^{M} \mathrm{IMF}_1^{i}(n),$$
where an IMF is a function that satisfies the following two conditions: (1) the number of extrema and the number of zero crossings are equal or differ by at most one, and (2) the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
The first residual $r_1(n)$ can be expressed as
$$r_1(n) = x(n) - C_1(n).$$
Step 3: Decompose the sequence $r_1(n) + \gamma_1 E_1(w_i(n))$. The second IMF can be expressed as
$$C_2(n) = \frac{1}{M} \sum_{i=1}^{M} E_1\big(r_1(n) + \gamma_1 E_1(w_i(n))\big),$$
where $E_k(\cdot)$ is defined as the operator that extracts the $k$th IMF obtained from EMD.
The second residual $r_2(n)$ can be represented as
$$r_2(n) = r_1(n) - C_2(n).$$
Step 4: Similarly, the $k$th residual $r_k(n)$ can be written as
$$r_k(n) = r_{k-1}(n) - C_k(n).$$
Step 5: Repeat step 4 until the remaining component no longer satisfies the EMD decomposition condition. Finally, all $K$ IMFs of CEEMDAN are obtained, and the residual is
$$R(n) = x(n) - \sum_{k=1}^{K} C_k(n).$$
The target sequence is thus decomposed into
$$x(n) = \sum_{k=1}^{K} C_k(n) + R(n).$$
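The bookkeeping in Steps 1–5 can be sketched in a few lines of Python. Here `first_imf` is a deliberately crude, hypothetical stand-in for one round of EMD sifting (a moving-average high-pass); a real implementation would use envelope-based sifting, e.g. from the PyEMD package. The sketch only illustrates the noise-assisted ensemble loop and the residual updates; the reconstruction identity of the final equation holds by construction.

```python
import numpy as np

def first_imf(signal):
    """Stand-in for one round of EMD sifting: signal minus a centred
    moving average. A real CEEMDAN uses envelope-based sifting."""
    kernel = np.ones(11) / 11
    trend = np.convolve(signal, kernel, mode="same")
    return signal - trend

def ceemdan_sketch(x, n_modes=3, ensemble=50, gamma=0.2, seed=0):
    """Illustrative CEEMDAN bookkeeping: average the first IMF over an
    ensemble of noised copies, then update the residual r_k = r_{k-1} - C_k."""
    rng = np.random.default_rng(seed)
    residual = x.copy()
    imfs = []
    for _ in range(n_modes):
        mode = np.mean(
            [first_imf(residual + gamma * rng.standard_normal(len(x)))
             for _ in range(ensemble)],
            axis=0,
        )
        imfs.append(mode)
        residual = residual - mode          # r_k = r_{k-1} - C_k
    return np.array(imfs), residual

t = np.linspace(0, 1, 256)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 25 * t)
imfs, res = ceemdan_sketch(x)
# reconstruction identity: x(n) = sum_k C_k(n) + R(n)
print(np.allclose(imfs.sum(axis=0) + res, x))  # True
```

Because each mode is subtracted from the running residual, summing all modes and the final residual recovers the original signal exactly, mirroring the last two equations above.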

2.2. Fuzzy Entropy (FuzzyEn)

FuzzyEn evaluates the complexity and irregularity of a time series by taking into account its uncertainty and ambiguity. Compared with traditional entropy measures, FuzzyEn better captures the non-linear, irregular, and chaotic features of a time series. The details of FuzzyEn are given below.
For a normalised time series $x = \{x(1), x(2), \ldots, x(N)\}$ with $N$ sample points, the following sequence of vectors can be formed:
$$X_i^m = \{x(i), x(i+1), \ldots, x(i+m-1)\} - x_0(i), \quad (i = 1, \ldots, N-m+1)$$
$$x_0(i) = \frac{1}{m} \sum_{j=0}^{m-1} x(i+j),$$
where $X_i^m$ represents a sequence of $m$ consecutive values of $x$, beginning at point $i$ with the baseline $x_0(i)$ removed, and $m$ is the embedding dimension.
The distance $d_{ij}^m$ between vectors $X_i^m$ and $X_j^m$ is defined as
$$d_{ij}^m = d[X_i^m, X_j^m] = \max_{k \in (0, m-1)} |X_i^m(k) - X_j^m(k)|, \quad (i, j = 1, \ldots, N-m+1,\ j \neq i)$$
where $X_i^m(k)$ and $X_j^m(k)$ are the $k$th elements of $X_i^m$ and $X_j^m$, respectively.
Alternatively, $d_{ij}^m$ can be computed directly from the original series:
$$d_{ij}^m = d[X_i^m, X_j^m] = \max_{k \in (0, m-1)} |x(i+k) - x_0(i) - x(j+k) + x_0(j)|.$$
The fuzzy similarity $D_{ij}^m$ between $X_i^m$ and $X_j^m$ is determined by a fuzzy membership function:
$$D_{ij}^m = \exp\big(-(d_{ij}^m)^n / r\big),$$
where $n$ is the gradient of the membership function and $r$ represents the tolerance, indicating the width of the function.
To obtain the average fuzzy similarity $D_r^m(i, r)$, average all fuzzy similarities between the vector $X_i^m$ and its neighbouring vectors $X_j^m$ as follows:
$$D_r^m(i, r) = \frac{1}{N-m} \sum_{j=1, j \neq i}^{N-m+1} D_{ij}^m.$$
The fuzzy probability $\phi^m(n, r)$ that two vector sequences match for $m$ points within tolerance $r$ is
$$\phi^m(n, r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} D_r^m(i, r).$$
By increasing the dimension to $m+1$, $\phi^{m+1}(n, r)$ is obtained:
$$\phi^{m+1}(n, r) = \frac{1}{(N-m)(N-m-1)} \sum_{i=1}^{N-m} \sum_{j=1, j \neq i}^{N-m} D_{ij}^{m+1}.$$
The probability that two vectors matching for $m$ points will continue to match at the next point is estimated as $\phi^{m+1}(n, r) / \phi^m(n, r)$. $\mathrm{FuzzyEn}(m, n, r)$ is defined as the negative natural logarithm of this conditional fuzzy probability:
$$\mathrm{FuzzyEn}(m, n, r) = \lim_{N \to \infty} \big( -\ln\big( \phi^{m+1}(n, r) / \phi^m(n, r) \big) \big).$$
For a time series $\{x(i) : 1 \leq i \leq N\}$ of finite length, the fuzzy entropy is estimated by the statistic
$$\mathrm{FuzzyEn}(m, n, r, N) = -\ln\big( \phi^{m+1}(n, r) / \phi^m(n, r) \big).$$
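The definitions above translate directly into a short numpy implementation. This is a sketch: the helper name `fuzzy_entropy` is our own, and we assume the common convention of scaling the tolerance $r$ by the standard deviation of the series (the text assumes a normalised series).

```python
import numpy as np

def fuzzy_entropy(x, m=2, n=2, r=0.2):
    """Fuzzy entropy of a 1-D series: a direct O(N^2) transcription of
    the equations above, with r scaled by the series' standard deviation."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r * np.std(x)

    def phi(mm):
        # embedding vectors with their baseline (local mean) removed
        X = np.array([x[i:i + mm] for i in range(N - mm + 1)])
        X = X - X.mean(axis=1, keepdims=True)
        # Chebyshev distances between all pairs of vectors
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        D = np.exp(-(d ** n) / r)
        np.fill_diagonal(D, 0.0)            # exclude j == i
        L = len(X)
        return D.sum() / (L * (L - 1))

    return -np.log(phi(m + 1) / phi(m))

rng = np.random.default_rng(1)
regular = np.sin(np.linspace(0, 8 * np.pi, 500))
noisy = rng.standard_normal(500)
# white noise is more irregular than a sine, so its fuzzy entropy is higher
print(fuzzy_entropy(regular) < fuzzy_entropy(noisy))  # True
```

In the proposed model, a value like this is computed for each CEEMDAN component, and components with similar entropy are grouped into high-, medium-, and low-complexity sub-sequences.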

2.3. Improved Random Forest Using Particle Swarm Optimisation (PSORF)

2.3.1. Particle Swarm Optimisation (PSO)

PSO is a bionic intelligent computing method that simulates flock foraging behaviour. It takes advantage of the sharing mechanism of individual information in a flock of birds so that the movement of the whole flock generates an evolutionary process from disorder to order in the problem-solution space, thus obtaining the optimal solution. PSO has several advantages, including its simple concept, easy implementation, and a reduced number of parameters to adjust. The details of PSO are given below.
Step 1: Initialise the particle swarm. Determine the velocity interval ( V min , d , V max , d ) , search space ( X min , d , X max , d ) , initialised velocity V i = ( ν i 1 , ν i 2 , , ν i d ) , and position X i = ( x i 1 , x i 2 , , x i d ) of the random particles in the D-dimensional space, where i [ 1 , n ] and n represents the number of particles in the swarm.
Step 2: Determine the individual extreme value. Calculate the fitness value F i of each particle, then compare the fitness value F i of each particle with the individual extreme value P i . If F i > P i , then replace P i with F i .
Step 3: Determine the global extreme. Compare each particle's fitness value $F_i$ with the global extreme value $G$. If $F_i > G$, replace $G$ with $F_i$.
Step 4: Update the particle velocity and position. Based on steps 2 and 3, the particle velocity and position are updated according to the following equations:
$$\nu_{id}^{k+1} = w \nu_{id}^{k} + a_1 r_1 (P_{id} - x_{id}^{k}) + a_2 r_2 (G_d - x_{id}^{k}),$$
$$x_{id}^{k+1} = x_{id}^{k} + \nu_{id}^{k+1},$$
where $w$ is the non-negative inertia weight, $a_1$ and $a_2$ are acceleration constants that regulate the maximum learning step, and $r_1$ and $r_2$ denote random numbers in the range (0, 1) that increase the randomness of the search.
Step 5: Determine if the algorithm is terminated. If the end conditions are met, end the algorithm and output the result; if not, return to step 2.
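The five steps can be condensed into a compact routine. This is an illustrative sketch in minimisation form (the function name, parameter defaults, and test function are our own choices, not the paper's settings):

```python
import numpy as np

def pso_minimise(f, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0),
                 w=0.7, a1=1.5, a2=1.5, seed=0):
    """Minimal PSO following steps 1-5 above: personal bests P_i,
    global best G, then velocity and position updates."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_particles, dim))          # positions
    V = np.zeros((n_particles, dim))                      # velocities
    P, P_val = X.copy(), np.array([f(x) for x in X])      # personal bests
    g = np.argmin(P_val)
    G, G_val = P[g].copy(), P_val[g]                      # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        V = w * V + a1 * r1 * (P - X) + a2 * r2 * (G - X)
        X = np.clip(X + V, lo, hi)
        vals = np.array([f(x) for x in X])
        better = vals < P_val                             # update P_i
        P[better], P_val[better] = X[better], vals[better]
        g = np.argmin(P_val)
        if P_val[g] < G_val:                              # update G
            G, G_val = P[g].copy(), P_val[g]
    return G, G_val

sphere = lambda x: float(np.sum(x ** 2))                  # minimum at the origin
best, best_val = pso_minimise(sphere, dim=2)
print(best_val < 1e-3)  # the swarm converges close to the optimum
```

The same loop, with the fitness function replaced by a model-validation error, is what PSORF uses to tune the random forest hyperparameters.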

2.3.2. Random Forest (RF)

RF is an integrated learning algorithm first proposed by Breiman [46] in 2001. It mainly extracts multiple samples from the original data using the bootstrap resampling method and constructs a classification tree for each of the samples. Finally, the predictions of the classification trees are used to select the final result by group voting. The specific modelling steps of random forest regression are described below.
Suppose the input data are represented by F (n samples, m features, 1 label), and a random forest containing h trees is constructed.
(1) Construct the sample sets. Perform h rounds of random sampling with replacement on the original sample set using the bootstrap method to obtain h subsample sets, and randomly select a subset of features for each round, so that the h subsample sets may contain different features.
(2) Training. Train h regression trees on the h subsample sets. Node partitioning in each regression tree minimises the mean squared deviation: for each candidate partition feature $T$ and partition point $s$, which split the data into the left dataset $D_l$ and the right dataset $D_r$, the objective is
$$\min_{T, s} \Big[ \sum_{x_i \in D_l(T, s)} (\gamma_i - c_1)^2 + \sum_{x_i \in D_r(T, s)} (\gamma_i - c_2)^2 \Big],$$
where $\gamma_i$ is the label corresponding to $x_i$, $c_1$ is the average of all labels in $D_l$, and $c_2$ is the average of all labels in $D_r$.
(3) Prediction. A random forest is constructed from the h regression trees, and its predicted value $\hat{y}$ for an input $x$ is
$$\hat{y} = \frac{1}{h} \sum_{i=1}^{h} k_i(x),$$
where $k_i(x)$ represents the prediction of the $i$th regression tree.
The prediction accuracy and speed of RF are strongly influenced by parameters such as the number of decision trees $m$ and the maximum depth of the decision trees $d$. If $m$ is too low, the model tends to underfit, whereas if it is too high, it no longer significantly improves the model but does increase the computational time. Similarly, as $d$ increases, the level of fit improves but the computational complexity of the model rises. Therefore, it is vital to set appropriate values of $m$ and $d$. In this paper, we utilise PSO to optimise these parameters in RF and establish a PSORF prediction model. The flowchart of the model is shown in Figure 1.
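A PSORF-style tuner can be sketched with scikit-learn's RandomForestRegressor (assumed available) and a small inline particle swarm over $(m, d)$. The toy data, swarm size, and iteration counts here are hypothetical illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy regression data standing in for a carbon price sub-sequence.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (300, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(300)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(m, d):
    """Validation MSE of an RF with m trees and depth d (to be minimised)."""
    rf = RandomForestRegressor(n_estimators=int(m), max_depth=int(d),
                               random_state=0).fit(X_tr, y_tr)
    return float(np.mean((rf.predict(X_va) - y_va) ** 2))

# A tiny PSO over (m, d): positions are continuous, rounded when evaluated.
pos = rng.uniform([10, 2], [200, 15], (8, 2))
vel = np.zeros_like(pos)
pbest, pval = pos.copy(), np.array([fitness(*p) for p in pos])
g = np.argmin(pval)
gbest, gval = pbest[g].copy(), pval[g]
for _ in range(5):
    r1, r2 = rng.random((2, *pos.shape))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, [10, 2], [200, 15])
    vals = np.array([fitness(*p) for p in pos])
    improved = vals < pval
    pbest[improved], pval[improved] = pos[improved], vals[improved]
    g = np.argmin(pval)
    if pval[g] < gval:
        gbest, gval = pbest[g].copy(), pval[g]

m_opt, d_opt = int(gbest[0]), int(gbest[1])
print(m_opt, d_opt)  # PSO-selected number of trees and maximum depth
```

In the proposed model, the fitness function would be evaluated on a validation split of the high-complexity sub-sequence rather than on synthetic data.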

2.4. Extreme Learning Machine (ELM)

ELM is a fast and efficient machine learning algorithm that maps input data into a high-dimensional feature space by randomly generating hidden layer weights and biases and then computing the output layer weights using an analytical solution. The network structure of ELM is shown in Figure 2, and the details of ELM are given below.
For $N$ arbitrary distinct samples $(x_j, t_j)$, where $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]^T \in \mathbb{R}^n$ and $t_j = [t_{j1}, t_{j2}, \ldots, t_{jm}]^T \in \mathbb{R}^m$, a single hidden-layer neural network containing $L$ hidden nodes can be represented as
$$\sum_{i=1}^{L} \beta_i \, g(\omega_i \cdot x_j + b_i) = o_j, \quad (j = 1, 2, \ldots, N)$$
where $g(\cdot)$ is the activation function, $\omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{in}]^T$ is the weight vector connecting the $i$th hidden node and the input nodes, $b_i$ is the threshold of the $i$th hidden node, and $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden node and the output nodes.
The aim of ELM is to minimise the error between the target vector $t_j$ and the output vector $o_j$, which can be expressed as
$$\sum_{j=1}^{N} \| o_j - t_j \| = 0.$$
That is, there exist corresponding $\beta_i$, $\omega_i$, and $b_i$ such that
$$\sum_{i=1}^{L} \beta_i \, g(\omega_i \cdot x_j + b_i) = t_j, \quad (j = 1, 2, \ldots, N)$$
which can be expressed in matrix form as
$$H \beta = T,$$
where $H$ is the hidden-layer output matrix, $\beta$ is the output weight matrix, and $T$ is the desired output matrix. The output weights are then obtained analytically as $\beta = H^{\dagger} T$, where $H^{\dagger}$ is the Moore–Penrose generalised inverse of $H$.
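A minimal numpy ELM following this description: random hidden weights and biases, a ReLU activation, and output weights solved with the Moore–Penrose pseudo-inverse. The class and toy data are illustrative sketches, not the authors' implementation:

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer ELM: random input weights and biases,
    output weights solved analytically via the Moore-Penrose pseudo-inverse."""
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.maximum(0.0, X @ self.W + self.b)   # ReLU activation

    def fit(self, X, T):
        n_features = X.shape[1]
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T             # beta = H^+ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (400, 3))
t = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2]
elm = ELM(n_hidden=64).fit(X[:300], t[:300])
mse = float(np.mean((elm.predict(X[300:]) - t[300:]) ** 2))
print(mse < 0.1)  # the smooth target is fitted with a small test error
```

Because only the output weights are trained, and in closed form, fitting is a single least-squares solve, which is what makes ELM fast relative to iteratively trained networks.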

2.5. Long Short-Term Memory (LSTM)

LSTM is a kind of recurrent neural network (RNN) that is effective at processing and predicting time series. LSTM alleviates the two major problems of gradient vanishing and gradient explosion in RNNs, making it more suitable for long-series prediction. The structure of LSTM is shown in Figure 3.
LSTM transmits information from c t 1 to c t , and the specific computation is divided into the following three gate structures:
(1) Forget gate: state information is screened and selectively discarded using the following formula:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
where $f_t$ is the forget gate; $\sigma(\cdot)$ is the activation function; $W_f$ is the weight matrix of the forget gate; $h_{t-1}$ and $x_t$ denote the output matrix of the state unit at moment $t-1$ and the input matrix at moment $t$, respectively; and $b_f$ is the threshold of the forget gate.
(2) Input gate: determines which new information enters the cell, including the information update and the candidate update content, calculated by the following formulas:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c),$$
$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t,$$
where $i_t$ and $\tilde{C}_t$ are the input gate and input node, $W_i$ and $W_c$ are the weight matrices of the input gate and input node, $b_i$ and $b_c$ are the thresholds of the input gate and input node, $\tanh$ is the activation function, and $C_t$ and $C_{t-1}$ denote the cell states at moments $t$ and $t-1$, respectively.
(3) Output gate: the output gate is first determined through the sigmoid layer, and its result is then multiplied by the $\tanh$-processed cell state to obtain the output:
$$O_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),$$
$$h_t = O_t \cdot \tanh(C_t),$$
where $O_t$ and $h_t$ are the output gate and the intermediate output, respectively; $W_o$ is the weight matrix of the output gate; and $b_o$ is the threshold of the output gate.
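The three gate computations can be checked with a single-step numpy implementation. The weights here are random illustrative values, and `lstm_step` is our own sketch of one recurrence, not a trainable network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step implementing the forget/input/output gate equations."""
    W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o = params
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)            # forget gate
    i_t = sigmoid(W_i @ z + b_i)            # input gate
    c_tilde = np.tanh(W_c @ z + b_c)        # candidate (input node)
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update
    o_t = sigmoid(W_o @ z + b_o)            # output gate
    h_t = o_t * np.tanh(c_t)                # intermediate output
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 8
params = tuple(
    p for _ in range(4)
    for p in (0.1 * rng.standard_normal((n_hidden, n_hidden + n_in)),
              np.zeros(n_hidden))
)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.3, -0.1, 0.7]:                  # a short input sequence
    h, c = lstm_step(np.array([x]), h, c, params)
print(h.shape)  # (8,)
```

Since $O_t \in (0, 1)$ and $\tanh(C_t) \in (-1, 1)$, every component of $h_t$ is strictly bounded in $(-1, 1)$, which is part of what keeps gradients stable over long sequences.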

3. Feature Extraction and Non-Linear Integration Hybrid Prediction Model

3.1. The Framework of the Proposed Model

To address the non-stationary and non-linear characteristics of the carbon price, a new hybrid model, CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA, is proposed. The model combines four core components, namely improved feature extraction, a hybrid prediction model, non-linear integration, and error correction. The framework of the proposed model is shown in Figure 4.
In the first part, the carbon price series is decomposed using CEEMDAN to extract several IMFs with smooth volatility and a residual. Subsequently, to reduce computational costs and improve prediction accuracy, components with similar complexity are reconstructed into three sub-sequences based on FuzzyEn. In the second part, PSORF, ELM, and LSTM are selected in the crucial prediction stage to capture the unique features of each component and to construct an appropriate model for each complexity component. In the third part, the predicted outcomes of the hybrid model are integrated utilising MLRRF to obtain the non-linear integration results. In the fourth part, the non-linear integration results are corrected for errors by applying ARIMA to obtain the final forecasting results.
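One plausible reading of the third part (MLR followed by RF) can be sketched with scikit-learn, using synthetic stand-ins for the three sub-sequence predictions; the exact feature layout fed to RF is our assumption, not a detail given in the text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
true = np.sin(np.linspace(0, 12 * np.pi, n))        # toy "carbon price"
# hypothetical sub-sequence predictions (in practice: PSORF, ELM, LSTM output)
subs = np.column_stack([
    np.tanh(1.5 * true) + 0.1 * rng.standard_normal(n),
    0.8 * true + 0.1 * rng.standard_normal(n),
    true + 0.2 * true ** 2 + 0.1 * rng.standard_normal(n),
])

# Stage 1: MLR relates the sub-sequence predictions to the original series.
mlr = LinearRegression().fit(subs[:400], true[:400])
mlr_pred = mlr.predict(subs)

# Stage 2: RF performs the non-linear integration over the sub-sequence
# predictions plus the MLR output.
features = np.column_stack([subs, mlr_pred])
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(features[:400], true[:400])
final = rf.predict(features[400:])

mse_final = float(np.mean((final - true[400:]) ** 2))
print(mse_final < 0.1)  # the integrated forecast tracks the held-out series
```

The design choice mirrors the motivation in the text: MLR captures the linear relationship between the reconstructed sequences and the original series, while the bagged trees of RF pick up whatever non-linear residual structure remains.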

3.2. Parameter Setting

In this paper, RF, ELM, and LSTM are selected as prediction models. In the three models, RF has two hyperparameters, which are the number of decision trees and the maximum depth. Both ELM and LSTM are single hidden-layer structures with 64 units, and the optimiser utilised is the adaptive moment estimation (Adam). The difference is that the activation function of ELM is a Rectified Linear Unit (ReLU), while that of LSTM is Sigmoid. The hyperparameter settings of model training are shown in Table 1.

3.3. Comparative Models

To illustrate the necessity and superiority of the proposed model, the model comparison is divided into four parts:
(1)
Validate the need for sequence reconstruction. Here, the performance of the original carbon price sequence is compared with that of the reconstructed sequence based on FuzzyEn to highlight the importance of sequence reconstruction.
(2)
Verify the demand for hybrid models. Evaluation metrics of single and hybrid prediction models are compared on the high-, medium-, and low-complexity sequences to illustrate the ability of hybrid models to handle diverse data.
(3)
Confirm the necessity of MLRRF integration. This is done by comparing the results obtained from simple summation integration with those subjected to MLRRF non-linear integration to emphasise the requirement for non-linear integration in the reconstruction of extremely complex and non-stationary sequences.
(4)
Verify the need for error correction. The results without error correction are compared with the results of introducing an ARIMA model with error correction to highlight the criticality of error correction on the prediction results.
It is hoped that the above comparisons can more clearly demonstrate the necessity and superiority of the proposed hybrid model in carbon price forecasting.

3.4. Evaluation Indicators

3.4.1. Model Accuracy Evaluation

To evaluate the predictive effectiveness of the model, the following five indicators are applied in this paper: mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination ( R 2 ).
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2,$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|,$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2},$$
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\%,$$
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$
where $\hat{y}_i$ represents the predicted value, $y_i$ denotes the original data, $\bar{y}$ signifies the mean value of the original data, and $n$ is the length of the test set.
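The five indicators translate directly into numpy. The helper name and toy values are illustrative; note that $R^2$ is computed here exactly in the explained-variance form defined above:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """The five accuracy indicators defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / y_true)) * 100
    r2 = (np.sum((y_pred - y_true.mean()) ** 2)
          / np.sum((y_true - y_true.mean()) ** 2))
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

y = np.array([10.0, 12.0, 11.0, 13.0])
y_hat = np.array([10.5, 11.5, 11.0, 13.5])
m = evaluate(y, y_hat)
print(round(m["MSE"], 4), round(m["MAE"], 4))  # 0.1875 0.375
```

MAPE is undefined when any $y_i = 0$, which is not a concern for strictly positive carbon prices.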

3.4.2. Model Significance Evaluation

The Diebold–Mariano test (DM test) was proposed by Diebold and Mariano [47] in 1995. The DM test is used to determine whether the difference in predictive accuracy between two models is statistically significant. Its main ideas are described below.
Suppose there is a true time series $y_t = \{y_1, y_2, \ldots, y_t\}$ with $t$ sample points, and the forecasts of the true series produced by models A and B are $\hat{y}_t^{(a)} = \{\hat{y}_1^{(a)}, \hat{y}_2^{(a)}, \ldots, \hat{y}_t^{(a)}\}$ and $\hat{y}_t^{(b)} = \{\hat{y}_1^{(b)}, \hat{y}_2^{(b)}, \ldots, \hat{y}_t^{(b)}\}$, respectively. The prediction errors of the two models are

$$e_i^{(a)} = y_i - \hat{y}_i^{(a)}, \quad i = 1, 2, \ldots, t,$$

$$e_i^{(b)} = y_i - \hat{y}_i^{(b)}, \quad i = 1, 2, \ldots, t.$$

MAE is chosen as the loss function $F(e_i^{(j)})$, where $j = a, b$, and can be expressed as

$$F\left(e_i^{(j)}\right) = \left|e_i^{(j)}\right|.$$

The statistic of the DM test is expressed as

$$DM = \frac{\frac{1}{t}\sum_{i=1}^{t} d_i}{s},$$

where $d_i = F\left(e_i^{(a)}\right) - F\left(e_i^{(b)}\right)$ and $s$ is the standard deviation of $d_i$.
The null hypothesis of the DM test, shown in Equation (42), states that the two models have the same predictive performance, whereas the alternative hypothesis, given in Equation (43), states that the two models have different predictive performances:

$$H_0: E(d_i) = 0, \quad (42)$$

$$H_1: E(d_i) \neq 0. \quad (43)$$

Since the DM test statistic asymptotically follows a standard normal distribution, the absolute value of the DM statistic can be compared with $z_{\alpha/2}$. If the DM statistic satisfies Equation (44), the null hypothesis that the two models have the same predictive performance is rejected, i.e., the difference between their predictions is significant; otherwise, the null hypothesis is retained and the two models are judged not significantly different:

$$\left|DM\right| > z_{\alpha/2}, \quad (44)$$

where $z_{\alpha/2}$ denotes the critical value of the standard normal distribution at significance level $\alpha$.
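The statistic is straightforward to compute. The sketch below follows the in-text formula (mean of $d_i$ divided by the standard deviation $s$); note that the conventional DM statistic instead divides by $s/\sqrt{t}$, so the value should be rescaled accordingly before comparison with $z_{\alpha/2}$. The function name is illustrative:

```python
import numpy as np

def dm_statistic(y, pred_a, pred_b):
    """DM statistic with MAE loss, following the in-text definition:
    d_i = |e_i^(a)| - |e_i^(b)|,  DM = (sum(d_i) / t) / s,
    where s is the sample standard deviation of d_i.
    (The conventional statistic divides by s / sqrt(t) instead.)"""
    y, pa, pb = (np.asarray(v, dtype=float) for v in (y, pred_a, pred_b))
    d = np.abs(y - pa) - np.abs(y - pb)
    return d.mean() / d.std(ddof=1)
```

A positive value indicates that model B incurs a smaller MAE loss than model A, and swapping the two models simply flips the sign.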

4. Empirical Study

Accurate carbon price forecasts can effectively reduce carbon price volatility and thus facilitate the steady development of the carbon market. For example, if the carbon price is expected to rise sharply, management can reduce the volatility of the carbon price by increasing the number of allowances. In addition, a smooth carbon price also helps companies plan their resources, manage carbon market risks, and control their costs over the long term, rather than having to react to drastic changes in the carbon price. Therefore, it is important to develop highly accurate carbon price forecasting models.

4.1. Descriptive Statistics

Because the Shenzhen and Hubei markets have larger sample sizes than the other pilot markets, this paper chooses these two markets as the research samples, which supports thorough model training and helps guarantee the precision and generalisability of the model. Data on carbon prices for both the Shenzhen emission allowance (SZEA) and the Hubei emission allowance (HBEA) markets were obtained from iFinD (https://ft.10jqka.com.cn/, accessed on 16 January 2024). The time span of the SZEA is from 5 August 2013 to 8 January 2024, with a sample size of 2201. The time span of the HBEA is from 2 April 2014 to 15 January 2024, with a sample size of 2267. The main data characteristics of the study sample are detailed in Table 2.
Prior to the model forecast, the carbon price series is expanded using the time-lag method, and the flowchart of the expansion is shown in Figure 5. In this paper, the time step is set to 28, and the prediction type is a one-step-forward prediction, which means that the first 28 observations are used to predict the 29th observation.
Before prediction, each dataset is divided into a training set and a test set. The training set is used to train the model, and the test set is used to validate the model’s performance. In this paper, the number of samples in the training and test sets is 80% and 20%, respectively, of the total sample size. The trends of the SZEA and HBEA, as well as the frequency distributions of the average transaction price, are shown in Figure 6.
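The time-lag expansion and the chronological split described above can be sketched as follows (the helper names are illustrative; the lag of 28 and the 80/20 ratio follow the text, and the split is kept in time order to avoid look-ahead):

```python
import numpy as np

def make_supervised(series, lag=28):
    """One-step-forward supervised pairs: each window of `lag`
    observations predicts the observation immediately after it."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def train_test_split_ordered(X, y, train_ratio=0.8):
    """Chronological 80/20 split: the earliest 80% of samples train
    the model; the most recent 20% form the test set."""
    cut = int(len(X) * train_ratio)
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

For the SZEA, for example, the 2201 observations yield 2173 supervised pairs after lag expansion, which are then split 80/20 in time order.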
From the daily carbon price trend graph in Figure 6a,c and the frequency distribution histogram in Figure 6b,d, it is clear that the carbon price is non-linear and highly complex. Therefore, accurate prediction is not feasible directly, so decomposition must be performed before further processing.

4.2. Validation of the Necessity of Sequence Reconstruction

In order to understand the complexity of the IMFs generated by CEEMDAN more intuitively and accurately, we adopt FuzzyEn to describe their complexity. The original series of the SZEA and HBEA are decomposed using CEEMDAN, as depicted in Figure 7.
The results in Figure 7 suggest that both the SZEA and HBEA are decomposed into nine IMFs, and the degree of volatility of the decomposed series exhibits a decreasing trend. As mentioned above, FuzzyEn can measure the complexity of a time series. To improve prediction accuracy and computational efficiency, sub-sequences with similar FuzzyEn results are reorganised according to the fuzzy entropy theory. Comparisons of the FuzzyEn results of the original and reconstructed sequences are shown in Table 3.
According to the results in Table 3, the FuzzyEn results of the SZEA and HBEA are 0.8341 and 0.2305, respectively, so the SZEA has a higher degree of complexity than the HBEA. The components are sorted according to the FuzzyEn results of the decomposed carbon price series. For the SZEA, IMF1 (1.3239) and IMF2 (1.0932) are classified as high-complexity components. For the HBEA, IMF1 (0.7463), IMF2 (0.5486), and IMF3 (0.3294) are categorised as high-complexity components. The remaining components are ranked as medium- and low-complexity components based on their FuzzyEn results. For the SZEA, IMF3 (0.5987), IMF4 (0.3055), and IMF5 (0.1209) are considered medium-complexity components and reconstructed as Rec-sub3 (0.5497), while IMF6 (0.0359), IMF7 (0.0053), IMF8 (0.0021), and IMF9 (0.0006) are divided into low-complexity components and reconstructed as Rec-sub4 (0.0138). For the HBEA, IMF4 (0.1296), IMF5 (0.0548), and IMF6 (0.0178) are considered medium-complexity components and reconstructed as Rec-sub4 (0.0917), while IMF7 (0.0032), IMF8 (0.0015), and IMF9 (0.0001) are recomposed as low-complexity components and reconstructed as Rec-sub5 (0.0013). After reconstruction, the mean FuzzyEn value was lowered from 0.8341 to 0.3486 for the SZEA and from 0.2305 to 0.1831 for the HBEA. Therefore, the complexity of the sequences is significantly decreased after reconstruction, which offers a sound basis for improving the accuracy of carbon price prediction. Comparisons of the FuzzyEn results of the original and reconstructed sequences are displayed in Figure 8.
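For reference, fuzzy entropy can be computed as in the minimal sketch below, which follows one common formulation (exponential fuzzy membership over Chebyshev distances between mean-removed embedding vectors). The parameter defaults `m=2`, `r = 0.2 * std`, `n=2` are typical choices and are assumptions here; the paper's exact settings are not restated in this section:

```python
import numpy as np

def fuzzy_entropy(series, m=2, r=None, n=2):
    """Fuzzy entropy of a 1-D series: higher values indicate a more
    complex, less predictable sequence."""
    x = np.asarray(series, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # common tolerance: 20% of the std

    def phi(mm):
        N = len(x) - mm + 1
        # embedding vectors with their own mean removed (baseline drift)
        vecs = np.stack([x[i:i + mm] for i in range(N)])
        vecs = vecs - vecs.mean(axis=1, keepdims=True)
        # pairwise Chebyshev distances and fuzzy similarity
        d = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=2)
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)        # average over i != j only
        return sim.sum() / (N * (N - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

Applied to the IMFs, such a function reproduces the ordering in Table 3: the first, most oscillatory IMFs score highest and are kept separate, while the smooth trailing IMFs score near zero and can be merged.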
The trends of the components of the SZEA and HBEA after reconstruction are illustrated in Figure 9.
It is clear that the complexity of the reconstructed sequence is significantly lower than that of the original sequence, making it easier for the model to capture the movement patterns of the series. However, a reduction in complexity does not necessarily improve prediction accuracy. Thus, to verify whether the reduction in complexity can improve prediction accuracy, the evaluation metrics of the original and reconstructed sequences are compared using three models, and the comparison results are shown in Table 4.
Table 4 demonstrates that for both the SZEA and HBEA across the three models, the MSE, MAE, RMSE, and MAPE all decrease after reconstruction. This indicates that sequence reconstruction increases prediction accuracy while effectively reducing the complexity of the sequence.

4.3. Validation of the Need for Hybrid Models

PSORF, ELM, and LSTM are applied to predict the reconstructed sequences. To reduce computational cost, an early stopping mechanism is added to PSO. If the model’s iteration accuracy does not improve after 25 rounds, the optimisation algorithm is stopped to retain the optimal result of this iteration round. The parameter settings of PSO are listed in Table 5. The convergence flow of PSORF training for the HBEA is illustrated in Figure 10. The MAE values of the three models for the SZEA and HBEA are shown in Table 6.
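The PSO-with-early-stopping mechanism described above can be sketched as follows. A toy quadratic objective stands in for the RF validation loss that PSO actually minimises in the paper, and the inertia weight and learning factors (`w=0.7`, `c1=c2=1.5`) are chosen for stability in this illustration rather than copied from Table 5:

```python
import numpy as np

def pso_minimise(loss, bounds, n_particles=30, max_iter=100,
                 w=0.7, c1=1.5, c2=1.5, early_stopping_rounds=25, seed=0):
    """Minimal PSO with early stopping: halt when the global best has
    not improved for `early_stopping_rounds` consecutive iterations.
    `bounds` is a list of (low, high) pairs, one per dimension."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    gbest_val = pbest_val.min()
    stall = 0
    for _ in range(max_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if pbest_val.min() < gbest_val - 1e-12:
            gbest_val = pbest_val.min()
            gbest = pbest[pbest_val.argmin()].copy()
            stall = 0
        else:
            stall += 1
            if stall >= early_stopping_rounds:
                break  # early stop: keep the best result found so far
    return gbest, gbest_val
```

In the paper's setting, `loss` would train an RF with the candidate `n_estimators` and `max_depth` (rounded to integers within the ranges of Table 5) and return its validation MAE.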
Table 6 presents the performance of PSORF, ELM, and LSTM in predicting sequences of different complexity. Overall, PSORF performs best on the high-complexity sequences (SZEA-Rec-sub1, SZEA-Rec-sub2, HBEA-Rec-sub1, and HBEA-Rec-sub2), with the exception of HBEA-Rec-sub3, but performs poorly on the low-complexity sequences (SZEA-Rec-sub4, HBEA-Rec-sub4, and HBEA-Rec-sub5). ELM predicts better on sequences of medium complexity (SZEA-Rec-sub3 and HBEA-Rec-sub4). LSTM is more suitable for predicting low-complexity sequences such as SZEA-Rec-sub4 and HBEA-Rec-sub5, for which the MAE values of PSORF and ELM are 2.9656, 0.2642, 1.9009, and 0.1182, respectively. Therefore, PSORF, ELM, and LSTM are selected for the high-, medium-, and low-complexity sequence predictions, respectively, and their prediction results are summed to obtain the hybrid model's predicted results. The error comparison results for the single and hybrid models are presented in Table 7.
As shown in Table 7, the evaluation indicators of both the SZEA and HBEA hybrid models are smaller than those of the single prediction model. Specifically, the MSE, MAE, RMSE, and MAPE of the SZEA are 2.5579, 1.0693, 1.5994, and 0.0439, respectively. For the HBEA, these values are 0.2294, 0.3635, 0.4790, and 0.0079, respectively. This suggests that the hybrid model has a greater advantage in terms of forecasting error, which enables better fitting and prediction of highly volatile time series.

4.4. Validation of the Necessity of MLRRF Integration

This paper examines the intrinsic link between the original carbon price series and the reconstructed carbon price series, applying MLR to quantify the relationship between these series, followed by non-linear integration with RF. The summarised projections are presented in Table 8. For the SZEA, using MLRRF integration, the MSE, MAE, RMSE, and MAPE are 1.8411, 0.8675, 1.3569, and 0.0326, respectively, which represent significant reductions of 45.91%, 25.94%, 26.46%, and 34.23%. The results for the HBEA are similar and are not presented here. Hence, the prediction error of MLRRF integration is substantially reduced, making it more revealing of the relationship between the original and reconstructed sequences. A comparison of the error results for simple additive integration and MLRRF integration is displayed in Figure 11.
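One plausible reading of the MLRRF integration step is a two-stage fit: MLR first maps the sub-series predictions to the original series, and an RF then models the remaining non-linear residual. The sketch below (using scikit-learn; the division of roles between MLR and RF is an assumption, not the paper's exact pipeline) illustrates the idea:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

def mlrrf_integrate(sub_preds, y, seed=0):
    """Two-stage MLRRF-style integration.

    sub_preds : array of shape (n_samples, n_subseries), one column per
                reconstructed sub-sequence prediction (PSORF/ELM/LSTM).
    y         : the original carbon price series (targets).
    Returns the fitted MLR, the fitted RF, and the combined fit."""
    mlr = LinearRegression().fit(sub_preds, y)     # linear relationship
    linear_part = mlr.predict(sub_preds)
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(sub_preds, y - linear_part)             # non-linear residual
    return mlr, rf, linear_part + rf.predict(sub_preds)
```

Compared with simple summation of the sub-series predictions, the learned weights plus the RF residual model can absorb both scale mismatches and non-linear interactions between components.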

4.5. Verification of the Need for Error Correction

To further explore the application of error correction in carbon price forecasting, this paper uses ARIMA for error correction, and the confirmation criterion employed for the parameters of ARIMA is the Bayesian Information Criterion (BIC). According to the BIC, the parameters for both ARIMA models are set to (1, 1, 0). Error correction is performed using ARIMA for the non-linear integration results, and the error-corrected evaluation indicators are reported in Table 8.
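Since the selected order is ARIMA(1, 1, 0), the correction step reduces to an AR(1) model on the first-differenced error series. The dependency-free sketch below (a simplified stand-in for a full ARIMA fit, e.g. with statsmodels) forecasts the next prediction error, which is then added to the model's forecast:

```python
import numpy as np

def ar1_diff_error_correction(errors):
    """Simplified ARIMA(1, 1, 0) step: fit an AR(1) on the
    first-differenced error series by least squares, then forecast
    the next error as e_t + phi * (e_t - e_{t-1})."""
    e = np.asarray(errors, dtype=float)
    de = np.diff(e)                                   # d = 1
    phi = float(np.dot(de[:-1], de[1:]) / np.dot(de[:-1], de[:-1]))
    return e[-1] + phi * de[-1]                       # next-error forecast

# corrected forecast = model forecast + predicted next error
```

In practice one would refit (or use a properly estimated ARIMA) at each step of the rolling forecast rather than a single least-squares pass.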
According to the results in Table 8, the MSE, MAE, RMSE, and MAPE values for the SZEA after error correction are 0.3918, 0.4158, 0.6259, and 0.0163, respectively, representing reductions of 78.72%, 52.07%, 53.87%, and 49.88%, with an R 2 value of 0.9991. For the HBEA, the MSE, MAE, RMSE, and MAPE values are 0.0429, 0.1382, 0.2072, and 0.0030, respectively, representing decreases of 72.62%, 50.38%, 47.68%, and 50.22%, with an R 2 value of 0.9921. These outcomes show that error correction can be effective in improving the prediction accuracy of the model. For simplicity, each model is referred to as M1, M2, M3, etc., as shown in Table 9. The results of the overall model comparison between the SZEA and HBEA are illustrated in Figure 12, and the fitting results of each model are presented in Figure 13. As can be seen in Figure 12 and Figure 13, for all indicators and fitting levels, the M9 model is optimal for both regions.

4.6. Validation of the Proposed Model

In order to validate the proposed model, the results presented in this paper are compared with those of similar studies on carbon price forecasting, as shown in Table 10. From the table, it is clear that the proposed model is superior to the other models. Here, the increase in prediction accuracy is due to the improvements in feature extraction and non-linear integration.
To avoid the erroneous conclusion that the proposed model has a high degree of predictive accuracy due to random error, the model is statistically evaluated by applying the DM test, where the MAE is used as the loss function. The DM test results of each model are detailed in Table 11.
It can be seen that the p-value for each comparison model is less than 0.05, indicating that the difference in predictive accuracy between the proposed model and each comparison model is statistically significant. Therefore, the model proposed in this paper is robust and can maintain good prediction performance in different scenarios.

5. Conclusions and Discussion

5.1. Conclusions

This paper establishes a forecasting framework based on improved feature extraction and non-linear integration to predict carbon prices. The contributions of this paper can be summarised as follows: (1) A feature extraction method that combines CEEMDAN and FuzzyEn is proposed, which can effectively extract features from the original carbon price series while optimising computational efficiency. The method makes the feature extraction process both accurate and efficient, promising to deliver more reliable predictions. (2) Based on the characteristics and complexity of the reconstructed sequences, the components of each complexity are predicted using targeted models. This design helps improve prediction quality because components of different complexity need to capture various characteristics, and the targeted model can better adapt to these characteristics to achieve higher precision. (3) This paper uses the non-linear integration learning method of MLRRF to reconstruct the sub-sequences, which can better reveal the relationship between the original and reconstructed sequences while effectively reducing prediction error. (4) By introducing error correction, the performance of the model achieves significant improvements in all evaluation indicators. Therefore, in practical applications, error correction has great potential to contribute to the improvement of the predictive performance of the models.
Carbon price forecasts can validly reflect the operating rules of the carbon market and provide a reference for the development of operational programmes and investor decision-making. Meanwhile, the results of the empirical analyses of the SZEA and HBEA also provide a theoretical basis for carbon pricing. Based on the results of the carbon price forecast, appropriate market operation and management strategies can be formulated to guide the investment direction of the carbon market.

5.2. Discussion

Focusing on the non-smooth and non-linear characteristics of the carbon price, this paper proposes a novel hybrid prediction model named CEEMDAN-FuzzyEn-PSORF-ELM-LSTM-MLRRF-ARIMA. It combines four core components: improved feature extraction, hybrid models, non-linear integration, and error correction. The experimental results show that the hybrid model is superior to traditional prediction methods in terms of prediction accuracy and robustness.
However, there are some limitations to this study. For example, it relies only on data-driven modelling and does not take into account external influences such as energy prices, investor sentiment, and climate change. In future work, effectively addressing the above shortcomings will contribute to improving the accuracy of non-stationary, non-linear time-series forecasting.

Author Contributions

Conceptualisation, Y.Z. and Y.C.; Data curation, Y.C.; Formal analysis, J.W.; Funding acquisition, Y.Z., Q.H., and Q.W.; Investigation, Q.H.; Methodology, Y.C.; Project administration, Y.Z. and Q.H.; Software, Y.C.; Validation, Y.Z. and Y.C.; Visualisation, Z.L.; Writing—original draft, Y.C.; Writing—review and editing, Y.G. and J.M. All authors were informed about each step of manuscript processing including submission, revision, revision reminders, etc., via emails. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China (No. 41701054), in part by the Graduate Teaching Case Establishment Project of Jilin Province, in part by the Project Grant for Teaching Cases of Graduate Students in Jilin Province (No. JJKH20230100YJG, No. JJKH20240715SK), and in part by the Jilin Provincial Department of Science and Technology (No. 20200201262JC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data for both the SZEA and HBEA can be found at https://ft.10jqka.com.cn/ (accessed on 16 January 2024).

Acknowledgments

The authors appreciate the anonymous referees for their careful reading and various corrections that greatly improved the exposition of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
EU ETS: EU Emissions Trading System
NDRC: National Development and Reform Commission
ARIMA: Autoregressive integrated moving average model
GARCH: Generalised autoregressive conditional heteroskedasticity model
ELM: Extreme learning machine
RF: Random forest
SVM: Support vector machine
AI: Artificial intelligence
ANNs: Artificial neural networks
CNNs: Convolutional neural networks
LSTM: Long short-term memory
GRUs: Gated recurrent unit networks
EMD: Empirical mode decomposition
WT: Wavelet transform
VMD: Variational mode decomposition
CEEMDAN: Complete ensemble empirical mode decomposition with adaptive noise
MLR: Multiple linear regression
MLRRF: Non-linear integration based on multiple linear regression and random forest
PSO: Particle swarm optimisation
PSORF: Improved random forest using particle swarm optimisation
Bagging: Bootstrap aggregating
IMFs: Intrinsic mode functions
ReLU: Rectified linear unit
Adam: Adaptive moment estimation
MSE: Mean square error
MAE: Mean absolute error
RMSE: Root mean square error
MAPE: Mean absolute percentage error
R²: Coefficient of determination
SZEA: Shenzhen emission allowance
HBEA: Hubei emission allowance
DM test: Diebold–Mariano test
BIC: Bayesian Information Criterion

References

  1. Li, M.; Bekö, G.; Zannoni, N.; Pugliese, G.; Carrito, M.; Cera, N.; Moura, C.; Wargocki, P.; Vasconcelos, P.; Nobre, P.; et al. Human metabolic emissions of carbon dioxide and methane and their implications for carbon emissions. Sci. Total Environ. 2022, 833, 155241. [Google Scholar] [PubMed]
  2. Lin, B.; Jia, Z. Impacts of carbon price level in carbon emission trading market. Appl. Energy 2019, 239, 157–170. [Google Scholar]
  3. Wang, J.; Sun, X.; Cheng, Q.; Cui, Q. An innovative random forest-based nonlinear ensemble paradigm of improved feature extraction and deep learning for carbon price forecasting. Sci. Total Environ. 2021, 762, 143099. [Google Scholar] [PubMed]
  4. Zhao, X.; Han, M.; Ding, L.; Kang, W. Usefulness of economic and energy data at different frequencies for carbon price forecasting in the EU ETS. Appl. Energy 2018, 216, 132–141. [Google Scholar]
  5. Boyce, J.K. Carbon pricing: Effectiveness and equity. Ecol. Econ. 2018, 150, 52–61. [Google Scholar]
  6. Ji, C.J.; Hu, Y.J.; Tang, B.J.; Qu, S. Price drivers in the carbon emissions trading scheme: Evidence from Chinese emissions trading scheme pilots. J. Clean. Prod. 2021, 278, 123469. [Google Scholar]
  7. Narassimhan, E.; Gallagher, K.S.; Koester, S.; Alejo, J.R. Carbon pricing in practice: A review of existing emissions trading systems. Clim. Policy 2018, 18, 967–991. [Google Scholar] [CrossRef]
  8. Bompard, E.; Corgnati, S.; Grosso, D.; Huang, T.; Mietti, G.; Profumo, F. Multidimensional assessment of the energy sustainability and carbon pricing impacts along the Belt and Road Initiative. Renew. Sustain. Energy Rev. 2022, 154, 111741. [Google Scholar]
  9. Wang, J.; Cui, Q.; He, M. Hybrid intelligent framework for carbon price prediction using improved variational mode decomposition and optimal extreme learning machine. Chaos Solitons Fractals 2022, 156, 111783. [Google Scholar] [CrossRef]
  10. Qin, Q.; Huang, Z.; Zhou, Z.; Chen, Y.; Zhao, W. Hodrick–Prescott filter-based hybrid ARIMA–SLFNs model with residual decomposition scheme for carbon price forecasting. Appl. Soft Comput. 2022, 119, 108560. [Google Scholar]
  11. Kour, M. Modelling and forecasting of carbon-dioxide emissions in South Africa by using ARIMA model. Int. J. Environ. Sci. Technol. 2023, 20, 11267–11274. [Google Scholar] [CrossRef]
  12. Zhang, J.; Xu, Y. Research on the price fluctuation and risk formation mechanism of carbon emission rights in China based on a GARCH model. Sustainability 2020, 12, 4249. [Google Scholar] [CrossRef]
  13. Huang, Y.; Dai, X.; Wang, Q.; Zhou, D. A hybrid model for carbon price forecasting using GARCH and long short-term memory network. Appl. Energy 2021, 285, 116485. [Google Scholar] [CrossRef]
  14. Liu, S.; Zhang, Y.; Wang, J.; Feng, D. Fluctuations and Forecasting of Carbon Price Based on A Hybrid Ensemble Learning GARCH-LSTM-Based Approach: A Case of Five Carbon Trading Markets in China. Sustainability 2024, 16, 1588. [Google Scholar] [CrossRef]
  15. Wang, P.; Liu, J.; Tao, Z.; Chen, H. A novel carbon price combination forecasting approach based on multi-source information fusion and hybrid multi-scale decomposition. Eng. Appl. Artif. Intell. 2022, 114, 105172. [Google Scholar] [CrossRef]
  16. Zhou, J.; Chen, D. Carbon price forecasting based on improved CEEMDAN and extreme learning machine optimized by sparrow search algorithm. Sustainability 2021, 13, 4896. [Google Scholar] [CrossRef]
  17. Qi, S.; Cheng, S.; Tan, X.; Feng, S.; Zhou, Q. Predicting China’s carbon price based on a multi-scale integrated model. Appl. Energy 2022, 324, 119784. [Google Scholar] [CrossRef]
  18. Zhang, W.; Wu, Z.; Zeng, X.; Zhu, C. An ensemble dynamic self-learning model for multiscale carbon price forecasting. Energy 2023, 263, 125820. [Google Scholar] [CrossRef]
  19. Zhao, H.; Guo, S. Carbon Trading Price Prediction of Three Carbon Trading Markets in China Based on a Hybrid Model Combining CEEMDAN, SE, ISSA, and MKELM. Mathematics 2023, 11, 2319. [Google Scholar] [CrossRef]
  20. Zhang, X.; Yang, K.; Lu, Q.; Wu, J.; Yu, L.; Lin, Y. Predicting carbon futures prices based on a new hybrid machine learning: Comparative study of carbon prices in different periods. J. Environ. Manag. 2023, 346, 118962. [Google Scholar] [CrossRef]
  21. Zhao, S.; Wang, Y.; Deng, G.; Yang, P.; Chen, Z.; Li, Y. An intelligently adjusted carbon price forecasting approach based on breakpoints segmentation, feature selection and adaptive machine learning. Appl. Soft Comput. 2023, 149, 110948. [Google Scholar] [CrossRef]
  22. Sun, W.; Xu, C. Carbon price prediction based on modified wavelet least square support vector machine. Sci. Total Environ. 2021, 754, 142052. [Google Scholar] [CrossRef]
  23. Sun, W.; Zhang, J. A novel carbon price prediction model based on optimized least square support vector machine combining characteristic-scale decomposition and phase space reconstruction. Energy 2022, 253, 124167. [Google Scholar] [CrossRef]
  24. Li, J.; Liu, D. Carbon price forecasting based on secondary decomposition and feature screening. Energy 2023, 278, 127783. [Google Scholar] [CrossRef]
  25. Acheampong, A.O.; Boateng, E.B. Modelling carbon emission intensity: Application of artificial neural network. J. Clean. Prod. 2019, 225, 833–856. [Google Scholar] [CrossRef]
  26. Ahmad, Z.U.; Yao, L.; Lian, Q.; Islam, F.; Zappi, M.E.; Gang, D.D. The use of artificial neural network (ANN) for modeling adsorption of sunset yellow onto neodymium modified ordered mesoporous carbon. Chemosphere 2020, 256, 127081. [Google Scholar] [CrossRef] [PubMed]
  27. Zhao, W.; Wu, Z.; Yin, Z.; Li, D. Attention-Based CNN Ensemble for Soil Organic Carbon Content Estimation with Spectral Data. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  28. Shi, H.; Wei, A.; Xu, X.; Zhu, Y.; Hu, H.; Tang, S. A CNN-LSTM based deep learning model with high accuracy and robustness for carbon price forecasting: A case of Shenzhen’s carbon market in China. J. Environ. Manag. 2024, 352, 120131. [Google Scholar] [CrossRef]
  29. Jiang, N.; Yu, X.; Alam, M. A hybrid carbon price prediction model based-combinational estimation strategies of quantile regression and long short-term memory. J. Clean. Prod. 2023, 429, 139508. [Google Scholar] [CrossRef]
  30. Duan, K.; Wang, R.; Chen, S.; Ge, L. Exploring the predictability of attention mechanism with LSTM: Evidence from EU carbon futures prices. Res. Int. Bus. Financ. 2023, 66, 102020. [Google Scholar] [CrossRef]
  31. Qin, C.; Qin, D.; Jiang, Q.; Zhu, B. Forecasting carbon price with attention mechanism and bidirectional long short-term memory network. Energy 2024, 299, 131410. [Google Scholar] [CrossRef]
  32. Wen, Z.; Zhou, R.; Su, H. MR and stacked GRUs neural network combined model and its application for deformation prediction of concrete dam. Expert Syst. Appl. 2022, 201, 117272. [Google Scholar] [CrossRef]
  33. Zhang, K.; Yang, X.; Wang, T.; Thé, J.; Tan, Z.; Yu, H. Multi-step carbon price forecasting using a hybrid model based on multivariate decomposition strategy and deep learning algorithms. J. Clean. Prod. 2023, 405, 136959. [Google Scholar] [CrossRef]
  34. Sun, W.; Huang, C. A carbon price prediction model based on secondary decomposition algorithm and optimized back propagation neural network. J. Clean. Prod. 2020, 243, 118671. [Google Scholar] [CrossRef]
  35. Zhang, C.; Zhao, Y.; Zhao, H. A novel hybrid price prediction model for multimodal carbon emission trading market based on CEEMDAN algorithm and window-based XGBoost approach. Mathematics 2022, 10, 4072. [Google Scholar] [CrossRef]
  36. Zhou, F.; Huang, Z.; Zhang, C. Carbon price forecasting based on CEEMDAN and LSTM. Appl. Energy 2022, 311, 118601. [Google Scholar] [CrossRef]
  37. Yang, R.; Liu, H.; Li, Y. An ensemble self-learning framework combined with dynamic model selection and divide-conquer strategies for carbon emissions trading price forecasting. Chaos Solitons Fractals 2023, 173, 113692. [Google Scholar] [CrossRef]
  38. Liu, H.; Shen, L. Forecasting carbon price using empirical wavelet transform and gated recurrent unit neural network. Carbon Manag. 2020, 11, 25–37. [Google Scholar] [CrossRef]
  39. Wang, J.; Cui, Q.; Sun, X. A novel framework for carbon price prediction using comprehensive feature screening, bidirectional gate recurrent unit and Gaussian process regression. J. Clean. Prod. 2021, 314, 128024. [Google Scholar] [CrossRef]
  40. Liu, H.; Pata, U.K.; Zafar, M.W.; Kartal, M.T.; Karlilar, S.; Caglar, A.E. Do oil and natural gas prices affect carbon efficiency? Daily evidence from China by wavelet transform-based approaches. Resour. Policy 2023, 85, 104039. [Google Scholar] [CrossRef]
  41. Liu, W.; Wang, C.; Li, Y.; Liu, Y.; Huang, K. Ensemble forecasting for product futures prices using variational mode decomposition and artificial neural networks. Chaos Solitons Fractals 2021, 146, 110822. [Google Scholar] [CrossRef]
  42. Li, G.; Wu, H.; Yang, H. A hybrid forecasting model of carbon emissions with optimized VMD and error correction. Alex. Eng. J. 2023, 81, 210–233. [Google Scholar] [CrossRef]
  43. Zhang, J.; Chen, K. Research on carbon asset trading strategy based on PSO-VMD and deep reinforcement learning. J. Clean. Prod. 2024, 435, 140322. [Google Scholar] [CrossRef]
  44. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  45. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4144–4147. [Google Scholar]
  46. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  47. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263. [Google Scholar] [CrossRef]
  48. Niu, X.; Wang, J.; Zhang, L. Carbon price forecasting system based on error correction and divide-conquer strategies. Appl. Soft Comput. 2022, 118, 107935. [Google Scholar] [CrossRef]
  49. Wang, J.; Zhuang, Z.; Gao, D. An enhanced hybrid model based on multiple influencing factors and divide-conquer strategy for carbon price prediction. Omega 2023, 120, 102922. [Google Scholar] [CrossRef]
Figure 1. Flowchart of PSORF algorithm.
Figure 2. Topology of an extreme learning machine.
Figure 3. Structure of LSTM algorithm.
Figure 4. Framework of the proposed model.
Figure 5. The expansion flowchart.
Figure 6. Trends and frequency distributions for two regions. (a) Trend of SZEA; (b) Frequency distribution of SZEA; (c) Trend of HBEA; (d) Frequency distribution of HBEA.
Figure 7. Decomposition results for two regions. (a) Decomposition results for SZEA; (b) Decomposition results for HBEA.
Figure 8. FuzzyEn comparison results. (a) FuzzyEn of IMFs for SZEA; (b) FuzzyEn of reconstructed sequence for SZEA; (c) FuzzyEn of IMFs for HBEA; (d) FuzzyEn of reconstructed sequence for HBEA.
Figure 9. Results of the reconstruction of the two regions. (a) Results of the reconstruction of SZEA; (b) Results of the reconstruction of HBEA.
Figure 10. The convergence flow of PSORF training. (a) HBEA; (b) Rec-sub1; (c) Rec-sub2; (d) Rec-sub3; (e) Rec-sub4; (f) Rec-sub5.
Figure 11. Results of the MLRRF integration error comparison.
Figure 12. Results of error comparison. (a) SZEA; (b) HBEA.
Figure 13. Results of model fitting. (a) SZEA; (b) HBEA.
Table 1. Parameters of forecasting models.

Model | Parameters | Parameter Settings
RF | Number of decision trees | 46
RF | Maximum depth | 97
ELM | Number of hidden layers | 1
ELM | Number of units | 64
ELM | Optimiser | Adam
ELM | Activation function | ReLU
LSTM | Number of hidden layers | 1
LSTM | Number of units | 64
LSTM | Batch size | 32
LSTM | Loss function | MAE
LSTM | Optimiser | Adam
LSTM | Activation function | Sigmoid
Table 2. The main numerical characteristics of the research data.

Research Data | Time Span | Size | Mean | Max. | Min. | Std.
SZEA | 5 August 2013–8 January 2024 | 2201 | 35.004 | 122.970 | 1.480 | 20.350
HBEA | 2 April 2014–15 January 2024 | 2267 | 29.184 | 61.890 | 10.070 | 11.259
Table 3. The FuzzyEn results of each sub-sequence and reconstruction consequence.
| Samples | CEEMDAN Subsequences | FuzzyEn | Reconstruction Subsequences | FuzzyEn |
|---|---|---|---|---|
| SZEA | SZEA | 0.8341 | Rec-SZEA | 0.3486 |
| | IMF1 | 1.3239 | Rec-sub1 | 1.3239 |
| | IMF2 | 1.0932 | Rec-sub2 | 1.0932 |
| | IMF3 | 0.5987 | Rec-sub3 | 0.5497 |
| | IMF4 | 0.3055 | | |
| | IMF5 | 0.1209 | | |
| | IMF6 | 0.0359 | Rec-sub4 | 0.0138 |
| | IMF7 | 0.0053 | | |
| | IMF8 | 0.0021 | | |
| | IMF9 | 0.0006 | | |
| HBEA | HBEA | 0.2305 | Rec-HBEA | 0.1831 |
| | IMF1 | 0.7463 | Rec-sub1 | 0.7463 |
| | IMF2 | 0.5486 | Rec-sub2 | 0.5486 |
| | IMF3 | 0.3294 | Rec-sub3 | 0.3294 |
| | IMF4 | 0.1296 | Rec-sub4 | 0.0917 |
| | IMF5 | 0.0548 | | |
| | IMF6 | 0.0178 | | |
| | IMF7 | 0.0032 | Rec-sub5 | 0.0013 |
| | IMF8 | 0.0015 | | |
| | IMF9 | 0.0001 | | |
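The FuzzyEn values in Table 3 can be reproduced in spirit with a standard fuzzy-entropy computation (Chen et al.'s exponential membership of baseline-removed embedding vectors). The sketch below is not the paper's code; the embedding dimension m = 2 and tolerance r = 0.2σ are common defaults and are assumptions here.

```python
import numpy as np

def fuzzy_entropy(u, m=2, r=0.2, n=2):
    """FuzzyEn sketch: similarity of embedding vectors measured by an
    exponential fuzzy membership after removing each vector's baseline."""
    u = np.asarray(u, dtype=float)
    r = r * np.std(u)                    # tolerance scaled by series std

    def phi(m):
        N = len(u) - m + 1
        X = np.array([u[i:i + m] for i in range(N)])
        X = X - X.mean(axis=1, keepdims=True)        # remove local baseline
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        D = np.exp(-(d ** n) / r)                    # fuzzy similarity
        np.fill_diagonal(D, 0.0)
        return D.sum() / (N * (N - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

As in Table 3, a noisy high-frequency series yields a large FuzzyEn while a smooth low-frequency component yields a value near zero, which is the basis for grouping IMFs with similar complexity into the Rec-sub sequences.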
Table 4. Error comparison between original and reconstructed sequences.
| Sample | Model | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| SZEA | PSORF | 37.5638 | 4.9259 | 6.1289 | 0.1429 |
| | PSORF-Rec | 18.3297 | 3.6412 | 4.2813 | 0.0926 |
| | ELM | 10.6114 | 2.0560 | 3.2575 | 0.0897 |
| | ELM-Rec | 3.5194 | 1.2333 | 1.8760 | 0.0491 |
| | LSTM | 10.5641 | 2.0449 | 3.2502 | 0.0926 |
| | LSTM-Rec | 4.7292 | 1.4438 | 2.1747 | 0.0667 |
| HBEA | PSORF | 5.1855 | 1.9728 | 2.2772 | 0.0426 |
| | PSORF-Rec | 4.8768 | 1.9315 | 2.2084 | 0.0411 |
| | ELM | 0.8579 | 0.6458 | 0.9262 | 0.0140 |
| | ELM-Rec | 0.3193 | 0.4205 | 0.5651 | 0.0091 |
| | LSTM | 0.8997 | 0.6694 | 0.9485 | 0.0147 |
| | LSTM-Rec | 0.2827 | 0.3874 | 0.5317 | 0.0084 |
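The four error metrics used throughout Tables 4, 7 and 8 follow their standard definitions; here MAPE is expressed as a fraction, matching the tables. A compact helper:

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """MSE, MAE, RMSE and MAPE (as a fraction) of a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = y_true - y_pred
    mse = np.mean(e ** 2)
    return {
        "MSE": mse,
        "MAE": np.mean(np.abs(e)),
        "RMSE": np.sqrt(mse),
        "MAPE": np.mean(np.abs(e / y_true)),
    }

m = forecast_errors([10.0, 20.0, 40.0], [12.0, 18.0, 44.0])
# errors are [-2, 2, -4]: MSE = 8, MAE = 8/3, RMSE = sqrt(8), MAPE = 0.4/3
```

Since carbon prices are strictly positive in both pilot markets, MAPE is well defined on these data.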
Table 5. Parameter setting of PSO.
| Hyperparameter | Meaning of the Parameter | Parameter Settings |
|---|---|---|
| N | Population size | 30 |
| D | Dimension | 2 |
| W | Inertia weight | [0, 1) |
| C1 | Individual learning factor | 2 |
| C2 | Group learning factor | 2 |
| r1 | Acceleration coefficient | 0.7 |
| r2 | Acceleration coefficient | 0.5 |
| M | Maximum iterations | 100 |
| Loss | Loss function | MAE |
| early_stopping_rounds | Early stopping rounds | 25 |
| n_estimators_range | Range of decision trees | [20, 150) |
| max_depth_range | Maximum depth range | [20, 150) |
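The PSO update behind Table 5 can be sketched as follows. This is a generic PSO loop, not the paper's code: in the usual convention r1 and r2 are fresh uniform draws each step (rather than the fixed values listed), and the quadratic toy objective stands in for the random forest's validation MAE over (n_estimators, max_depth).

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimise(f, bounds, n=30, w=0.7, c1=2.0, c2=2.0, iters=100):
    """Minimal PSO: v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)."""
    lo, hi = np.array(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)           # keep particles inside bounds
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)]
    return g, pbest_f.min()

# toy objective with its optimum at (46, 97), echoing Table 1's tuned RF values
g, fg = pso_minimise(lambda p: (p[0] - 46) ** 2 + (p[1] - 97) ** 2,
                     bounds=[(20, 150), (20, 150)])
```

In the actual PSORF model, each evaluation of f would fit a random forest with the candidate hyperparameters and return its validation MAE, with early stopping after 25 stagnant rounds.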
Table 6. The MAE values for the three models for the two regions.
| Model | SZEA Rec-sub1 | SZEA Rec-sub2 | SZEA Rec-sub3 | SZEA Rec-sub4 | HBEA Rec-sub1 | HBEA Rec-sub2 | HBEA Rec-sub3 | HBEA Rec-sub4 | HBEA Rec-sub5 |
|---|---|---|---|---|---|---|---|---|---|
| PSORF | 0.9080 | 0.4172 | 0.5114 | 2.9656 | 0.3109 | 0.1659 | 0.0849 | 0.0576 | 1.9009 |
| ELM | 1.0391 | 0.4438 | 0.4273 | 0.2642 | 0.3615 | 0.1706 | 0.0723 | 0.0504 | 0.1182 |
| LSTM | 1.0954 | 0.5359 | 0.8128 | 0.2290 | 0.3586 | 0.1778 | 0.0564 | 0.0668 | 0.0878 |
Table 7. Results of error comparison between single and hybrid models.
| Model | SZEA MSE | SZEA MAE | SZEA RMSE | SZEA MAPE | HBEA MSE | HBEA MAE | HBEA RMSE | HBEA MAPE |
|---|---|---|---|---|---|---|---|---|
| PSORF-Rec | 18.3297 | 3.6412 | 4.2813 | 0.0926 | 4.8768 | 1.9315 | 2.2084 | 0.0411 |
| ELM-Rec | 3.5194 | 1.2333 | 1.8760 | 0.0491 | 0.3193 | 0.4205 | 0.5651 | 0.0091 |
| LSTM-Rec | 4.7292 | 1.4438 | 2.1747 | 0.0667 | 0.2827 | 0.3874 | 0.5317 | 0.0084 |
| PSORF-ELM-LSTM | 2.5579 | 1.0693 | 1.5994 | 0.0439 | 0.2294 | 0.3635 | 0.4790 | 0.0079 |
Table 8. Model comparison results.
| Model | SZEA MSE | SZEA MAE | SZEA RMSE | SZEA MAPE | HBEA MSE | HBEA MAE | HBEA RMSE | HBEA MAPE |
|---|---|---|---|---|---|---|---|---|
| PSORF-Rec | 18.3297 | 3.6412 | 4.2813 | 0.0926 | 4.8768 | 1.9315 | 2.2084 | 0.0411 |
| ELM-Rec | 3.5194 | 1.2333 | 1.8760 | 0.0491 | 0.3193 | 0.4205 | 0.5651 | 0.0091 |
| LSTM-Rec | 4.7292 | 1.4438 | 2.1747 | 0.0667 | 0.2827 | 0.3874 | 0.5317 | 0.0084 |
| PSORF-ELM-LSTM | 2.5579 | 1.0693 | 1.5994 | 0.0439 | 0.2294 | 0.3635 | 0.4790 | 0.0079 |
| PSORF-ELM-LSTM-MLRRF | 1.8411 | 0.8675 | 1.3569 | 0.0326 | 0.1568 | 0.2786 | 0.3960 | 0.0061 |
| PSORF-ELM-LSTM-MLRRF-ARIMA | 0.3918 | 0.4158 | 0.6259 | 0.0163 | 0.0429 | 0.1382 | 0.2072 | 0.0030 |
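One way to realise the MLRRF non-linear integration behind Table 8 is a two-stage stacking: MLR learns a linear combination of the three base-model forecasts, and a random forest then combines the base forecasts together with the MLR output to capture residual non-linear structure. The wiring below is one plausible reading of the MLRRF name, not the paper's exact construction, and the three "base forecasts" are synthetic stand-ins; the ARIMA stage would subsequently model the residuals of this combined forecast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# synthetic stand-ins for the PSORF / ELM / LSTM forecasts of one series
y = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.05 * rng.normal(size=400)
base = np.column_stack([y + 0.2 * rng.normal(size=400) for _ in range(3)])

# stage 1: MLR learns a linear combination of the base forecasts
mlr = LinearRegression().fit(base, y)
lin = mlr.predict(base)

# stage 2: RF combines the base forecasts with the MLR output,
# allowing a non-linear correction on top of the linear blend
stacked = np.column_stack([base, lin])
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(stacked, y)
final = rf.predict(stacked)
```

In practice the stacking model should be fit on a hold-out split of the base-model predictions to avoid the look-ahead bias that in-sample stacking introduces.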
Table 9. Code name of each model.
| Name | Model |
|---|---|
| M1 | PSORF |
| M2 | PSORF-Rec |
| M3 | ELM |
| M4 | ELM-Rec |
| M5 | LSTM |
| M6 | LSTM-Rec |
| M7 | PSORF-ELM-LSTM |
| M8 | PSORF-ELM-LSTM-MLRRF |
| M9 | PSORF-ELM-LSTM-MLRRF-ARIMA |
Table 10. Comparison results with similar studies.
| Datasets | Model | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| SZEA | CNN-LSTM [28] | 2.7352 | 1.0650 | 1.6538 | - |
| | ICEEMDAN-LZC-DWT-SVR-MLP [24] | 0.7811 | 0.7811 | 1.2725 | 4.1516 |
| | ECDC-VMD-MOGOA-ORELM [48] | 1.1772 | 0.8109 | 1.0850 | 0.0278 |
| | SSA-FFT-XGboost-PACF-SMA-BiLSTM [49] | 0.4127 | 0.4727 | 0.6424 | 0.0235 |
| | The proposed model (M9) | 0.3918 | 0.4158 | 0.6259 | 0.0163 |
| HBEA | IVMD-MSE-SSA-ELM [9] | 0.4426 | 0.4309 | 0.6653 | 1.8268 |
| | CEEMDAN-SE-QRLSTM [29] | 0.1500 | 0.3100 | 0.3900 | 0.1000 |
| | ICEEMDAN-MS-At-Bi-LSTM [31] | 0.3399 | - | 0.5830 | - |
| | SCDAF [21] | 0.0784 | 0.1475 | 0.2800 | 0.5884 |
| | The proposed model (M9) | 0.0429 | 0.1382 | 0.2072 | 0.0030 |
Table 11. DM test results of each model.
| Target Model | Comparison Models | SZEA DM Statistics | SZEA p-Value | HBEA DM Statistics | HBEA p-Value |
|---|---|---|---|---|---|
| M9 | M1 | 7.0821 | 0.0000 | 11.2553 | 0.0000 |
| | M2 | 4.8627 | 0.0000 | 5.0646 | 0.0000 |
| | M3 | 9.9325 | 0.0000 | 12.7747 | 0.0000 |
| | M4 | 5.6648 | 0.0000 | 7.4197 | 0.0000 |
| | M5 | 9.1497 | 0.0000 | 12.7747 | 0.0000 |
| | M6 | 6.1919 | 0.0000 | 8.1461 | 0.0000 |
| | M7 | 10.1754 | 0.0000 | 11.0213 | 0.0000 |
| | M8 | 5.9110 | 0.0000 | 9.5979 | 0.0000 |
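The DM statistics in Table 11 follow the standard Diebold–Mariano construction: the loss differential d_t = L(e_comparison,t) − L(e_target,t) is averaged and standardised, so a positive statistic favours the target model (M9). The sketch below uses squared-error loss and an HAC variance with h − 1 autocovariance lags; the paper's exact loss choice and lag settings are not reproduced here.

```python
import numpy as np
from math import erfc, sqrt

def dm_test(e_comparison, e_target, h=1, power=2):
    """Diebold-Mariano test on two forecast-error series (sketch).
    Positive DM means the second (target) model has smaller loss."""
    e1 = np.asarray(e_comparison, dtype=float)
    e2 = np.asarray(e_target, dtype=float)
    d = np.abs(e1) ** power - np.abs(e2) ** power   # loss differential
    n = len(d)
    d_bar = d.mean()
    var = np.mean((d - d_bar) ** 2)                 # gamma_0
    for k in range(1, h):                           # HAC terms for h-step forecasts
        gk = np.mean((d[k:] - d_bar) * (d[:-k] - d_bar))
        var += 2.0 * gk
    dm = d_bar / sqrt(var / n)
    p = erfc(abs(dm) / sqrt(2.0))                   # two-sided N(0,1) p-value
    return dm, p

# a model with clearly smaller errors yields a large positive DM statistic
rng = np.random.default_rng(0)
dm, p = dm_test(rng.normal(0, 2.0, 500), rng.normal(0, 0.5, 500))
```

With the table's uniformly large DM statistics and p-values of 0.0000, the superiority of M9 over M1–M8 is statistically significant at any conventional level.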

Zhu, Y.; Chen, Y.; Hua, Q.; Wang, J.; Guo, Y.; Li, Z.; Ma, J.; Wei, Q. A Hybrid Model for Carbon Price Forecasting Based on Improved Feature Extraction and Non-Linear Integration. Mathematics 2024, 12, 1428. https://doi.org/10.3390/math12101428
