An Optimized Fractional Grey Prediction Model for Carbon Dioxide Emissions Forecasting

Because grey prediction does not require the collected data to follow any statistical distribution, it is well suited to building prediction models for real-world problems. GM(1,1) is a widely used grey prediction model, but its parameters, namely the developing coefficient and the control variable, are estimated from background values that are not easily determined. Furthermore, grey prediction models usually rely on one-order accumulation to recognize regularities embedded in data sequences, and this operation assigns equal weights to all samples. To optimize grey prediction models, this study therefore employed a genetic algorithm to determine the relevant parameters and used fractional-order accumulation to assign appropriate weights to the sample data. Experimental results on the carbon dioxide emission data reported by the International Energy Agency demonstrated that the proposed grey prediction model achieved the best average rank among all considered prediction models and was significantly superior to the other considered grey prediction models.


Introduction
The International Energy Agency (IEA) [1] reported that global emissions of carbon dioxide (CO2), a greenhouse gas that has a direct effect on climate change, amounted to 33 billion tons in 2019. Several factors caused CO2 emissions to go down in 2019, including the increased consumption of renewable energy in advanced economies such as the European Union, the US, and Japan, and slower economic growth in emerging markets. Even so, increases in CO2 emissions arising from global economic growth may still be inevitable [1]. Indeed, Asian countries have contributed a large amount of CO2 emissions through fuel combustion. To reduce the negative impact of CO2 emissions on the environment and economic growth, national authorities need to leverage information derived from CO2 emission forecasts when devising energy-development policies. For instance, China is the highest carbon-emitting country and is facing tremendous pressure to reduce carbon emissions. CO2 emissions are expected to decrease by 18% in 2020 under China's "13th Five-Year Plan" [2].
Many methods have been frequently applied to forecasting, including artificial intelligence techniques such as evolutionary algorithms [3,4] and neural networks (NNs) [5,6], and statistical methods such as logistic equations [7], regression models [4,7,8], time series models [9], and the ARIMA model [4,10]. However, the forecasting accuracy of artificial intelligence techniques can be significantly influenced by the training sample size [11], and statistical methods usually require a large amount of data that conform to certain statistical assumptions [12]. The grey prediction model was constructed to avoid these inherent limitations of statistical analysis [13]. Grey prediction needs only limited data and does not require the collected data to satisfy any statistical properties [13,14]. Grey system theory is an artificial intelligence technique [15], and among grey prediction models, GM(1,1) is one of the most commonly used [13,14,16].
Among the diverse real-world problems analyzed by grey prediction, CO2 emissions forecasting is an important issue. GM(1,1) and its variants have been widely applied to forecast CO2 emissions, such as the original GM(1,1) by Lin et al. [17], the nonlinear grey Bernoulli model (NGBM(1,1)) by Pao et al. [18], the grey Verhulst model by Wang and Li [19], and the adaptive grey model by Xu et al. [20]. In addition to GM(1,1), the multivariate model GM(1,N), which is composed of a characteristic sequence and N−1 relevant factor sequences, is often applied to CO2 emission forecasting, such as the nonlinear multivariable models of Wang and Ye [21] and Wu et al. [22] and the grey multivariable model of Ding et al. [23], which is based on the trends of driving variables. The motivation for using GM(1,N) instead of GM(1,1) is that multivariate techniques may improve forecasting ability [24,25]. Despite the usefulness of grey prediction, several issues with the above-mentioned grey prediction models motivated us to develop a more effective grey prediction model for forecasting CO2 emissions.
First, the performance of multivariable models, such as econometric methods, can be adversely affected if little is known about the relevant explanatory factors [26]. Although CO2 emissions are mainly influenced by economic development, population, and energy use [23], we do not know all of the relevant factors. This led us to adopt GM(1,1), rather than GM(1,N), as the development base in this study. Next, GM(1,1) and its variants usually recognize the regularities embedded in data sequences by means of the one-order accumulated generating operation (1-AGO). A problem with 1-AGO is that it weights each sample equally [27]. It is therefore pertinent to consider FAGM(1,1), which incorporates fractional-order accumulation into GM(1,1) to mitigate this restriction [23,24]. Several variants have been proposed to strengthen FAGM(1,1), such as the fractional NGBM(1,1) (FANGBM(1,1)) by Wu et al. [22] and Şahin [28] and the fractional GM(q,1) by Mao et al. [29]. Despite the usefulness of fractional-order accumulation, only a few grey prediction studies, such as the fractional time-delayed grey model (FTDGM) of Ma et al. [30], the nonhomogeneous grey model of Wu et al. [31], and the discrete fractional GM(1,1) of Gao et al. [32], have addressed CO2 emission forecasting using FAGM(1,1).
The last issue we address is that FAGM(1,1) and its variants usually apply the ordinary least squares (OLS) method to derive the control variable and developing coefficient from background values, which are not easily determined [33,34]. This study therefore proposes a genetic algorithm (GA)-based fractional grey prediction model (GA-FAGM(1,1)) that determines the relevant parameters without background values. The usefulness and applicability of GA-FAGM(1,1) are verified by applying it to annual CO2 emission forecasting. Compared with the other considered prediction models, the results demonstrate that GA-FAGM(1,1) performs well.
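The remainder of this paper is organized as follows. Section 2 introduces the original GM(1,1) and its fractional version. The proposed GA-FAGM(1,1) is described in Section 3. Section 4 examines the CO2 emission forecasting accuracy of the considered prediction models. A discussion and conclusions are presented in Section 5.

GM(1,1) Model
Let $x^{(0)} = (x^{(0)}_1, x^{(0)}_2, \ldots, x^{(0)}_n)$ be an original data sequence of $n$ samples. Its accumulated generating sequence, $x^{(1)} = (x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_n)$, is derived by 1-AGO as
$$x^{(1)}_k = \sum_{j=1}^{k} x^{(0)}_j, \quad k = 1, 2, \ldots, n.$$
Since $x^{(1)}$ is monotonically increasing, the whitening equation, which is treated as the mathematical form of GM(1,1), is expressed as
$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b,$$
where $a$ is the developing coefficient and $b$ is the control variable. Solving the whitening equation yields the corresponding time response function
$$\hat{x}^{(1)}_k = \left(x^{(0)}_1 - \frac{b}{a}\right) e^{-a(k-1)} + \frac{b}{a}.$$
A linear regression model can then be used to estimate $a$ and $b$:
$$x^{(0)}_k = -a z^{(1)}_k + b, \quad k = 2, 3, \ldots, n,$$
where $z^{(1)} = (z^{(1)}_2, z^{(1)}_3, \ldots, z^{(1)}_n)$ is the sequence of background values, given by
$$z^{(1)}_k = \alpha x^{(1)}_k + (1 - \alpha) x^{(1)}_{k-1},$$
and $\alpha$ is commonly set to 0.5. OLS can then be applied to obtain $a$ and $b$:
$$\begin{bmatrix} a \\ b \end{bmatrix} = (B^{\top} B)^{-1} B^{\top} Y, \quad B = \begin{bmatrix} -z^{(1)}_2 & 1 \\ \vdots & \vdots \\ -z^{(1)}_n & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(0)}_2 \\ \vdots \\ x^{(0)}_n \end{bmatrix},$$
which explains why $z^{(1)}$ has a strong impact on the determination of $a$ and $b$. Finally, the one-order inverse AGO (1-IAGO) is applied to compute the predicted value of $x^{(0)}_k$:
$$\hat{x}^{(0)}_k = \hat{x}^{(1)}_k - \hat{x}^{(1)}_{k-1}, \quad k \ge 2, \qquad \hat{x}^{(0)}_1 = x^{(0)}_1.$$
However, the above linear regression model might not satisfy the assumptions of the Gauss-Markov theorem [35]; the resulting OLS estimators may then not be the best linear unbiased estimators, which suggests that $\hat{x}^{(0)}_k$ may be unreliable.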
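The GM(1,1) steps above (1-AGO, background values, OLS, the time response function, and 1-IAGO) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the function name `gm11` is hypothetical, α defaults to 0.5 as in the text, and the two-parameter OLS problem is solved in closed form instead of with a matrix library. No guards are included for degenerate inputs (for example, a = 0).

```python
import math


def gm11(x0, horizon=0, alpha=0.5):
    """Fit GM(1,1) by OLS on background values; return (a, b, fitted-and-forecast x0)."""
    n = len(x0)
    # 1-AGO: x1_k is the running sum of x0_1..x0_k
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values z_k = alpha*x1_k + (1 - alpha)*x1_{k-1}, k = 2..n
    z = [alpha * x1[k] + (1 - alpha) * x1[k - 1] for k in range(1, n)]
    y = x0[1:]
    # OLS for y = c*z + b with c = -a (closed form of (B^T B)^{-1} B^T Y)
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    c = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - c * sz) / m
    a = -c
    # Time response function evaluated at t = k - 1 = 0, 1, ..., n + horizon - 1
    x1_hat = [(x0[0] - b / a) * math.exp(-a * t) + b / a for t in range(n + horizon)]
    # 1-IAGO restores the original scale; the first fitted value equals x0_1
    x0_hat = [x0[0]] + [x1_hat[t] - x1_hat[t - 1] for t in range(1, n + horizon)]
    return a, b, x0_hat


# Usage: a geometric series growing 10% per period, with two forecast steps
series = [10 * 1.1 ** k for k in range(6)]
a, b, fitted = gm11(series, horizon=2)
print(round(a, 4), round(b, 4))  # -0.0952 9.5238
```

For a geometric series, the background-value regression is exactly linear, so OLS recovers a = −2(0.1)/2.1 and b = 2(10)/2.1; the small remaining error comes from approximating growth factor 1.1 by e^{−a}.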

Fractional GM(1,1) Model
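$\mathrm{GM}^{p}(1,1)$, also referred to as FAGM(1,1), is a form of GM(1,1) that has been combined with a fractional-order accumulator, where $p$ is the fractional parameter ($0 < p < 1$). An accumulated generating sequence of order $p$, $x^{(p)} = (x^{(p)}_1, x^{(p)}_2, \ldots, x^{(p)}_n)$, is generated by p-AGO as
$$x^{(p)}_k = \sum_{i=1}^{k} \binom{k-i+p-1}{k-i} x^{(0)}_i, \quad k = 1, 2, \ldots, n,$$
where
$$\binom{k-i+p-1}{k-i} = \frac{(k-i+p-1)(k-i+p-2)\cdots(p+1)p}{(k-i)!}, \qquad \binom{p-1}{0} = 1.$$
For instance, provided that $n = 4$ and $p = 0.8$, the coefficient of $x^{(0)}_1$ in $x^{(p)}_4$ is computed as
$$\binom{3.8-1}{3} = \frac{2.8 \times 1.8 \times 0.8}{3!} = 0.672.$$
It has been proven that p-AGO satisfies the so-called principle of new information priority when $0 < p < 1$: the smaller the value of $p$, the smaller the weights assigned to older data [22,27,36]. That is, newer data are weighted more heavily when a smaller value of $p$ is given. The whitening equation of $\mathrm{GM}^{p}(1,1)$ is given by
$$\frac{dx^{(p)}}{dt} + a^{(p)} x^{(p)} = b^{(p)},$$
and its time response function is
$$\hat{x}^{(p)}_k = \left(x^{(0)}_1 - \frac{b^{(p)}}{a^{(p)}}\right) e^{-a^{(p)}(k-1)} + \frac{b^{(p)}}{a^{(p)}},$$
from which the predicted values of the original sequence are restored by the inverse p-order accumulation (p-IAGO). The original form is expressed as
$$x^{(p)}_k - x^{(p)}_{k-1} + a^{(p)} z^{(p)}_k = b^{(p)}, \quad z^{(p)}_k = \alpha x^{(p)}_k + (1-\alpha) x^{(p)}_{k-1}, \quad k = 2, 3, \ldots, n,$$
where $\alpha$ is usually set to 0.5. When $\alpha$ and $p$ are given, OLS can be applied to derive $a^{(p)}$ and $b^{(p)}$:
$$\begin{bmatrix} a^{(p)} \\ b^{(p)} \end{bmatrix} = (B^{\top} B)^{-1} B^{\top} Y, \quad B = \begin{bmatrix} -z^{(p)}_2 & 1 \\ \vdots & \vdots \\ -z^{(p)}_n & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(p)}_2 - x^{(p)}_1 \\ \vdots \\ x^{(p)}_n - x^{(p)}_{n-1} \end{bmatrix}.$$
It is thus clear that $z^{(p)}$ determines $a^{(p)}$ and $b^{(p)}$. As with GM(1,1), we cannot guarantee that this linear regression model satisfies the assumptions of the Gauss-Markov theorem.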
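The p-AGO weights in the n = 4, p = 0.8 example can be checked numerically. The sketch below is ours, not the authors' code: `frac_coefficients` and `frac_ago` are hypothetical names, and each coefficient C(m + r − 1, m) is built by the recurrence C_m = C_{m−1}(r + m − 1)/m. With r = p the function performs p-AGO; with r = −p it performs the inverse operation (p-IAGO), because accumulations of orders p and −p compose to the identity.

```python
def frac_coefficients(n, r):
    """Coefficients C(m + r - 1, m) for m = 0..n-1, with C(r - 1, 0) = 1."""
    coef = [1.0]
    for m in range(1, n):
        # C(m + r - 1, m) = C(m + r - 2, m - 1) * (r + m - 1) / m
        coef.append(coef[-1] * (r + m - 1) / m)
    return coef


def frac_ago(x, r):
    """Order-r accumulation: r = p gives p-AGO, r = -p gives its inverse (p-IAGO)."""
    coef = frac_coefficients(len(x), r)
    # x^(r)_k = sum over i <= k of C(k - i + r - 1, k - i) * x_i (0-based indices)
    return [sum(coef[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]


# Weights of x_1..x_4 inside x^(p)_4 for p = 0.8, oldest observation first
weights = list(reversed(frac_coefficients(4, 0.8)))
print([round(w, 3) for w in weights])  # [0.672, 0.72, 0.8, 1.0]

# Round trip: p-AGO followed by p-IAGO recovers the data (up to rounding)
data = [3.0, 5.0, 4.0, 6.0, 7.0]
restored = frac_ago(frac_ago(data, 0.8), -0.8)
```

The printed weights rise toward the newest observation, which is the new information priority principle in action; rerunning with p = 0.5 gives an even smaller weight (0.3125) to the oldest observation.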

The Proposed Optimized Grey Prediction Model
Since $\hat{x}^{(p)}_k$ in the original FAGM(1,1) is fully determined by the relevant parameters, namely the fractional parameter $p$, the developing coefficient $a^{(p)}$, and the control variable $b^{(p)}$, OLS is not required to derive $a^{(p)}$ and $b^{(p)}$; bypassing OLS avoids the problems that arise when the Gauss-Markov assumptions do not hold. This leads us to find the relevant parameters of FAGM(1,1) using a GA instead of OLS. A flowchart of constructing the proposed GA-FAGM(1,1) is depicted in Figure 1.

Problem Formulation
To develop an optimized prediction model, the mean absolute percentage error (MAPE) is used to formulate the objective of our problem as
$$\min \; \mathrm{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \left| \frac{x_k - \hat{x}_k}{x_k} \right| \times 100\%,$$
where $x_k$ and $\hat{x}_k$ are the actual and forecasted values at time $k$ ($k = 1, 2, \ldots, n$), respectively. MAPE has become a benchmark for evaluating prediction accuracy because it has been shown to be more stable than other commonly used measures, including the root mean square error (RMSE) and the mean absolute error (MAE) [37]. Dang et al. [38] also demonstrated the effectiveness of MAPE when constructing optimized grey prediction models.
We thus use MAPE to evaluate the fitness of a chromosome.

Coding
The three required parameters (i.e., $a^{(p)}$, $b^{(p)}$, and $p$) are determined by a GA. Each chromosome in a population consists of $a^{(p)}$, $b^{(p)}$, and $p$ and thus corresponds to one GA-FAGM(1,1); because the fitness value is the MAPE, a smaller fitness value indicates a better chromosome. To align with the principle of new information priority, $p$ is restricted to values between 0 and 1, because p-AGO favors older data when $p > 1$.
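To make the coding concrete, the sketch below decodes a chromosome (a, b, p) into fitted values and scores it by MAPE, with no background values and no OLS. It assumes that the time response function is evaluated on the p-AGO scale and then mapped back to the original scale by inverse fractional accumulation; the function names, the (a, b, p) ordering, and the small-|a| limit are our own illustrative choices rather than details given in the text.

```python
import math


def frac_ago(x, r):
    """Order-r fractional accumulation (r = -p inverts p-AGO)."""
    coef = [1.0]
    for m in range(1, len(x)):
        coef.append(coef[-1] * (r + m - 1) / m)
    return [sum(coef[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]


def decode(chromosome, x0_first, length):
    """Map a chromosome (a, b, p) to fitted/forecast values on the original scale."""
    a, b, p = chromosome
    if abs(a) < 1e-12:
        # Limit of the time response function as a -> 0
        xp_hat = [x0_first + b * t for t in range(length)]
    else:
        xp_hat = [(x0_first - b / a) * math.exp(-a * t) + b / a for t in range(length)]
    return frac_ago(xp_hat, -p)  # inverse p-AGO back to the original scale


def mape(actual, fitted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((x - f) / x) for x, f in zip(actual, fitted)) / len(actual)


def fitness(chromosome, x0):
    """Fitness of a chromosome = MAPE of its fitted values; smaller is better."""
    return mape(x0, decode(chromosome, x0[0], len(x0)))


# Data generated exactly by an FAGM(1,1) with a = -0.1, b = 5, p = 0.5
true_chrom = (-0.1, 5.0, 0.5)
x0 = decode(true_chrom, 10.0, 8)
print(fitness(true_chrom, x0))            # 0.0: the generating chromosome fits exactly
print(fitness((-0.1, 5.0, 0.9), x0) > 0)  # True: a different p gives a positive MAPE
```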

Genetic Operations
Let $n_{\max}$ and $n_{\text{size}}$ denote the maximum number of generations and the population size, respectively. Selection, crossover, and mutation are applied to generate $n_{\text{size}}$ new chromosomes for $P_{m+1}$ after the fitness value of each string in $P_m$ has been evaluated, where $P_m$ denotes the population in the $m$-th generation ($1 \le m \le n_{\max}$). When $P_m$ is treated as the current population, $P_{m+1}$ is the next population.

Selection
To generate new strings for $P_{m+1}$, two strings are randomly selected from $P_m$ by binary tournament selection with replacement, and the string with the better (i.e., smaller) fitness value is put in the mating pool. Selection ends when $n_{\text{size}}$ strings have been placed in the mating pool.
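Binary tournament selection with replacement might look like the following sketch. Strings are treated as opaque objects, and the fitness list is assumed to hold MAPE values, so the smaller value wins; the function name and the use of Python's `random.Random` are our own choices.

```python
import random


def binary_tournament(population, fitness, n_size, rng):
    """Fill a mating pool of n_size strings; each slot goes to the smaller-MAPE string."""
    pool = []
    while len(pool) < n_size:
        # Sampling with replacement: the same string may be drawn twice
        i = rng.randrange(len(population))
        j = rng.randrange(len(population))
        pool.append(population[i] if fitness[i] <= fitness[j] else population[j])
    return pool


rng = random.Random(42)
population = ["s1", "s2", "s3", "s4"]
mape_values = [12.0, 3.5, 7.1, 20.4]
pool = binary_tournament(population, mape_values, n_size=4, rng=rng)
print(pool)
```

Because each tournament draws with replacement, the best string wins a slot with probability 1 − (3/4)² = 7/16 here, while the worst string can only enter the pool when it is drawn against itself.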

Crossover and Mutation
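Parent strings $u = (p_u, a^{(p)}_u, b^{(p)}_u)$ and $v = (p_v, a^{(p)}_v, b^{(p)}_v)$ are selected from the mating pool, and crossover and mutation are applied to reproduce children. Crossover generates the offspring $u' = (p_{u'}, a^{(p)}_{u'}, b^{(p)}_{u'})$ and $v' = (p_{v'}, a^{(p)}_{v'}, b^{(p)}_{v'})$ by recombining each pair of parameters in $u$ and $v$ with crossover rate $Pr_c$ as
$$p_{u'} = \alpha_1 p_u + (1-\alpha_1) p_v, \quad p_{v'} = (1-\alpha_1) p_u + \alpha_1 p_v, \qquad (17)$$
$$a^{(p)}_{u'} = \alpha_2 a^{(p)}_u + (1-\alpha_2) a^{(p)}_v, \quad a^{(p)}_{v'} = (1-\alpha_2) a^{(p)}_u + \alpha_2 a^{(p)}_v, \qquad (18)$$
$$b^{(p)}_{u'} = \alpha_3 b^{(p)}_u + (1-\alpha_3) b^{(p)}_v, \quad b^{(p)}_{v'} = (1-\alpha_3) b^{(p)}_u + \alpha_3 b^{(p)}_v, \qquad (19)$$
where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are random numbers in the unit interval. In a newly generated string, a small positive or negative value is added to a parameter with mutation rate $Pr_m$. A high value of $Pr_c$ is often recommended because it helps explore more of the solution space, whereas $Pr_m$ should be set to a low value to prevent excessive perturbations during evolution [39].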
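A minimal sketch of these operators follows, assuming arithmetic (blend) crossover in which each parameter pair is recombined with probability Pr_c using its own random weight (the role of α1, α2, and α3 in the text), and uniform mutation steps. The (a, b, p) ordering, the mutation step sizes, and the clipping of p to the open interval (0, 1) are our own assumptions, added so that mutated strings stay consistent with the coding described earlier.

```python
import random


def crossover(u, v, pr_c, rng):
    """Arithmetic crossover: each parameter pair is blended with probability pr_c."""
    child_u, child_v = [], []
    for gu, gv in zip(u, v):
        if rng.random() < pr_c:
            w = rng.random()  # plays the role of alpha_1, alpha_2, or alpha_3
            gu, gv = w * gu + (1 - w) * gv, (1 - w) * gu + w * gv
        child_u.append(gu)
        child_v.append(gv)
    return child_u, child_v


def mutate(chromosome, pr_m, steps, rng):
    """Add a small positive or negative value to each parameter with probability pr_m."""
    out = [g + rng.uniform(-s, s) if rng.random() < pr_m else g
           for g, s in zip(chromosome, steps)]
    out[2] = min(max(out[2], 1e-3), 1 - 1e-3)  # assumed: keep p (position 2) inside (0, 1)
    return out


rng = random.Random(7)
u, v = [-0.05, 90.0, 0.3], [-0.02, 110.0, 0.7]  # hypothetical (a, b, p) strings
cu, cv = crossover(u, v, pr_c=0.9, rng=rng)
cu, cv = mutate(cu, 0.01, (0.001, 1.0, 0.01), rng), mutate(cv, 0.01, (0.001, 1.0, 0.01), rng)
```

Blend crossover keeps each child parameter between its parents' values, so p stays inside (0, 1) after crossover; only mutation can push it out, which is why the clipping is applied there.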

Elitist Strategy
The elitist strategy retains strings with good fitness from $P_m$ in $P_{m+1}$; indeed, only a few elite strings are enough to generate good results [39]. The $n_{\text{del}}$ strings in $P_m$ with the smallest fitness values serve as elite strings. After $n_{\text{del}}$ ($0 \le n_{\text{del}} \le n_{\text{size}}$) strings are randomly eliminated from $P_{m+1}$, the $n_{\text{del}}$ elite strings are added to $P_{m+1}$, so the population size remains $n_{\text{size}}$.
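The elitist step can be sketched as below, assuming that the elites are the n_del strings of P_m with the smallest MAPE (as in the pseudocode of the next subsection) and that removal from P_{m+1} is uniform at random. The function name is hypothetical.

```python
import random


def apply_elitism(prev_pop, prev_fitness, next_pop, n_del, rng):
    """Replace n_del random strings of P_{m+1} with the n_del best strings of P_m."""
    ranked = sorted(range(len(prev_pop)), key=lambda i: prev_fitness[i])
    elites = [prev_pop[i] for i in ranked[:n_del]]
    survivors = list(next_pop)
    rng.shuffle(survivors)  # random removal: drop the last n_del strings after shuffling
    return survivors[:len(survivors) - n_del] + elites


rng = random.Random(0)
p_m = ["A", "B", "C", "D"]
mape_m = [4.0, 1.5, 9.0, 2.2]
p_next = ["E", "F", "G", "H"]
print(apply_elitism(p_m, mape_m, p_next, n_del=2, rng=rng))
```

Because the best string of P_m is always carried over when n_del ≥ 1, the best fitness can never get worse from one generation to the next.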

Algorithm Design
The proposed GA-FAGM(1,1) is set up by employing the GA to optimize the relevant parameters. The pseudocode of the GA for constructing GA-FAGM(1,1) is as follows:
m ← 1;
Generate n_size strings in P_m; //initialization//
while m ≤ n_max do
    Compute the fitness value of each string in P_m;
    Choose the n_del strings with the smallest fitness values as elites;
    repeat
        Randomly choose two strings from P_m;
        Put the string with the smaller fitness value in the mating pool;
    until n_size strings are in the mating pool;
    repeat
        Select parent strings u and v from the mating pool;
        Perform crossover to generate new parameters; //Equations (17)-(19)//
        Add a small positive/negative value to each new parameter with rate Pr_m; //mutation//
        Add offspring u' and v' to P_{m+1};
    until n_size strings are in P_{m+1};
    Randomly remove n_del strings from P_{m+1};
    Add the n_del elites to P_{m+1}; //elitist strategy//
    m ← m + 1;
end.
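For readers who want to experiment, the pseudocode can be turned into the self-contained Python sketch below. It is our illustration, not the authors' implementation: the initialization ranges, the mutation step sizes, the default n_del = 2, the (a, b, p) ordering, and the clipping of p are assumptions, whereas the defaults n_max = 1000, n_size = 200, Pr_c = 0.9, and Pr_m = 0.01 follow the settings reported in the experiments. Parents are drawn uniformly from the mating pool, and fitted values are mapped back to the original scale by inverse fractional accumulation.

```python
import math
import random


def frac_ago(x, r):
    """Order-r fractional accumulation; r = -p inverts p-AGO."""
    coef = [1.0]
    for m in range(1, len(x)):
        coef.append(coef[-1] * (r + m - 1) / m)
    return [sum(coef[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]


def fitted_values(chrom, x0_first, length):
    """Time response on the p-AGO scale, followed by inverse accumulation."""
    a, b, p = chrom
    if abs(a) < 1e-12:
        xp = [x0_first + b * t for t in range(length)]
    else:
        xp = [(x0_first - b / a) * math.exp(-a * t) + b / a for t in range(length)]
    return frac_ago(xp, -p)


def mape(actual, fitted):
    return 100.0 * sum(abs((x - f) / x) for x, f in zip(actual, fitted)) / len(actual)


def ga_fagm(x0, n_max=1000, n_size=200, pr_c=0.9, pr_m=0.01, n_del=2, seed=0):
    """Return (best chromosome (a, b, p), its MAPE, best MAPE per generation)."""
    rng = random.Random(seed)
    scale = max(x0)
    steps = (0.01, 0.01 * scale, 0.01)  # assumed mutation magnitudes for (a, b, p)

    def fit(c):
        return mape(x0, fitted_values(c, x0[0], len(x0)))

    # Initialization with assumed ranges for a, b, and p
    pop = [[rng.uniform(-0.5, 0.5), rng.uniform(0.0, 2 * scale), rng.uniform(0.01, 0.99)]
           for _ in range(n_size)]
    history = []
    for _ in range(n_max):  # m = 1, ..., n_max
        fitness = [fit(c) for c in pop]
        history.append(min(fitness))
        ranked = sorted(range(n_size), key=fitness.__getitem__)
        elites = [list(pop[i]) for i in ranked[:n_del]]
        # Binary tournament selection with replacement (smaller MAPE wins)
        pool = []
        while len(pool) < n_size:
            i, j = rng.randrange(n_size), rng.randrange(n_size)
            pool.append(pop[i] if fitness[i] <= fitness[j] else pop[j])
        nxt = []
        while len(nxt) < n_size:
            u, v = rng.choice(pool), rng.choice(pool)
            cu, cv = [], []
            for gu, gv in zip(u, v):  # arithmetic crossover per parameter pair
                if rng.random() < pr_c:
                    w = rng.random()
                    gu, gv = w * gu + (1 - w) * gv, (1 - w) * gu + w * gv
                cu.append(gu)
                cv.append(gv)
            for child in (cu, cv):  # mutation, then keep p inside (0, 1)
                for g in range(3):
                    if rng.random() < pr_m:
                        child[g] += rng.uniform(-steps[g], steps[g])
                child[2] = min(max(child[2], 1e-3), 1 - 1e-3)
                nxt.append(child)
        nxt = nxt[:n_size]
        rng.shuffle(nxt)
        pop = nxt[:n_size - n_del] + elites  # elitist strategy
    fitness = [fit(c) for c in pop]
    best = min(range(n_size), key=fitness.__getitem__)
    return pop[best], fitness[best], history
```

A quick trial on synthetic data (not the IEA series) might call `ga_fagm([100 * 1.05 ** k for k in range(11)], n_max=200, n_size=60, seed=1)`; with n_del ≥ 1, the per-generation best MAPE in the returned history never increases.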

Applications of CO2 Emissions Forecasting
Reducing the negative impact of CO2 emissions on the global environment and economic growth is urgently needed. One way to support this goal is to develop highly accurate prediction models for CO2 emissions, an approach that has become increasingly important for national authorities.

Comparative Prediction Models
Here, GA-FAGM(1,1) is compared with the other considered grey prediction models as follows:
(1) GM(1,1): To find the optimal α, the Linear Interactive and General Optimizer is applied to set up an optimized GM(1,1) with OLS by minimizing MAPE [38].
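(3) FANGBM(1,1) [2]: A p-order differential equation serves as the mathematical form of FANGBM(1,1):
$$\frac{dx^{(p)}}{dt} + a^{(p)} x^{(p)} = b^{(p)} \left(x^{(p)}\right)^{r}.$$
The time response function is
$$\hat{x}^{(p)}_k = \left[\left(\left(x^{(0)}_1\right)^{1-r} - \frac{b^{(p)}}{a^{(p)}}\right) e^{-a^{(p)}(1-r)(k-1)} + \frac{b^{(p)}}{a^{(p)}}\right]^{\frac{1}{1-r}}, \quad r \ne 1.$$
When α, p, and r are given, OLS is applied to derive $a^{(p)}$ and $b^{(p)}$. The optimal values of α, p, and r are then determined by a GA that minimizes MAPE.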
(4) FTDGM [30]: The whitening equation and time response function of FTDGM follow [30]. When α and p are given, OLS is applied to derive $a^{(p)}$ and $b^{(p)}$. With α = 0.5, a GA is employed to determine the value of p that minimizes MAPE.
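As a sanity check on the FANGBM(1,1) form in item (3), the sketch below evaluates the standard NGBM-type time response, assumed here to be x̂ = [((x_1)^{1−r} − b/a) e^{−a(1−r)t} + b/a]^{1/(1−r)} with t = k − 1, and verifies by central finite differences that it satisfies dx/dt + a x = b x^r. It also shows that r = 0 recovers the GM-type response. The function names and test values are illustrative assumptions, and r = 1 (where the formula is undefined) is excluded.

```python
import math


def fangbm_response(x_first, a, b, r, t):
    """Response at continuous time t = k - 1 for dx/dt + a*x = b*x**r (r != 1, a != 0)."""
    base = (x_first ** (1 - r) - b / a) * math.exp(-a * (1 - r) * t) + b / a
    return base ** (1 / (1 - r))


def ode_residual(x_first, a, b, r, t, h=1e-5):
    """Central-difference estimate of dx/dt + a*x - b*x**r at time t."""
    x = fangbm_response(x_first, a, b, r, t)
    dxdt = (fangbm_response(x_first, a, b, r, t + h)
            - fangbm_response(x_first, a, b, r, t - h)) / (2 * h)
    return dxdt + a * x - b * x ** r


print(round(fangbm_response(10.0, -0.1, 2.0, 0.5, 0.0), 9))  # 10.0, the initial value
print(abs(ode_residual(10.0, -0.1, 2.0, 0.5, 3.0)) < 1e-6)   # True
```

The check works because substituting y = x^{1−r} turns the Bernoulli equation into the linear equation dy/dt + (1 − r) a y = (1 − r) b, whose solution is the bracketed term.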
To construct the optimized versions of FAGM(1,1), FANGBM(1,1), FTDGM, and GA-FAGM(1,1), a GA is implemented for each grey prediction model to determine its relevant parameters, with $n_{\max}$, $n_{\text{size}}$, $Pr_c$, and $Pr_m$ set to 1000, 200, 0.9, and 0.01, respectively. For instance, α, p, and r in FANGBM(1,1) are determined automatically by a GA. Commonly used forecasting models, including NNs, fuzzy time series (FTS) analysis, and the ARIMA model, were considered as well. The NN was trained with one hidden layer of five hidden nodes, a learning rate of 0.5, and 10,000 training iterations. The average prediction accuracy of the NN was computed over ten independent trials.
For the FTS analysis, the computational steps are briefly introduced as follows [40]: (1) On the basis of the minimum and maximum values of the available data, a universe of discourse U is defined. U is then divided into s equal subintervals using s+1 partitioning points ($p_1, p_2, \ldots, p_{s+1}$).
(3) To generate an FTS denoted by F(t), each observation $x_t$ is fuzzified as $A_r$ when it falls in the subinterval $(p_r, p_{r+1})$.
(4) Let F(t − 1)→F(t) denote a fuzzy logical relationship (FLR), which means that F(t) is caused by F(t − 1). In the FTS, $A_r \to A_q\,(v)$ represents the case where $A_r \to A_q$ ($1 \le q, r \le s + 1$) appears v times.
(5) In the case of F(t − 1) = $A_r$ with at least two FLRs, $A_r \to A_{q_1}(v_1), A_r \to A_{q_2}(v_2), \ldots$, the predicted value is $0.5(x_r + \bar{x})$, where $x_r$ and $\bar{x}$ are derived from the interval midpoints and the occurrence counts v [40], and $m_r$ denotes the midpoint of interval $(p_r, p_{r+1})$. When only $A_r \to A_q$ is available, the predicted value is $0.5(x_r + x_q)$. However, the predicted value is $x_r$ when no FLR starting from $A_r$ is available for forecasting.
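The FTS steps can be sketched as a one-step-ahead forecaster. Because the text does not give closed forms for x_r and x̄, this sketch assumes x_r = m_r and takes x̄ as the occurrence-weighted mean of the target midpoints m_q, which reduces to 0.5(m_r + m_q) for a single FLR; it also assumes that the maximum observation falls in the last subinterval. Treat it as an illustration of the rule structure rather than the exact method of [40]; the function name is hypothetical.

```python
def fts_forecast(series, s):
    """One-step FTS forecast from first-order FLRs over s equal subintervals."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / s
    points = [lo + j * width for j in range(s + 1)]             # p_1, ..., p_{s+1}
    mids = [(points[j] + points[j + 1]) / 2 for j in range(s)]  # m_1, ..., m_s

    def fuzzify(x):
        # Index r of the subinterval containing x (the maximum goes to the last one)
        return min(int((x - lo) / width), s - 1) if width > 0 else 0

    states = [fuzzify(x) for x in series]
    flr = {}  # flr[r][q] = v, the number of times A_r -> A_q appears
    for r, q in zip(states, states[1:]):
        flr.setdefault(r, {})
        flr[r][q] = flr[r].get(q, 0) + 1

    r = states[-1]
    targets = flr.get(r)
    if not targets:
        return mids[r]  # no usable FLR: forecast x_r (assumed to be m_r)
    total = sum(targets.values())
    x_bar = sum(v * mids[q] for q, v in targets.items()) / total
    return 0.5 * (mids[r] + x_bar)


print(fts_forecast([1, 2, 1, 3, 1, 2, 1], s=4))  # about 1.8333
```

In the usage line, the last state A_1 has the FLRs A_1→A_3 (twice) and A_1→A_4 (once), so the forecast is 0.5(1.25 + (2 × 2.25 + 2.75)/3) = 11/6.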

Experimental Results
To examine the forecasting accuracy of the considered prediction models, the countries whose total CO2 emissions have ranked among the top 20 since 2000, according to statistics from the IEA [1], were taken into account.
The historical data shown in Table 1 span 2003 to 2017. This study uses the data from 2003 to 2013 for model fitting and the data from 2014 to 2017 for ex-post testing. Table 2 summarizes the results of the ex-post testing. Compared with the other considered prediction models, GA-FAGM(1,1) performed well; it performed best for 11 out of the 20 data sequences.
To examine the differences among the considered prediction models, the Friedman test with the Nemenyi post-hoc test was employed to statistically analyze the eight prediction models applied to the 20 datasets. Let $r_j$ denote the average rank of prediction model j (j = 1, 2, ..., 8). As shown in Table 2, the average ranks of GM(1,1), FAGM(1,1), FANGBM(1,1), FTDGM, NN, ARIMA, FTS, and GA-FAGM(1,1) were $r_1 = 5.3$, $r_2 = 5.75$, $r_3 = 5.1$, $r_4 = 6.2$, $r_5 = 4.75$, $r_6 = 4.05$, $r_7 = 3.15$, and $r_8 = 1.7$, respectively. The smaller the average rank, the better the forecasting model performed.
Let $k_1$ be the number of prediction models considered and $k_2$ the number of data sequences used. The null hypothesis states that the considered prediction models have identical average ranks. The null hypothesis is rejected because the Friedman statistic (10.98) exceeds the critical value of $F(k_1 - 1, (k_1 - 1)(k_2 - 1))$ (2.08) at the 5% level.
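The reported statistic can be reproduced from the average ranks alone. The sketch below computes the Friedman chi-square and its Iman-Davenport F form, which is the version compared against an F(k1 − 1, (k1 − 1)(k2 − 1)) critical value; we assume this is the form used here, and it does yield 10.98 for the average ranks listed above. The function name is ours.

```python
def friedman_statistics(avg_ranks, n_datasets):
    """Friedman chi-square and Iman-Davenport F statistic from average ranks."""
    k = len(avg_ranks)
    chi2 = 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    f_stat = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, f_stat


# GM(1,1), FAGM(1,1), FANGBM(1,1), FTDGM, NN, ARIMA, FTS, GA-FAGM(1,1)
ranks = [5.3, 5.75, 5.1, 6.2, 4.75, 4.05, 3.15, 1.7]
chi2, f_stat = friedman_statistics(ranks, n_datasets=20)
print(round(chi2, 2), round(f_stat, 2))  # 51.27 10.98
```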
The Nemenyi test was further employed to detect pairwise differences using the critical difference (CD):
$$CD = 3.03\sqrt{\frac{k_1(k_1+1)}{6k_2}}, \qquad (27)$$
where 3.03 is the critical value for eight models, so that CD equals 2.35 at the 5% level. This means that a prediction model is significantly superior to another model if its average rank is smaller than that of the other model by at least CD. The results are summarized below:
(1) The proposed GA-FAGM(1,1) had the smallest average rank and significantly outperformed all of the other prediction models except FTS.
(2) Although GA-FAGM(1,1) did not significantly outperform FTS, its average rank was smaller than that of FTS, and FTS performed worse than GA-FAGM(1,1) for 15 out of the 20 data sequences.
(3) The optimized GM(1,1) was significantly inferior only to GA-FAGM(1,1); it was not significantly inferior to any of the other considered prediction models. Note that GM(1,1) is often treated as a benchmark when comparing different grey prediction models.
(4) Although GM(1,1) was not significantly superior to FAGM(1,1), it is interesting to note that the average rank of the former (5.3) was smaller than that of the latter (5.75).
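Likewise, the critical difference and the pairwise conclusions (1) to (3) can be checked with a few lines. The constant 3.03 is the critical value used in Eq. (27), model names follow the order of the average ranks reported above, and the helper names are ours.

```python
import math


def nemenyi_cd(k, n, q_alpha=3.03):
    """Critical difference of the Nemenyi test for k models on n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


names = ["GM(1,1)", "FAGM(1,1)", "FANGBM(1,1)", "FTDGM", "NN", "ARIMA", "FTS", "GA-FAGM(1,1)"]
ranks = [5.3, 5.75, 5.1, 6.2, 4.75, 4.05, 3.15, 1.7]
avg = dict(zip(names, ranks))
cd = nemenyi_cd(len(names), 20)


def significantly_better(x, y):
    """True if model x's average rank is smaller than model y's by at least CD."""
    return avg[y] - avg[x] >= cd


print(round(cd, 2))  # 2.35
not_beaten = [m for m in names if m != "GA-FAGM(1,1)" and not significantly_better("GA-FAGM(1,1)", m)]
print(not_beaten)  # ['FTS']
print([m for m in names if significantly_better(m, "GM(1,1)")])  # ['GA-FAGM(1,1)']
```

Note that the ARIMA comparison is borderline: its rank gap to GA-FAGM(1,1) is 2.35 against an unrounded CD of about 2.347.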
These results suggest that the proposed GA-FAGM(1,1) may also be applicable to other prediction problems, such as energy and tourism demand forecasting. Indeed, the prediction of energy demand has become increasingly important for government administrations when setting up development plans for energy demand and consumption [41], especially in developing countries [42]. Moreover, the residual GM(1,1) has been shown to improve the forecasting accuracy of GM(1,1) [13,14]. Similarly, a residual FAGM(1,1) could be developed to improve FAGM(1,1), and how to construct a residual GA-FAGM(1,1) that improves GA-FAGM(1,1) is an interesting issue. Additionally, despite incomplete information about the relevant factors in the prediction problems considered here, it could be worthwhile to construct a multivariate grey prediction model by extending GA-FAGM(1,1) to GA-FAGM(1,N). These issues remain the focus of future work.

Conclusions
Given the effectiveness and applicability of grey prediction for forecasting CO2 emissions, developing such models appears worthwhile. Prediction models for CO2 emissions can help authorities set up competitive strategies for economic growth and environmental protection by curbing CO2 emissions.
This study highlighted the usefulness of GA-FAGM(1,1), which incorporates three significant characteristics: the use of a single-variable model as the development base, fractional-order accumulation, and determination of the control variable and developing coefficient without background values. These features distinguish GA-FAGM(1,1) from the other fractional grey prediction models considered here.
Table 2 shows that the forecasting accuracy of the proposed GA-FAGM(1,1) for CO2 emissions was quite encouraging. With the GA, the proposed GA-FAGM(1,1) significantly outperformed the other grey prediction models considered here. We therefore conclude that the GA-based mechanism for determining the relevant parameters enables the proposed prediction model to perform significantly better. The experimental results also emphasize the applicability and usefulness of GA-FAGM(1,1) for forecasting CO2 emissions.