Nonadditive Grey Prediction Using Functional-Link Net for Energy Demand Forecasting

: Energy demand prediction plays an important role in sustainable development. The GM(1,1) model has drawn our attention to energy demand forecasting because it only needs a few data points to construct a time series model without statistical assumptions. Residual modiﬁcation is often considered as well to improve the accuracy of predictions. Several residual modiﬁcation models have been proposed, but they focused on residual sign estimation, whereas the FLNGM(1,1) model using functional-link net (FLN) can estimate the sign as well as the modiﬁcation range for each predicted residual. However, in the original FLN, an activation function with an inner product assumes that criteria are independent of each other, so additivity might inﬂuence the forecasting performance of FLNGM(1,1). Therefore, in this study, we employ the FLN with a fuzzy integral instead of an inner product to propose a nonadditive FLNGM(1,1). Experimental results based on real energy demand cases demonstrate that the proposed grey prediction model performs well compared with other grey residual modiﬁcation models that use sign estimation and the additive FLNGM(1,1).


Introduction
Several advanced countries have worked on the green new deal to promote the development of green energy technology, and to achieve the ultimate goal of sustainable development.Sustainable development is based on three main traits, including energy security, economic development, and environmental protection [1].These traits involve the amount of energy demand and supplies, effective use of the energy for industries, and development of new technologies for renewable energy.Environmental impacts due to energy consumption are inevitable because economic development is ongoing, and will have a significant role when devising the policy of sustainable development for cities or countries [2].In the last few decades, continuous economic development and population increases worldwide have led to rapid growth in the demand for energy.If the current global energy consumption pattern continues, the global energy consumption will increase by over 50% before 2030 [3].In terms of the global electricity demand, the average annual growth rate was 2.6% from 1990 to 2000 and 3.3% from 2000 to 2010.It is expected to reach 2.8% from 2010 to 2020 [4].Since the amount of energy demand must be taken into account first for sustainable development, this leads to an important issue related to how to accurately predict energy demand.
Many forecasting methods, including artificial intelligence techniques, multivariate regression, and time series analysis, have frequently been applied to energy demand forecasting [5][6][7][8][9][10][11][12][13].A large number of samples are required for multivariate regression and time series analysis like the autoregressive integrated moving average (ARIMA) [14].The performance of the above-mentioned methods can be significantly affected by the number and the representativeness of observations [15,16].However, using long-term data to build energy consumption prediction models may be impractical because the average annual growth rate of energy consumption is high and unstable.Beyond this, statistical methods usually require that the data conform to statistical assumptions, such as having a normal distribution [17], yet energy consumption data often do not conform to these usual statistical assumptions [17], limiting the forecasting capabilities of statistical methods.Therefore, to construct an energy demand prediction model, a forecasting method is needed that works well with small samples and without making any statistical assumptions [18].One of the grey prediction models, GM(1,1), has drawn our attention to energy demand forecasting [2].Indeed, how factors such as income and population influence the demand for energy is not clear, so energy demand forecasting can be regarded as a grey system problem [2,16].
The GM(1,1) model only needs four recent sample data points to achieve a reliable and acceptable accuracy of prediction [14].Several versions have been proposed to improve the accuracy of its predictions [19][20][21][22][23].In addition, a residual modification model built on residuals obtained from the original GM(1,1) may be an effective solution [24,25].In terms of residual modification, in addition to residual sign estimation [17,[26][27][28], Hu [29] proposed a new model called FLNGM(1,1) which uses the functional-link net (FLN) with an effective function approximation capability [30][31][32][33] to estimate both the sign and the modification range of the predicted residuals obtained from the residual GM(1,1) model.
In the original FLNGM(1,1), the hyperbolic tangent function is considered as the activation function.Clearly, the output of this function uses a weighted sum of the connection weights with an enhanced pattern.It is assumed that the additivity property of the interaction among individual features results in an enhanced pattern.However, the attributes are not always independent of each other [34][35][36][37], so an assumption of additivity for the enhanced pattern may not be reasonable [38], thereby affecting the forecasting performance of FLNGM(1,1).The Choquet fuzzy integral [39][40][41] does not assume the independence of each criterion, so in this study we propose a nonadditive residual modification model, called nonadditive FLNGM(1,1) (N-FLNGM(1,1)) for energy demand forecasting, where we replace the weighted sum with the Choquet integral inside the hyperbolic tangent function.A genetic algorithm(GA) [42] is used to construct the proposed nonadditive FLNGM(1,1) model with high prediction accuracy.
The remainder of this paper is organized as follows.Section 2 introduces the traditional GM(1,1) model using residual modification with sign estimation.Sections 3 and 4 present residual modification using FLN and the proposed N-FLNGM(1,1) model, respectively.Section 5 examines the energy demand forecasting performance of the N-FLNGM(1,1) model based on real energy demand cases in China.In Section 6, we discuss the outcomes and give our conclusions.

Original GM(1,1) Model
Given an original data sequence 1 by the accumulated generating operation [9,25] as follows: (1) 2 , . . ., x n can then be approximated by a first-order differential equation, where a and b are the developing coefficient and control variable, respectively.

The predicted value x(1)
k can be obtained by solving the differential equation with an initial condition that x (1) and thus x(1) k = x (0) 1 holds.Then, a and b can be estimated by a grey difference equation: where z (1) and α is usually specified as 0.5 for convenience.In turn, a and b can be obtained using the ordinary least-squares method: where and T Using the inverse accumulated generating operation, the predicted value of x Therefore, Note that x(1) 1 = x(0) 1 holds.

Residual Modification with Sign Estimation
To build a residual modification model, the original GM(1,1) is constructed first, followed by the residual GM(1,1).Let ε (0) = (ε n ) denote the sequence of absolute residual values, where Using the same construction for X (•) as the original GM(1,1) model, a residual model can be established for ε (0) , and the predicted residual of ε In the remnant GM(1,1) model with sign estimation, x(0) k is modified to x(0) k r by adding or subtracting ε(0) k from x(0) k [27]. x(0) where s k denotes the positive or negative sign of ε(0) k with respect to the k-th year.Those residual sign estimation methods mentioned above can be used to determine s k .

Residual Modification Using FLN
A single-layer feed-forward FLN is an appropriate tool for obtaining sign and range estimations due to its effective function approximation ability.An enhanced pattern with respect to a single input denoted by t t ∈ can be generated as (t, sin(πt), cos(πt), sin(2πt), cos(2πt)) through a functional link [31,32], where t denotes a specific time point, such as the t-th year.
In the original FLNGM(1,1), the hyperbolic tangent function is used as the activation function in the output node, where tanh(z) lies within the range (−1, 1).Let θ be the bias to the output node.When (t, sin(πt), cos(πt), sin(2πt), cos(2πt)) is presented to the net, z can be computed as follows: The actual output value, y, is just to equal tanh(z), and y can be interpreted as the range of modification for x(0) k .Let y k denote the actual output value with respect to x(0) k .Then, y k = 1 indicates that x(0) k can be modified as its tolerable upper limit, whereas y k = −1 denotes that x(0) k can be modified as its tolerable lower limit.An advantage compared with residual estimation methods is that the modification range of the original x(0) k is not restricted to ε(0) k .Thus, it can be seen that the additivity assumption of the interaction among t, sin(πt), cos(πt), sin(2πt), and cos(2πt) holds; i.e., these terms are assumed to be independent of each other.
In particular, based on the idea of three-sigma limits, which are used to set the upper and lower control limits in statistical quality control charts [43], the value x(0) k FLN predicted by the proposed model is formulated heuristically as follows. x(0) where 3ε k refers to data within three residuals with respect to x(0) k and represents the tolerable maximum range for modifying x(0) k .

Nonadditive Residual Modification Model
Let (t, sin(πt), cos(πt), sin(2πt), cos(2πt)) be represented by , where X is called the feature space.A fuzzy measure is a nonadditive set function that can be used along with a fuzzy integral for aggregating information sources [36].Among various fuzzy measures, the λ-fuzzy measure has been suggested for computation of the fuzzy integral because of its convenience users [44][45][46].The advantage is that, after determining fuzzy densities µ 1 , µ 2 , . . ., µ 5 , where µ k denotes µ({v k }), λ can be uniquely determined from the condition µ(X) = 1.µ(E k ) can be computed as follows: where As illustrated in Figure 1, fuzzy densities can be used as connection weights, and they can be determined automatically by the GA.
(1 ) where Ek = {vk, vk+1, …, v5} (1 ≤ k ≤ 5).As illustrated in Figure 1, fuzzy densities can be used as connection weights, and they can be determined automatically by the GA.Let f be a non-negative, real-valued measurable function defined on X.The element in X with min{f(vk)|k = 1, 2, …, 5} is renumbered as one, where f(vk) denotes the performance value of vk.In other words, all the elements vk are rearranged in order of descending f(vk), such that f(v1  over X of f with respect to μ is defined as follows. (c) where μ(E6) is specified as zero.The Choquet integral illustrated in Figure 2 comprises five different rectangular areas.Since the assumption of additivity among features is not warranted in real applications, it is reasonable to use the Choquet integral with respect to aggregate t, sin(πt), cos(πt), sin(2πt), and cos(2πt) to form the nonadditive hyperbolic tangent function denoted by (c)tanh(z).This definitely differentiates N-FLNGM(1,1) from the original FLNGM(1,1) models.
In the construction of the original FLNGM(1,1), a real-valued GA is designed to automatically determine the connection weights (i.e., μ1, μ2, μ3, μ4, μ5) and the bias (i.e., θ) for the N-FLNGM(1,1).Indeed, Rooij et al. [47] noted that GAs are appropriate tools for training perceptron-like networks.To construct the model with high prediction accuracy, the mean absolute percentage error (MAPE) for the training patterns is considered.MAPE can be treated as the benchmark, and it is more stable than the more commonly used mean absolute error and root mean square error [48,49].MAPE is formulated as follows: Let f be a non-negative, real-valued measurable function defined on X.The element in X with min{f (v k )|k = 1, 2, . . ., 5} is renumbered as one, where f (v k ) denotes the performance value of v k .In other words, all the elements v k are rearranged in order of descending f where µ(E 6 ) is specified as zero.The Choquet integral illustrated in Figure 2 comprises five different rectangular areas.Since the assumption of additivity among features is not warranted in real applications, it is reasonable to use the Choquet integral with respect to aggregate t, sin(πt), cos(πt), sin(2πt), and cos(2πt) to form the nonadditive hyperbolic tangent function denoted by (c)tanh(z).This definitely differentiates N-FLNGM(1,1) from the original FLNGM(1,1) models.
In the construction of the original FLNGM(1,1), a real-valued GA is designed to automatically determine the connection weights (i.e., µ 1 , µ 2 , µ 3 , µ 4 , µ 5 ) and the bias (i.e., θ) for the N-FLNGM(1,1).Indeed, Rooij et al. [47] noted that GAs are appropriate tools for training perceptron-like networks.To construct the model with high prediction accuracy, the mean absolute percentage error (MAPE) for the training patterns is considered.MAPE can be treated as the benchmark, and it is more stable than the more commonly used mean absolute error and root mean square error [48,49].MAPE is formulated as follows: where TS denotes the training or testing data.Ref. [50]  than the more commonly used mean absolute error and root mean square error [48,49].MAPE is formulated as follows: where TS denotes the training or testing data.[50] presented MAPE criteria for evaluating a forecasting model, where MAPE ≤ 10, 10 < MAPE ≤ 20, 20 < MAPE ≤ 50, and MAPE > 50 correspond to high, good, reasonable, and weak forecasting models, respectively.It is reasonable to define the fitness function as MAPE.Thus, the fitness of each individual is specified by a single objective in the combinatorial optimization problem.
Let nsize and nmax denote the population size and maximum number of generations, respectively, and Pm denote the population generated in generation m (1 ≤ m ≤ nmax).After evaluating the fitness value for each chromosome in Pm, selection, crossover, and mutation are applied until nsize new chromosomes have been generated for Pm+1.The GA can be performed until nmax generations have been generated.When the stopping condition has been satisfied, the algorithm is terminated, and the best chromosome with maximum fitness value among all successive generations serves as the desired solution to examine the generalization ability of the proposed N-FLNGM(1,1) model.The selection, crossover, and mutation operations required for N-FLNGM(1,1) are similar to those for FLNGM(1,1), so the details of the GA are omitted for simplicity and they can be found in [29].

Experimental Results
In Asia, China has become increasingly influential in terms of energy production and consumption [4].It is interesting to conduct experiments on real energy demand cases from China to compare the energy demand forecasting ability of the proposed N-FLNGM(1,1) model with that of the additive FLNGM(1,1) and other grey residual modification models that use sign estimation, including the residual modification model using multi-layer perceptron (MLPGM(1,1)) [26], genetic programming (GPGM(1,1)) [17], and Markov chain [27].The prediction accuracy of different prediction models were demonstrated by two real cases related to the energy demand in China.To ensure a fair comparison with the original FLNGM(1,1), the parameter specifications used for GA were the same as those described by Hu (2016), including nsize = 200, nmax = 1000, probabilities for crossover and mutation are 0.7 and 0.01 respectively.Let n size and n max denote the population size and maximum number of generations, respectively, and P m denote the population generated in generation m (1 ≤ m ≤ n max ).After evaluating the fitness value for each chromosome in P m , selection, crossover, and mutation are applied until n size new chromosomes have been generated for P m+1 .The GA can be performed until n max generations have been generated.When the stopping condition has been satisfied, the algorithm is terminated, and the best chromosome with maximum fitness value among all successive generations serves as the desired solution to examine the generalization ability of the proposed N-FLNGM(1,1) model.The selection, crossover, and mutation operations required for N-FLNGM(1,1) are similar to those for FLNGM(1,1), so the details of the GA are omitted for simplicity and they can be found in [29].

Experimental Results
In Asia, China has become increasingly influential in terms of energy production and consumption [4].It is interesting to conduct experiments on real energy demand cases from China to compare the energy demand forecasting ability of the proposed N-FLNGM(1,1) model with that of the additive FLNGM(1,1) and other grey residual modification models that use sign estimation, including the residual modification model using multi-layer perceptron (MLPGM(1,1)) [26], genetic programming (GPGM(1,1)) [17], and Markov chain [27].The prediction accuracy of different prediction models were demonstrated by two real cases related to the energy demand in China.To ensure a fair comparison with the original FLNGM(1,1), the parameter specifications used for GA were the same as those described by Hu (2016), including n size = 200, n max = 1000, probabilities for crossover and mutation are 0.7 and 0.01 respectively.

Case I
An experiment was conducted based on the historical annual energy demand data from China collected between 1990 and 2007.As reported by Lee and Tong [17], the data from 1990 to 2003 were used for model fitting, and those from 2004 to 2007 were used for ex-post testing.The forecasting results obtained by Lee and Tong [17] using the original GM(1,1), MLPGM(1,1), and GPGM(1,1) models are summarized in Table 1 and illustrated in Figure 2.
Table 1 and Figure 3 show the prediction performance of the different forecasting models.The MAPE for model fitting obtained by N-FLNGM(1,1) was slightly inferior to that by the original FLNGM(1,1).However, we can see that N-FLNGM(1,1) outperformed the other forecasting models compared with the testing data.Using the testing data, in terms of MAPE, both the original FLNGM(1,1) and the proposed N-FLNGM(1,1) had good forecasting abilities, whereas the original GM(1,1), GPGM(1,1), MLPGM(1,1), and Markov-chain sign estimation only had reasonably fair forecasting abilities.A massive change occurred up to 2004, which may explain why the ex-post testing results were not as good as the model-fitting results.

Case II
The second experiment was conducted on the historical annual electricity demand of China, collected from China Statistical Yearbook 2014 [51].As described by Zhou et al. [52], data from 1981 to 1998 were used for model fitting, and the other data for testing.The forecasting results obtained by the different forecasting models are shown in Table 2 and Figure 4. We can see that the MAPE for the original GM(1,1), MLPGM(1,1), GPGM(1,1), Markov-chain sign estimation, original FLNGM(1,1), and the proposed N-FLNGM(1,1) using the training data were 2.28%, 2.03%, 1.44%, 2.31%, 0.10%, and 0.13%, respectively.Using the testing data, the MAPE were 7.24%, 3.90%, 3.90%, 3.90%, 1.52%, and 1.41%, respectively.Figure 5 illustrates the APE of the different forecasting models.Based on these results, it is obvious that the proposed N-FLNGM(1,1) model yielded comparable performance to the other forecasting models considered.Additionally, Zhou et al. [52] showed that the MAPE for an autoregressive integrated moving average model and trigonometric grey prediction model were 3.25% and 2.12%, respectively, for training data, and 2.64% and 2.37% for testing data, which are inferior to the results obtained by the proposed N-FLNGM(1,1) model.

Discussion and Conclusions
The GM(1,1) model is the most frequently used grey prediction model, and it has played an important role in energy demand forecasting because it only requires a limited number of samples to construct a prediction model without statistical assumptions.In this paper, the proposed N-FLNGM(1,1) is built on the original FLNGM(1,1) model, which uses FLN to estimate the sign and modification range for each predicted residual simultaneously by using the predicted value obtained from the GM(1,1) model.We performed experiments to validate the effectiveness of the proposed N-FLNGM(1,1) model for energy demand forecasting.Experimental results demonstrate that the proposed nonadditive grey prediction model performs well compared with the additive FLNGM(1,1).In addition to grey prediction models, we further examine the prediction performance of a frequently-used prediction model-multi-layer perceptron (MLP) with backpropagation learning.MLP here has one input node, one hidden layer with two neurons, and one output layer with one neuron, training with 10,000 iterations, and learning rate of 0.8.The forecasting results obtained by MLP for case I and II are summarized in Tables 3 and 4, respectively.It can be seen that the proposed N-FLNGM(1,1) outperforms MLP.1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 Original GM(1,1) MLPGM(1,1) GPGM(1,1) Markov-chain FLNGM(1,1) N-FLNGM(1,1)

Discussion and Conclusions
The GM(1,1) model is the most frequently used grey prediction model, and it has played an important role in energy demand forecasting because it only requires a limited number of samples to construct a prediction model without statistical assumptions.In this paper, the proposed N-FLNGM(1,1) is built on the original FLNGM(1,1) model, which uses FLN to estimate the sign and modification range for each predicted residual simultaneously by using the predicted value obtained from the GM(1,1) model.We performed experiments to validate the effectiveness of the proposed N-FLNGM(1,1) model for energy demand forecasting.Experimental results demonstrate that the proposed nonadditive grey prediction model performs well compared with the additive FLNGM(1,1).In addition to grey prediction models, we further examine the prediction performance of a frequently-used prediction model-multi-layer perceptron (MLP) with backpropagation learning.MLP here has one input node, one hidden layer with two neurons, and one output layer with one neuron, training with 10,000 iterations, and learning rate of 0.8.The forecasting results obtained by MLP for case I and II are summarized in Tables 3 and 4, respectively.It can be seen that the proposed N-FLNGM(1,1) outperforms MLP.1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 Original GM(1,1) MLPGM(1,1) GPGM(1,1) Markov-chain FLNGM(1,1) N-FLNGM(1,1)

Discussion and Conclusions
The GM(1,1) model is the most frequently used grey prediction model, and it has played an important role in energy demand forecasting because it only requires a limited number of samples to construct a prediction model without statistical assumptions.In this paper, the proposed N-FLNGM(1,1) is built on the original FLNGM(1,1) model, which uses FLN to estimate the sign and modification range for each predicted residual simultaneously by using the predicted value obtained from the GM(1,1) model.We performed experiments to validate the effectiveness of the proposed N-FLNGM(1,1) model for energy demand forecasting.Experimental results demonstrate that the proposed nonadditive grey prediction model performs well compared with the additive FLNGM(1,1).In addition to grey prediction models, we further examine the prediction performance of a frequently-used prediction model-multi-layer perceptron (MLP) with backpropagation learning.MLP here has one input node, one hidden layer with two neurons, and one output layer with one neuron, training with 10,000 iterations, and learning rate of 0.8.The forecasting results obtained by MLP for case I and II are summarized in Tables 3 and 4, respectively.It can be seen that the proposed N-FLNGM(1,1) outperforms MLP.features in the enhanced pattern, the testing results obtained by the additive FLNGM(1,1) can be further improved by the proposed N-FLNGM(1,1).Second, MLPGM(1,1), GPGM(1,1), and Markov-chain sign estimation contributed to estimate residual sign s k in Equation ( 13) to improve the prediction accuracy of the original GM(1,1) model.The common characteristic of these three models is that x(0) k can be adjusted by either ε(0) k or −ε k .This is too tight a restriction for the modification range.In contrast, originating from the idea of three-sigma limits, N-FLNGM(1,1) presents a novel updated rule for x(0) k by FLN, where the adjustment of x(0) k ranges between −3ε k and 3ε (0) k .It can be seen that N-FLNGM(1,1) outperformed MLPGM(1,1), GPGM(1,1), and Markov-chain sign estimation compared with data used for model fitting and ex-post testing.It should be noted that in a perceptron-like net, it is not easy to explain the meaning of the connection weights [53].However, an advantage of the proposed N-FLNGM(1,1) compared with the original FLNGM(1,1) is that µ k can be interpreted as the degree of importance of v k .
Over the next two decades, coal, natural gas, and crude oil can be the main energy supplies driving the world economy.It is expected that the growth of the crude oil demand will mainly come from emerging markets from 2015 to 2035, and one half of the growth will be in China [54].As a matter of fact, energy consumption in China has been mainly provided by coal and crude oil.For instance, the China Statistical Yearbook 2014 [51] showed that approximately two-thirds of the total energy consumed was provided by coal and 18% by oil in 2013.This leads to inevitable environmental impacts on China.Undoubtedly, energy demand prediction has become increasingly important when devising sustainable development plans for China [16].The proposed N-FLNGM(1,1) has demonstrated its potential for energy demand forecasting.
When it comes to fuzzy integrals, the Sugeno integral is the other most commonly used fuzzy integral.However, only the maximum and minimum operators are involved in the Sugeno integral, so the Choquet integral is preferable to the Sugeno integral for many decision problems [36], which is why we use the Choquet integral rather than the Sugeno integral.In order to perform a soft aggregation, two special ordered weighted averaging (OWA) operators-S-OWA-OR and S-OWA-AND-can be employed to replace the maximum and minimum operators, respectively [55].In future research, it would be interesting to examine the forecasting ability of a nonadditive prediction model using the Sugeno integral combined with OWA operators.
In this study, both the original and the residual GM(1,1) models use the least squares method to obtain the developing coefficient and control variable, which depend on the background value.However, it is not easy to determine the background value.Hu et al. [56] presented a novel neural-network-based GM(1,1) model (NNGM(1,1)) to resolve this troublesome problem by automatically determining the developing coefficient and control variable.Thus, it would be interesting to examine whether incorporating NNGM(1,1) into N-FLNGM(1,1) instead of the traditional GM(1,1) model might affect the prediction performance of N-FLNGM(1,1) in energy demand forecasting.

Figure 1 .
Figure 1.A functional-link net with Choquet fuzzy integral.

Figure 1 .
Figure 1.A functional-link net with Choquet fuzzy integral.

Figure 3 .
Figure 3. Predicted and actual values of different forecasting models for Case I.

Figure 3 .
Figure 3. Predicted and actual values of different forecasting models for Case I.

Figure 4 .
Figure 4. Predicted and actual values of different forecasting models for Case II.

Figure 5 .
Figure 5. APE of different forecasting models for Case II.

Figure 4 .
Figure 4. Predicted and actual values of different forecasting models for Case II.

Figure 4 .
Figure 4. Predicted and actual values of different forecasting models for Case II.

Figure 5 .
Figure 5. APE of different forecasting models for Case II.

Figure 5 .
Figure 5. APE of different forecasting models for Case II.
presented MAPE criteria for evaluating a forecasting model, where MAPE ≤ 10, 10 < MAPE ≤ 20, 20 < MAPE ≤ 50, and MAPE > 50 correspond to high, good, reasonable, and weak forecasting models, respectively.It is reasonable to define the fitness function as MAPE.Thus, the fitness of each individual is specified by a single objective in the combinatorial optimization problem.

Table 1 .
Prediction accuracy obtained by different forecasting models for energy demand (unit: 10 4 tons of standard coal equivalent (SCE)).MAPE: mean absolute percentage error.

Table 2 .
Prediction accuracy obtained by different forecasting models for electricity demand (unit: 100 million kWh).

Table 3 .
Prediction accuracy obtained by multi-layer perceptron (MLP) and autoregressive integrated moving average (ARIMA) for case I.