A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand

Abstract: Effectively forecasting energy demand and energy structure helps energy planning departments formulate energy development plans and react to the opportunities and challenges of changing energy demand. Because the rolling grey model (RGM) can weaken the randomness of small samples and better present their characteristics, and support vector regression (SVR) has good generalization, we propose an ensemble model based on RGM and SVR. The inertia weight of particle swarm optimization (PSO) is then adjusted to improve the global search ability of PSO, and the improved PSO algorithm (APSO) is used to assign adaptive weights to the ensemble model. Finally, in order to accurately predict the time-series of primary energy consumption, an adaptive inertia weight ensemble model (APSO-RGM-SVR) based on RGM and SVR is constructed. The proposed model offers higher prediction accuracy and better generalization in theory. Experimental results also showed that APSO-RGM-SVR outperformed single models and unoptimized ensemble models by about 85% and 32%, respectively. In addition, this paper used the new model to forecast China's primary energy demand and energy structure.


Introduction
Ensuring adequate energy supply is an important material basis for economic construction and development in most countries of the world. Especially for developing countries which are in the process of transformation and economic restructuring, energy supply and demand and structural adjustment are the keys to their transformation [1]. Estimating energy demand will not only affect energy market allocation [2], energy policy adjustment [3], economic development [4], carbon emission prediction [5], and other related energy planning research, it can also serve as the basis for building energy planning models [6]. Therefore, accurately forecasting energy demand can provide effective advice for energy policy-makers on energy development planning and policy formulation. It also helps to avoid imbalances in energy supply and demand and ensures energy security in the region and worldwide. In addition, adjustment and transformation of the energy structure directly affect environmental protection and governance in the region and worldwide [7]. In view of this, accurately forecasting primary energy demand and energy structure is important for energy planning.
Many methods have been presented in the field of prediction. In short, energy forecasting models fall into three categories: time-series models, soft-computing technologies, and ensembles of the two. Soft-computing approaches include the Artificial Neural Network (ANN), the Recurrent Neural Network (RNN) [34,35], and other prediction methods based on deep learning [36,37], which are used to forecast photovoltaic power generation, energy demand, and power load [36,38-41]. Liu et al. (2016) established a Gated Recurrent Unit (GRU) prediction model and forecasted China's primary energy demand [40]. Lee and Kim (2019) used ANN, DNN, and LSTM to forecast photovoltaic power combined with meteorological information [41]. However, ANN and deep-learning-based time-series prediction models perform better when trained on large samples. Compared with ANN and deep learning, SVR has good generalization ability and low computational complexity.
The SVR model improves the generalization of learning machines by seeking structural risk minimization, so as to obtain good statistical rules from fewer statistical samples [18]. This method has been widely used in financial market forecasting [42], weather forecasting [43], energy consumption and power load forecasting [44,45], etc. Jain et al. (2014) used SVR to construct sensor-based forecasting models to forecast energy consumption in New York buildings [44]. Chen et al. (2017) constructed an SVR model to forecast the power load based on demand response [45]. The SVR model structure is simple, and the prediction accuracy is high; however, the model performs better for short-term prediction. At present, many scholars choose to optimize SVR or combine it with other models to improve its generalization [46,47]. The applicability, advantages, and disadvantages of some single prediction models are listed in Table 1.

Table 1. Summary of several models commonly used for energy prediction.

Regression analysis.
Feature: establishing the regression equation through the correlation between variables.
Advantages: the correlation degree between the factors can be analyzed.
Disadvantages: poor generalization and low accuracy.
Applied to: finding the relationship between energy consumption and economic growth [4]; forecasting long-term electricity consumption [20].

ARIMA.
Feature: treating the data sequence formed by the prediction object over time as a random sequence, and then establishing a model to express the sequence.
Advantages: the model is simple and requires only endogenous variables.
Disadvantages: requires time-series data that are stationary, or stationary after differencing; essentially captures only linear relationships, not non-linear relationships.
Applied to: forecasting the next-day electricity price in the Spanish mainland [21]; forecasting monthly electricity consumption in Eastern Saudi Arabia [22].

Grey.
Feature: using small-sample data to establish a grey differential equation and describe the development trend of things over a long time.
Advantages: small sample size required; higher short-term prediction accuracy; few model parameters.
Disadvantages: cannot consider the relationships between factors; medium- and long-term prediction error is relatively large.
Applied to: forecasting annual power load in Shanghai, China [24]; forecasting growth trends in renewable energy [25].

ANN.
Feature: a highly complex non-linear dynamic learning system; suitable for handling inaccurate and fuzzy information-processing problems under various factors and conditions.
Advantages: can fully approximate complex non-linear relationships; parallel distributed processing; highly robust and fault tolerant.
Disadvantages: many model parameters; learning time is long; large sample size required for model training.

LSTM.
Feature: saves inputs over the long term; identifies useful information.
Advantages: suitable for dealing with problems highly related to time-series; solves long-sequence dependency problems.
Disadvantages: the training time of the model is somewhat long.

SVR.
Feature: applies support vectors to the regression function; maps data to a high-dimensional feature space through a non-linear mapping and performs regression in this space.
Advantages: works with small samples; simplifies regression problems; high flexibility.
Disadvantages: as the sample size increases, the time complexity of model training increases.
Applied to: forecasting buildings' energy consumption in New York [44]; forecasting the electrical load of four typical office buildings [45].

Optimization Algorithm
The intelligent optimization algorithm is an optimization algorithm with global optimization performance and strong versatility that is suitable for parallel computing [48]. In 1995, Kennedy and Eberhart [49] proposed the PSO algorithm, which represents significant progress in the research of the meta-heuristic optimization algorithm. The PSO algorithm starts from a random solution and finds the optimal solution through iteration. This algorithm has fewer parameters that need to be adjusted and the structure is simple. Thus, it is widely used in parameter optimization, engineering application, and stochastic optimization [50]. However, PSO also has the potential to fall into a local optimum. With the continuous evolution of optimization algorithms and the increasing complexity of optimization problems, more and more scholars are choosing to improve PSO for better application [51,52].
In recent years, some new swarm intelligence optimization algorithms have been proposed, including gray wolf optimization (GWO), moth-flame optimization (MFO), and the whale optimization algorithm (WOA). The GWO algorithm is an optimization method that imitates the hunting behavior of gray wolves [53]. It has a simple structure and good global performance, but its local search ability is poor [54]. The MFO algorithm simulates the flight characteristics of moths and was proposed by Mirjalili in 2015 [55]. Although MFO has fast convergence speed and high accuracy, it is also easily trapped in local extreme points [56]. In 2016, the WOA was proposed by Mirjalili [57], and it is often used to solve large-scale, complex optimization problems [58]. Easily falling into a local extremum is a common characteristic of most optimization models, and this represents both a challenge and a goal for scholars trying to improve them.

Ensemble-Based Methods
Energy management is affected by energy policy, population, consumption, and economic development, and changes in energy consumption are unstable. A single forecasting model can only extract some features of the energy consumption time-series, and it is difficult to obtain a satisfactory prediction effect. In recent years, many scholars have chosen to adopt an ensemble model approach. More specifically, multiple single prediction models are established, and then these single models are combined. This combination can extract the advantages of each method and achieve more accurate prediction results [15,59,60]. One study forecasted power demand based on a hybrid approach [59]. Xiao et al. (2018) used the group method of data handling to select and combine models, constructing an ensemble prediction model mainly used to forecast the non-linear changes in energy consumption [60].
The ensemble model extracts the advantages of each individual model and has a higher prediction accuracy. Therefore, to construct an adaptive ensemble model with stronger generalization ability and higher forecasting accuracy, determining the combined weight of every single model is the focus of ensemble modeling [61,62]. A linear combination of predictive models can improve the predictive power of a single model, but its flexibility in assigning weights is relatively poor, which affects the adaptability of the ensemble model [63,64]. Liu et al. (2014) adjusted the weights of an ensemble model by PSO and used the optimized model to forecast carbon emissions [63]. Another study successfully developed an ensemble prediction model for short-term wind speed prediction using variable weighted combination theory [64]. Thus, while improving the prediction accuracy of the model, the adaptability and flexibility of the enhanced model are also issues that deserve attention from scholars [65].

Methodology
The GM (1,1) [16], RGM [17], and ARIMA [12] time-series models perform well in forecasting small-sized samples [8,9]. In addition, SVR, as an AI method, has good generalization ability [18]. In view of the small sample size of annual primary energy data, we combined ARIMA, GM (1,1), and RGM with SVR to construct three ensemble forecasting models based on the standard deviation method. Then, the global search ability of PSO was enhanced by improving the inertia weight, and the APSO algorithm was used to optimize the ensemble model with the highest accuracy. Thus, an adaptive ensemble forecasting model with high accuracy and good generalization was constructed. The overall framework of this study is shown in Figure 1. In this section, we introduce the modeling principles and processes of the four models (GM (1,1), RGM, ARIMA, and SVR) in detail.

GM (1,1)
The basic idea of GM is that the original sequence x^(0) is composed of the original data, and the sequence x^(1) is generated by the method of cumulative generation [19]. GM can weaken the randomness of the original data and make them show more obvious characteristics. GM is a differential equation model built on the transformed sequence x^(1). GM (1,1) represents the first-order differential equation model with one variable. The specific steps are as follows:
Step 1. Assume x^(0) = (x^(0)(1), x^(0)(2), ..., x^(0)(n)) is the original sequence, where n is the number of years observed. Perform an accumulation of the original sequence x^(0), recorded as the first-order accumulated generating operation (1-AGO), and generate the sequence x^(1) = (x^(1)(1), x^(1)(2), ..., x^(1)(n)), where x^(1)(k) = Σ_{i=1}^{k} x^(0)(i), k = 1, 2, ..., n.
Step 2. Assume x^(1) is continuously differentiable and satisfies the first-order linear differential equation dx^(1)/dt + a x^(1) = u, where a and u are pending parameters.
Step 3. Discretize the differential equation into the difference equation x^(0)(k) + a z^(1)(k) = u, where z^(1)(k) = λ x^(1)(k) + (1 − λ) x^(1)(k − 1) is the background value and λ = 0.5.
Step 4. Solve for the parameters a and u using the least squares method: (a, u)^T = (B^T B)^{−1} B^T Y, where B stacks the rows (−z^(1)(k), 1) for k = 2, ..., n and Y = (x^(0)(2), ..., x^(0)(n))^T.
Step 5. Establish the prediction formula. Solving the differential equation with the estimated a and u gives x̂^(1)(k + 1) = (x^(0)(1) − u/a) e^{−ak} + u/a. Performing a subtractive (inverse accumulated) generation yields the prediction formula x̂^(0)(k + 1) = x̂^(1)(k + 1) − x̂^(1)(k).
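The GM (1,1) steps above can be sketched in pure Python. This is a minimal illustration of the least squares fit and the prediction formula, not the paper's implementation; the near-geometric sample series (ratio 1.1) is invented for demonstration.

```python
import math

def gm11_fit(x0):
    """Fit GM(1,1): Steps 1-4 (1-AGO, background values, least squares)."""
    n = len(x0)
    x1 = [sum(x0[:k + 1]) for k in range(n)]               # Step 1: 1-AGO
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]   # Step 3: lambda = 0.5
    y, m = x0[1:], n - 1
    sz, szz = sum(z), sum(v * v for v in z)
    sy = sum(y)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz              # Step 4: (B^T B)^-1 B^T Y, expanded by hand
    a = (sz * sy - m * szy) / det        # development coefficient
    u = (szz * sy - sz * szy) / det      # grey input
    return a, u

def gm11_forecast(x0, a, u, steps):
    """Step 5: x1_hat(k+1) = (x0(1) - u/a) e^(-a k) + u/a, then subtractive generation."""
    n = len(x0)
    x1_hat = [(x0[0] - u / a) * math.exp(-a * k) + u / a for k in range(n + steps)]
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + steps)]

# Demonstration on an invented near-geometric series
a, u = gm11_fit([100.0, 110.0, 121.0, 133.1, 146.41])
forecast = gm11_forecast([100.0, 110.0, 121.0, 133.1, 146.41], a, u, 1)
```

For a purely geometric series with ratio r, the fitted development coefficient comes out as a = −2(r − 1)/(r + 1), so the one-step forecast closely tracks the next term of the series.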

RGM
In theory, GM (1,1) is a continuous-time function that can continue from the initial value to any moment in the future. As time goes by, some disturbances in the future may affect the system. The farther away the future is, the larger the grey interval of the predicted value. Therefore, for GM (1,1), the meaningful prediction data are only a few data points after x (0) (n), and other data can only represent the planning data under the premise that the current conditions are stable. For improving the precision of the predicted data, new information has to be constantly added while making full use of the known information [17].
Therefore, an improved prediction method is proposed: GM (1,1) is established based on the known series, a grey value is predicted, and then the predicted value is added to the known series to form a new information sequence. GM (1,1) is established on this series, a grey value is predicted once more, and the predicted value is again added to the series. Whenever a new datum is added, a GM (1,1) is created. Because older data become less and less able to reflect the new situation, each time a new datum is added, the oldest datum is eliminated. Therefore, the dimension of the sequence is unchanged, and GM (1,1) is constructed repeatedly to complete the prediction target. For each step of the prediction, the parameters are corrected once, the model is improved, and the predicted values are generated dynamically. This prediction method is called the "grey model with the rolling mechanism" (RGM). The specific steps are as follows:
Step 1. Construct GM (1,1) using the original sequence x^(0)_0 = (x^(0)(1), x^(0)(2), ..., x^(0)(n)).
Step 2. Forecast a new datum x̂(n + 1), add it to the original sequence x^(0)_0, and remove the oldest datum x^(0)(1) to form a new sequence x^(0)_1 = (x^(0)(2), ..., x^(0)(n), x̂(n + 1)).
Step 3. Construct GM (1,1) based on the new sequence x^(0)_1.
Step 4. Forecast the next value, denoted x̂(n + 2). Add it to the sequence x^(0)_1 and remove the oldest datum x^(0)(2) to form a new sequence x^(0)_2 = (x^(0)(3), ..., x̂(n + 1), x̂(n + 2)).
Remove the old and add the new in this way, one datum at a time, until the prediction target is achieved or a certain accuracy requirement is met. At each prediction step, the parameters are modified, and the model is improved constantly.
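The rolling mechanism can be sketched as follows, reusing a compact GM (1,1) fit. This is an illustrative pure-Python sketch on an invented near-geometric series, not the paper's code.

```python
import math

def _gm11_next(x0):
    """Compact GM(1,1): fit by least squares on the 1-AGO series and
    return the one-step-ahead forecast x_hat(n + 1)."""
    n = len(x0)
    x1 = [sum(x0[:k + 1]) for k in range(n)]
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y, m = x0[1:], n - 1
    sz, szz = sum(z), sum(v * v for v in z)
    sy = sum(y)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    u = (szz * sy - sz * szy) / det
    x1_n1 = (x0[0] - u / a) * math.exp(-a * n) + u / a        # x1_hat(n + 1)
    x1_n = (x0[0] - u / a) * math.exp(-a * (n - 1)) + u / a   # x1_hat(n)
    return x1_n1 - x1_n

def rgm_forecast(x0, steps):
    """Rolling GM(1,1): forecast, append the new value, drop the oldest,
    and refit, so the window dimension stays fixed."""
    window = list(x0)
    out = []
    for _ in range(steps):
        nxt = _gm11_next(window)
        out.append(nxt)
        window = window[1:] + [nxt]   # remove the oldest, add the newest
    return out
```

In practice, when a new observed value becomes available, it would replace the forecasted value in the window before the next roll.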

ARIMA
The digital characteristics of a non-stationary sequence, such as the mean, variance, and covariance, change with time. In 1970, Box and Jenkins [19] proposed a time-series analysis method based on stochastic theory. The basic models include three types [23]: the autoregressive model (AR (p)), the moving average model (MA (q)), and the ARIMA. The ARIMA carries out the disturbance analysis by using AR, integration (I), and MA. It describes the change in the time-series through an extrapolation mechanism and comprehensively takes into account the previous data, current data, and error of the prediction variable to improve the prediction accuracy.
For a single sequence Y_t, the non-stationary sequence can be transformed into a stationary sequence X_t using the d-order difference, so that the stationary sequence X_t is fitted to the autoregressive moving average model (ARMA (p, q)).
The ARMA (p, q) consists of AR (p) and MA (q). The AR (p) is an autoregressive model with order p:
X_t = c + ϕ_1 X_{t−1} + · · · + ϕ_p X_{t−p} + u_t,
where ϕ_1, · · · , ϕ_p are parameters, c is a constant, and the random variable u_t is a white-noise sequence. MA (q) is a moving average model of order q with the expression
X_t = µ + ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q},
where θ_1, · · · , θ_q are parameters, µ is the expected value of X_t (usually assumed to be 0), and ε_t, ε_{t−1}, · · · , ε_{t−q} are the error terms of the white-noise sequence. The expression of the ARMA (p, q) model is
X_t = c + ϕ_1 X_{t−1} + · · · + ϕ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q}.
The ARMA (p, q) is changed into ARIMA (p, d, q) by the d-order differential transformation. In ARIMA (p, d, q), the parameter p is the order of AR (p), d is the difference order, and q is the order of MA (q).
The construction of ARIMA (p, d, q) and the determination of the parameters are summarized as follows: (i) First, it should be examined whether the time-series is stationary; a stationary time-series is the basic precondition for constructing this model. For an unstable time-series, the d-order difference should be taken before the model is constructed to make it stationary. (ii) Then the main parameters of the model need to be determined. The d-value can be obtained from the differencing, and then the q-value and the p-value can be obtained from the autocorrelation function and the partial autocorrelation function. (iii) After the model parameters are determined, the significance of the model is tested to determine the rationality of each parameter and to verify whether the model's settings conform to the specifications. (iv) The model is trained with the sample, the sample data are fitted, and ARIMA (p, d, q) is obtained.
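As a concrete, minimal instance of steps (i)-(iv), the sketch below fits an ARIMA (1, 1, 0): one difference for stationarity, then an AR(1) on the differenced series estimated by ordinary least squares. The toy series is constructed so its differences follow an exact AR(1) recursion; it is illustrative only and omits the identification of q from the autocorrelation function.

```python
def fit_arima_110(y):
    """ARIMA (1,1,0) sketch: first-order difference (d = 1), then fit an
    AR(1) on the differenced series by ordinary least squares."""
    x = [y[t] - y[t - 1] for t in range(1, len(y))]    # steps (i)-(ii): difference once
    xs, ys = x[:-1], x[1:]                             # lagged pairs (x_{t-1}, x_t)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    phi = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
          sum((a - mx) ** 2 for a in xs)               # AR(1) slope
    c = my - phi * mx                                  # AR(1) constant
    return phi, c

def forecast_arima_110(y, phi, c, steps):
    """Iterate the AR(1) on differences, then integrate back to levels."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff     # forecast the next difference
        level += diff             # undo the differencing
        out.append(level)
    return out

# Toy series whose differences satisfy d_{t+1} = 1 + 0.5 * d_t exactly
y = [0.0, 4.0, 7.0, 9.5, 11.75, 13.875]
phi, c = fit_arima_110(y)
```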

SVR
The basic principle of SVR is to map the input factor x to the high-dimensional feature space through the non-linear mapping ϕ, and then linear regression is performed in this high-dimensional space to find the function f (x) [44]. The modeling steps are as follows.
Step 1. Given a set of data {(x_i, y_i), i = 1, 2, ..., n}, with x_i the input factor, y_i the expected value, and n the amount of data, the regression function is f(x) = ω·ϕ(x) + b, where ω is the weight vector and b is the bias.
Step 2. By means of the principle of structural risk minimization, the parameters ω and b are estimated by minimizing (1/2)||ω||² + C Σ_{i=1}^{n} (ξ_i + ξ_i*) subject to the ε-insensitive constraints, where ||ω||² is the confidence risk, C is a penalty parameter, ε is the insensitivity coefficient, and ξ_i* and ξ_i are slack variables.
Step 3. For convenience, Equation (15) is transformed into its dual problem, and the SVR function is obtained: f(x) = Σ_{i=1}^{n} (α_i − α_i*) K(x_i, x) + b, where α_i and α_i* are support vector parameters and K(x_i, x) is an inner product (kernel) function.
Step 4. Since the radial basis function (RBF) is a good general kernel and can achieve non-linear projection [66], the kernel function satisfying the Mercer condition is chosen as the RBF: K(x_i, x) = exp(−||x − x_i||² / (2σ²)). Substituting Equation (17) into Equation (16) gives the equivalent Equation (18), where σ is a hyper-parameter of the RBF kernel that defines the feature-length scale of the similarity between learning samples, and x_i is the center of the kernel function. Equation (18) is solved to obtain the parameters α_i and b, thereby obtaining an energy demand prediction model.
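Once the dual parameters are known, evaluating the SVR prediction is direct. The sketch below evaluates f(x) = Σ (α_i − α_i*) K(x_i, x) + b with an RBF kernel for scalar inputs; the support values and coefficients are invented for illustration, and the 2σ² denominator is one common RBF convention, not necessarily the paper's.

```python
import math

def rbf_kernel(xi, x, sigma2):
    """K(x_i, x) = exp(-(x_i - x)^2 / (2 * sigma^2)) for scalar inputs."""
    return math.exp(-((xi - x) ** 2) / (2.0 * sigma2))

def svr_predict(x, support_x, alpha_diff, b, sigma2):
    """Evaluate the dual-form SVR function:
    f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b."""
    return b + sum(ad * rbf_kernel(xi, x, sigma2)
                   for xi, ad in zip(support_x, alpha_diff))

# Invented support point at 0.0 with coefficient 2.0 and bias 1.0
value_at_center = svr_predict(0.0, [0.0], [2.0], 1.0, 0.01)
```

Far from every support point, the kernel terms vanish and the prediction collapses to the bias b, which is why a small σ² (such as the 0.01 used in this paper) makes the model very local.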

Model Evaluation Criteria
Three criteria are used to evaluate the performance of a forecasting algorithm: the Mean Absolute Error (MAE), the Mean Absolute Percent Error (MAPE), and the Mean Square Percent Error (MSPE), defined as
MAE = (1/n) Σ_{i=1}^{n} |e_i|,
MAPE = (1/n) Σ_{i=1}^{n} |e_i / x_i| × 100%,
MSPE = (1/n) Σ_{i=1}^{n} (e_i / x_i)² × 100%,
where e_i = x_i − x̂_i represents the error values, x_i represents the sample data, x̂_i represents the prediction data, and n is the sample size. The accuracy of each forecasting algorithm is examined using these criteria.
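The three criteria can be written compactly as follows; the percentage scaling of MAPE and MSPE is a common convention assumed here, and the sample values are invented.

```python
def mae(actual, pred):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean Absolute Percent Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def mspe(actual, pred):
    """Mean Square Percent Error, in percent (one common convention)."""
    return 100.0 * sum(((a - p) / a) ** 2 for a, p in zip(actual, pred)) / len(actual)

# Invented example: two observations, each off by 10 units
errors = (mae([100.0, 200.0], [110.0, 190.0]),
          mape([100.0, 200.0], [110.0, 190.0]))
```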

Construction of Ensemble Forecasting Model
In the process of constructing the ensemble forecasting model, the most important task is to extract the effective forecasting results of the single forecasting models, that is, to determine the weight of each single forecasting model. In this paper, GM (1,1), RGM, and ARIMA are each combined with SVR, and the weights of the ensemble prediction models are determined by the standard deviation method to forecast China's primary energy demand. Then, we propose adding a momentum factor into PSO to improve its global search ability, and we compare the improved PSO (APSO) with PSO, GWO, MFO, and WOA to test its optimization accuracy. Finally, the cost function is constructed from the sum of the absolute values of the prediction errors, and RGM-SVR is optimized by the improved PSO to achieve the best combination; thus, the adaptive weighted ensemble prediction model (APSO-RGM-SVR) is constructed to forecast China's primary energy demand.

Construction of the Ensemble Model Based on the Standard Deviation Weight Method
In this section, we describe three ensemble models: GM (1,1)-SVR, RGM-SVR, and ARIMA-SVR. The relevant parameters of each single model are estimated by model training and calculation. Then we calculate the weights of each model based on the standard deviation method to combine the single models based on the calculated weights.

Standard Deviation Method
The standard deviation weight method is a relatively common weight calculation method. Assume the standard deviations of the prediction errors of the two models are s_1 and s_2. Then the weights are w_1 = s_2 / (s_1 + s_2) and w_2 = s_1 / (s_1 + s_2), so that the model with the smaller error standard deviation receives the larger weight.
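A sketch of the standard deviation weight method: each model's weight is inversely related to the standard deviation of its prediction errors, so the more stable model dominates the combination. The numeric values below are invented for illustration.

```python
def std_dev_weights(s1, s2):
    """w1 = s2 / (s1 + s2), w2 = s1 / (s1 + s2): the model with the smaller
    error standard deviation receives the larger weight."""
    total = s1 + s2
    return s2 / total, s1 / total

def combine(pred1, pred2, w1, w2):
    """Weighted ensemble forecast from two single-model forecasts."""
    return [w1 * p1 + w2 * p2 for p1, p2 in zip(pred1, pred2)]

# Invented error standard deviations: model 1 is three times as stable
w1, w2 = std_dev_weights(1.0, 3.0)
ensemble = combine([100.0], [104.0], w1, w2)
```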

GM (1,1)-SVR
The modeling procedure for the GM (1,1)-SVR combined prediction model is shown in Figure 2. The modeling steps are as follows: Step 1. GM (1,1) and SVR are constructed, respectively. According to Step 4 of Section 3.1, the parameters a and u of GM (1,1) are calculated as shown in Table 2. The Lagrangian multiplier method is used to transform SVR into its dual problem, and the parameters are then solved using sequential minimal optimization (SMO). The parameters of SVR were tested repeatedly; the fitting result is optimal when C = 10000 and σ² = 0.01. The parameters α_i and b of the SVR are estimated as shown in Table 3.
Step 2. The results of GM (1,1) and SVR are assigned weights according to the standard deviation method, and the weight distribution is as shown in Table 4.

RGM-SVR
The description of the RGM-SVR combined prediction model is shown in Figure 3. The steps are as follows: Step 1. RGM and SVR are constructed, respectively. China's primary energy consumption from 2005 to 2012 is taken as the training set and used for parameter estimation. The training samples are unchanged; therefore, the estimated parameters of the SVR model are unchanged. The parameter estimation results of the RGM model are shown in Table 5.
The parameters of SVR were tested repeatedly: C = 10000 and σ² = 0.01. The parameters α_i and b of the SVR are estimated as shown in Table 3.
Step 2. The prediction results of RGM and SVR are assigned weights according to the standard deviation method. The weight distribution is shown in Table 6.
Step 3. The accuracy of RGM-SVR is evaluated by MAE, MAPE, and MSPE.

ARIMA-SVR
The description of the ARIMA-SVR model is shown in Figure 4. The steps are as follows.
Step 1. Establish an ARIMA (p, d, q) model for China's primary energy demand forecast and examine whether the input samples are stationary time-series. Step 2. Perform a time-series difference d on the sample. Begin with the first-order difference and observe the sequence. After the d-order difference, the sequence tends to be stationary, and the difference order is the parameter d of ARIMA.
Step 3. Estimate the parameters p and q. Check and observe the autocorrelation and partial autocorrelation graphs of the stationary time-series to obtain q and p, so that the parameters of ARIMA are determined, as shown in Table 7.
Step 4. Assign weights to ARIMA and SVR by the standard deviation method, and the weight distribution is as shown in Table 8.
Step 5. The accuracy of ARIMA-SVR is evaluated by MAE, MAPE, and MSPE.

Construction of the Adaptive Weight Ensemble Forecasting Model
Linear and non-linear combinations are the most commonly used methods for ensemble forecasting models. However, when the prediction accuracy of the predicted system is unstable, the fixed-weight combination cannot be adjusted according to the changes in the predicted system in time, and thus, the generalization is poor. The adaptive variable weight combination can adjust the weight of each individual prediction model in time, thus improving the overall prediction accuracy and generalization of the ensemble model. Therefore, this section introduces the method of improving the inertia weight of PSO (APSO). Then, APSO is compared with PSO, GWO, MFO, and WOA to test the performance of the optimization algorithms. Finally, APSO is used to optimize the ensemble model to construct the adaptive weight ensemble prediction model (APSO-RGM-SVR).

PSO
PSO is a new evolutionary algorithm with easy implementation, high precision, and fast convergence. Therefore, PSO is suitable for solving practical problems.
The algorithm principle is as follows. PSO abstracts the potential solution of the optimization problem into a particle in the D-dimensional search space. Each particle has a fitness value set according to the objective function, and the particle adjusts its position s according to its flight speed v. Assume the search space contains P particles composing a population s = (s_1, s_2, · · · , s_i, · · · , s_P), where the position of the ith particle (1 ≤ i ≤ P) is a D-dimensional vector. The position change rate of particle i at time t is v_i^t = (v_i1^t, v_i2^t, · · · , v_iD^t), and the current optimal position of particle i is p_i^t = (p_i1^t, p_i2^t, · · · , p_iD^t), often denoted p_best. The optimal position found by the swarm up to time t is p_g^t = (p_g1^t, p_g2^t, · · · , p_gD^t), denoted g_best. The velocity and position of the particle are updated by the following formulas: v_id^(t+1) = ω v_id^t + c_1 r_1 (p_id^t − s_id^t) + c_2 r_2 (p_gd^t − s_id^t) and s_id^(t+1) = s_id^t + v_id^(t+1), where ω is the inertia weight, d = 1, 2, . . . , D is the particle dimension, t is the current iteration number, v_id ∈ [v_min, v_max] is the particle velocity, the acceleration factors c_1 and c_2 are non-negative constants, and r_1 and r_2 are random numbers distributed in the interval (0, 1).
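A minimal PSO implementing the standard velocity and position updates described above, here minimizing a 2-D sphere function. The parameter values (ω = 0.7, c1 = c2 = 1.5, bounds, population size) are illustrative defaults, not the paper's settings.

```python
import random

def pso(cost, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO: v = w*v + c1*r1*(pbest - s) + c2*r2*(gbest - s); s = s + v."""
    rng = random.Random(seed)
    s = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [row[:] for row in s]
    pbest_cost = [cost(row) for row in s]
    g = pbest[min(range(n_particles), key=lambda i: pbest_cost[i])][:]
    g_cost = cost(g)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - s[i][d])
                           + c2 * r2 * (g[d] - s[i][d]))
                s[i][d] += v[i][d]
            c = cost(s[i])
            if c < pbest_cost[i]:                 # update personal best
                pbest[i], pbest_cost[i] = s[i][:], c
                if c < g_cost:                    # update global best
                    g, g_cost = s[i][:], c
    return g, g_cost

# Minimize the sphere function f(p) = sum(x^2): global optimum at the origin
best, val = pso(lambda p: sum(x * x for x in p), dim=2)
```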

APSO
The main factor affecting the search capability of the particle swarm is the inertia weight. To improve the adaptive ability of PSO, an adaptive adjustment method based on the inertia weight is proposed, which is called APSO. The algorithm is briefly described as follows. Define the particle population diversity function F_d(t) in terms of f_min(α(t)) = Min(f(α_i(t))) and f_max(α(t)) = Max(f(α_i(t))), where f(α_i(t)) is the fitness value of the ith particle (i = 1, 2, . . . , P), and f_min(α(t)) and f_max(α(t)) are the minimum and maximum values of the particle fitness at time t, respectively. The diversity function F_d(t) describes the motion characteristics of the particles, and the non-linear function δ(t) is defined accordingly for the adaptive adjustment of the inertia weight, where L ≥ 2 is an initialization constant. The adaptive adjustment rule for the particle inertia weight updates the velocity v_i^(t+1) from v_i^t, where v_i^t is the velocity of the particle at time t, β is the "momentum" hyper-parameter, usually set to 0.9, η is the learning rate, dω_i^t is the gradient of the weight, ξ > 0 is a custom slack variable used to adjust the global search ability of the particle, and ε_i^t is the distance between particle i and the global optimal particle, defined as ε_i^t = ||s_i^t − g_b||, where s_i^t and g_b are the positions of particle i and the global optimal particle at time t, respectively.

Contrast Test of Optimization Algorithms
The PSO algorithm has fewer parameters that need to be adjusted and the structure is simple. However, PSO also has the potential to fall into local optimum. In recent years, some new swarm intelligence optimization algorithms have been proposed, including GWO, MFO, and WOA. These algorithms have high accuracy, but the problem of falling into local extremum is the common characteristic of most optimization models. Therefore, we improved PSO, which has fewer parameters and a simpler structure, to improve the global search ability of the optimization algorithm. In order to test the performance of the improved optimization algorithm, we conducted a comparative test.
The optimization performance of APSO was tested in a 30-dimensional search space. The population size was set to 30, the maximum number of iterations was 1000, and the benchmark functions were F_1-F_6. The standard deviation comparison results after 40 independent experiments of GWO, MFO, WOA, PSO, and APSO are shown in Table 9. It can be seen from Table 10 that, for the six benchmark functions, the optimization accuracy of APSO on most functions was many orders of magnitude higher than that of PSO, and the accuracy of APSO was better than that of traditional GWO, MFO, and WOA. In solving continuous unimodal functions, APSO had the highest accuracy on function F_1. For the optimization of function F_3, APSO was slightly better than the other algorithms. For the multimodal functions F_4, F_5, and F_6, which have multiple local extrema, the optimization accuracy of APSO was significantly better than that of the other algorithms.

Construction of APSO-RGM-SVR Based on Adaptive Inertia Weight
Based on the advantages of various forecasting models, considering the characteristics of the small sample size and long forecasting target of our paper, we combined RGM with SVR. Then, the combined weights of the two prediction models were determined by APSO. Finally, an adaptive optimal weight ensemble prediction model was established, which is shown in Figure 5. The involved steps are as follows.
Step 1. Based on SVR and RGM, a model for China's primary energy demand forecasting is constructed and trained on the sample training set. The parameters required by the SVR and RGM models are shown in Tables 3 and 5.
Step 2. The sum of the absolute values of the prediction errors is used as the cost function of the adaptive weight: Q = Σ_{i=1}^{n} |e_i|, where e_i represents the error values and n is the sample size.
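The cost function and the weight search it drives can be illustrated as follows. The paper optimizes the combination weight with APSO; here a simple grid search over w ∈ [0, 1] minimizes the same objective Σ|e_i| on invented forecasts, purely to show the role of the cost function.

```python
def ensemble_weight(actual, pred1, pred2, grid=1000):
    """Grid search for the weight w in [0, 1] minimizing the cost
    Q(w) = sum_i |x_i - (w * p1_i + (1 - w) * p2_i)|."""
    def cost(w):
        return sum(abs(a - (w * p1 + (1 - w) * p2))
                   for a, p1, p2 in zip(actual, pred1, pred2))
    return min((i / grid for i in range(grid + 1)), key=cost)

# Invented example: model 1 overshoots by 1, model 2 undershoots by 2,
# so the optimal weight lands at w = 2/3 on model 1
w = ensemble_weight([10.0, 20.0, 30.0], [11.0, 21.0, 31.0], [8.0, 18.0, 28.0])
```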
Step 3. The optimal weighted model is obtained by the APSO based on the adaptive inertia weight, and thus an optimal ensemble forecasting model is constructed. The optimal weight is shown in Table 11.
Step 4. The prediction accuracy of the APSO-RGM-SVR prediction model is evaluated by MAE, MAPE, and MSPE.

Results
The previous section introduced the data sources, described the detailed steps for constructing the four ensemble models, and estimated the parameters of each model. The calculation steps for applying the model for prediction are similar to those used to evaluate the fitting, and thus are not described here. This section introduces the dataset and presents the fitting results and evaluation of primary energy consumption by the single forecasting models, unoptimized ensemble models, and optimized ensemble models. Then, the forecast results of the future primary energy demand and primary energy structure are presented.

Data
The data used to construct China's primary energy demand and energy structure prediction models came from the National Bureau of Statistics of China (http://www.stats.gov.cn/tjsj/). We selected 12 years (2005-2016) of annual consumption data as the dataset, including China's total annual energy consumption and the annual consumption of coal, crude oil, natural gas, and water, wind, and nuclear power, as shown in Table 12. Seventy percent of the dataset (2005-2012) was used as the model training set, and 30% (2013-2016) was used as the model test set. Figure 6 depicts the annual consumption of coal, crude oil, natural gas, and water, wind, and nuclear power, as well as China's total primary energy consumption; the growth rate of energy consumption is shown in Figure 7. From 2005 to 2016, China's total energy consumption showed an overall growth trend. During 2005-2011, primary energy demand grew rapidly: to support economic development, the demand for primary energy, especially coal, increased steadily, while oil and natural gas were constrained by supply and their growth rates were small. Under the influence of environmental policies, growth in total primary energy consumption slowed from 2011. The primary energy structure began to change significantly, coal consumption growth began to slow and then decline, and the proportion of clean energy began to increase.
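The chronological 70/30 split described above amounts to the following few lines; the consumption values used here are placeholders, not the figures from Table 12.

```python
# Chronological 70/30 split of the 12-year annual series (2005-2016).
# The consumption values below are hypothetical placeholders.
years = list(range(2005, 2017))                      # 2005..2016, 12 points
consumption = [26.1 + 1.5 * i for i in range(12)]    # hypothetical series

split = int(len(years) * 0.7)                        # 12 * 0.7 -> 8 years
train_years, test_years = years[:split], years[split:]
train_y, test_y = consumption[:split], consumption[split:]
```

Keeping the split chronological (rather than random) matters for time-series models, since the test set must lie strictly after the training period.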

Model Adaptability
The experimental data came from the total annual consumption of primary energy in China and the annual consumption of the four types of primary energy for the years 2005-2016. The dataset was divided into two parts: data from 2005-2012 was used for modeling, and data from 2013-2016 was used as the test dataset. The prediction results of GM (1,1), RGM, ARIMA, SVR, GM (1,1)-SVR, RGM-SVR, ARIMA-SVR, and APSO-RGM-SVR are shown in Table 13, with accuracy evaluated by MAE, MAPE, and MSPE. Among the single models, SVR fitted best in the training phase, followed by RGM and GM (1,1), and SVR also had the highest prediction accuracy in the test phase. The rolling mechanism makes the accuracy of RGM higher than that of GM (1,1). ARIMA performed very poorly in both the training and prediction phases. The ensemble models had better prediction accuracy than the single models. Among the unoptimized ensemble models, RGM-SVR had the best fitting and prediction effects in both training and testing. However, the adaptive-weight ensemble model APSO-RGM-SVR based on the APSO proposed in this study had still higher prediction accuracy and stronger adaptability. The best evaluation results for each type of energy are indicated in bold in Table 13. Because the consumption of each type of energy varies differently, the characteristics extracted by each prediction method during training also differed. For the total annual energy consumption, RGM and GM (1,1) had the best fitting effect in the training phase; because the modeling steps of RGM and GM (1,1) are the same in this phase, their fitting results were basically identical.
Their MAE, MAPE, and MSPE were 28.310, 0.00840, and 0.00384, respectively, the lowest values, followed by those of APSO-RGM-SVR. In the test phase, the three evaluation values of APSO-RGM-SVR were 26.252, 0.00616, and 0.00352, respectively, the lowest among all models. The fitting and prediction results for annual coal consumption show that the fitting errors of RGM and GM (1,1) were the smallest in the training phase: the MAE, MAPE, and MSPE of the two models were 29.404, 0.01207, and 0.00516, and 29.399, 0.01207, and 0.00516, respectively, i.e., approximately the same. In the test phase, the APSO-RGM-SVR model had the highest accuracy, with the lowest values of the three error evaluations: 14.066, 0.00505, and 0.00320. For annual oil consumption, SVR fitted the training data best, with error evaluations of 4.345, 0.00700, and 0.00651, respectively. In the test phase, the accuracy of APSO-RGM-SVR was again higher than that of the other models, with error evaluations of 5.956, 0.00785, and 0.00448, respectively. For annual natural gas consumption, SVR had the smallest MAE and MAPE in the training phase (0.622 and 0.00549, respectively), indicating the highest fitting degree; in terms of MSPE, however, APSO-RGM-SVR and RGM-SVR had the lowest error, 0.00512. Notably, a low MSPE value indicates that the fitting curve is relatively stable with few abnormal points: a fitted curve with a large MAPE can still have a low MSPE if it is relatively stable, so the MSPE value alone cannot fully reflect the accuracy of the curve fitting.
When evaluating a prediction model, MAE, MAPE, and MSPE should therefore be combined to give a comprehensive evaluation. In the test phase, the three error evaluations of APSO-RGM-SVR (2.092, 0.01259, and 0.00995) were the lowest, showing the best prediction effect. For water, wind, and nuclear power consumption, the MAE and MAPE values in the training phase (7.912 and 0.02546, respectively) show that the SVR errors were the smallest, indicating that the SVR model had the highest fitting degree; in terms of MSPE, however, RGM and GM (1,1) had the lowest error evaluations (both 0.01333). In the test phase, the APSO-RGM-SVR model had the best prediction capacity, with the lowest values of the three error evaluations: 10.947, 0.02024, and 0.01232, respectively.
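The three criteria used throughout this comparison can be written down compactly. MAE and MAPE below are the standard definitions; the paper does not state its MSPE formula in this section, so a common mean-squared-percentage-error form is assumed here and may differ from the authors' exact definition.

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean absolute percentage error, as a fraction (e.g., 0.00840)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)))

def mspe(y, yhat):
    """Mean squared percentage error -- an assumed form; the paper's
    exact MSPE definition is not given in this section."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(((y - yhat) / y) ** 2))
```

Because MAPE averages absolute deviations while the squared form emphasizes stability, a model can rank best on one criterion and not the other, which is exactly the natural-gas case discussed above.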
The fitting and prediction curves for primary energy consumption are shown in Figure 8a-e. In the training phase, ARIMA had the worst fitting effect, while the fitting effects of the GM (1,1), RGM, SVR, and APSO-RGM-SVR models were satisfactory. In the test phase, however, the ensemble models showed their advantages, especially the APSO-RGM-SVR model, which fitted the real data closely.
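The GM(1,1) model at the heart of RGM can be sketched as follows. This is the standard GM(1,1) formulation (accumulate, fit the grey differential equation by least squares, invert the time-response function), not the authors' exact implementation; RGM simply re-applies this fit to the most recent window of observations, dropping the oldest point after each one-step prediction.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Fit a standard GM(1,1) grey model on series x0, forecast ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                          # 1-AGO (accumulated) sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])               # background (mean) values
    B = np.column_stack([-z1, np.ones(len(z1))])
    # Least-squares estimate of development coefficient a and grey input b
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    n = len(x0)
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time-response function
    x0_hat = np.concatenate([[x1_hat[0]], np.diff(x1_hat)])  # de-accumulate
    return x0_hat[n:]                           # the `steps` forecast values

# On a purely geometric series (10% annual growth), GM(1,1) is nearly exact.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
next_value = gm11_forecast(series, steps=1)[0]
```

Because the exponential time-response function matches smooth, monotone series well, GM(1,1) and RGM fit the steadily growing energy totals closely in the training phase, consistent with Table 13.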

Prediction Results
Using the sample information, the APSO-RGM-SVR model was selected to forecast China's total annual energy demand and the annual demand for the four types of primary energy for 2020 and 2025. The prediction results are shown in Table 14. The total primary energy demand was predicted to increase to 4656.41 mtce in 2020 and 4825.43 mtce in 2025. Compared with 2016, the demand for coal was predicted to decrease, to an estimated 2637.23 and 2203.86 mtce in 2020 and 2025, respectively. The total demand for oil grows slowly, with an estimated 922.15 and 989.19 mtce by 2020 and 2025, respectively. The demand for clean energy was predicted to increase drastically: natural gas demand was estimated at 409.53 mtce in 2020 and 540.64 mtce in 2025, while water, wind, and nuclear power were predicted to receive more and more attention in the future energy market, with demand predicted to be 687.50 mtce in 2020 and 1091.74 mtce in 2025. Figure 9a shows the predicted energy demand for 2017-2025 together with the historical energy demand structure for 2005-2017 for comparison, and Figure 9b shows the ratios of coal, oil, natural gas, and water, wind, and nuclear power for the years 2005 and 2025. As can be seen from Figure 9, China's total primary energy demand grows steadily, while the demand for natural gas and for water, wind, and nuclear power rises sharply. The ratio of natural gas was predicted to be about 8.8% in 2020 and to reach 11.2% in 2025, and the ratio of water, wind, and nuclear power was estimated to reach 14.8% and 22.6% in 2020 and 2025, respectively. Coal and oil, which pollute the environment, were predicted to decrease gradually: the ratio of coal was predicted to fall to 56.6% in 2020 and 45.7% in 2025.
It was estimated that oil will account for 19.8% in 2020 and around 20.5% in 2025, a very small increase. Figure 10 shows the growth rates of annual primary energy demand for the next nine years, and Table 15 shows the average annual growth rates of primary energy demand. The results show that total primary energy demand was predicted to grow steadily in the future, with an average annual growth rate of 1.14%. The annual growth rate of coal demand is negative, i.e., it declines year by year at an average annual rate of 2.2%. Although the annual demand for oil increases yearly, its growth rate shows a downward trend, with an average annual growth rate of 2.42%. Natural gas and clean energy such as water, wind, and nuclear power are expected to be fully developed; although their growth rates were predicted to decline slightly after 2020, annual demand was still predicted to show a very high growth trend.
The average annual growth rates of these two types of primary energy demand were predicted to be 7.43% and 7.7%, respectively.
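The average annual growth rates and structure ratios quoted above can be recovered from endpoint values with straightforward compound-growth arithmetic. The worked example below uses the 2020 component demands from Table 14 (which sum to 4656.41 mtce); the computed 2020-2025 rate is our own endpoint arithmetic over that five-year span, not a rate quoted in the text.

```python
def avg_annual_growth(start_value, end_value, years):
    """Compound average annual growth rate between two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Worked check on the predicted totals (mtce) for 2020 and 2025:
rate_2020_2025 = avg_annual_growth(4656.41, 4825.43, 5)

# A structure ratio, e.g. coal's predicted 2020 share of the total:
coal_share_2020 = 2637.23 / 4656.41            # about 0.566 (56.6%)
```

The same function applied to each fuel's endpoint demands reproduces the sign pattern discussed above: negative for coal, small positive for oil, and large positive for natural gas and for water, wind, and nuclear power.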

Conclusions
Accurate forecasting of energy demand can help decision-making departments to develop reasonable policies and plans. At the same time, by forecasting energy demand and structure, the connections between energy consumption and economic, social, and environmental protection concerns can be established, guiding the departments involved to adjust the energy structure and industrial layout in a targeted manner. The purpose of this study was to develop an ensemble forecasting technique based on a time-series model and artificial intelligence, with APSO used to determine the optimal combination weights, so as to predict China's primary energy demand and primary energy structure more accurately. From the research results, we obtained the following three important findings: (1) The improved PSO algorithm with the added "momentum" factor had better global search capability. (2) Compared with a variety of single prediction models and fixed-weight ensemble models, the APSO-RGM-SVR model proposed in this study showed the smallest error, significantly improving the prediction accuracy of energy demand. The ensemble model, optimized by the APSO algorithm, effectively combines the characteristics of a time-series model and artificial intelligence and can adjust the optimal weights at any time as the samples change; the new method proposed in this paper offers higher accuracy and better generalization. (3) The APSO-RGM-SVR method was used to forecast China's primary energy consumption from 2017 to 2025. The prediction results indicate that China's energy demand will continue to rise. By 2020, China's primary energy demand was predicted to reach about 4656.41 mtce, with the proportions of coal, oil, natural gas, and water, wind, and nuclear power predicted to be 56.6%, 19.8%, 8.8%, and 14.8%, respectively.
By 2025, China's primary energy demand was predicted to reach about 4825.43 mtce, with the proportions of coal, oil, natural gas, and water, wind, and nuclear power predicted to be 45.7%, 20.5%, 11.2%, and 22.6%, respectively.
According to the evaluation of the model's training performance and predictive ability, the APSO-RGM-SVR prediction model proposed in this paper can be used for future short- and medium-term forecasting of primary energy demand and primary energy structure in China, and it can produce good prediction results. The method can also be used for training and prediction on the same types of data in other regions and countries. However, whether the model proposed in this study is applicable to large-sample prediction remains to be tested; this is a direction we need to explore further.