4.2. Error Metric and Statistical Test
The chosen error metrics should reflect the forecasting performance of the models from different aspects. In view of this, two indicators were adopted herein: the root mean square error (RMSE) and the mean absolute percentage error (MAPE). These indicators have been widely utilized in recent years:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2} \quad (23)$$

$$\mathrm{MAPE}=\frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|\times 100\% \quad (24)$$

where n is the size of the predictions, and $y_t$ and $\hat{y}_t$ represent the observed value and the predicted value, respectively.
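As an illustration (not the authors' code), the two metrics can be computed as:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over n predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
```

RMSE is scale-dependent, while MAPE is scale-free, which is why MAPE is used later for cross-study comparison.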
Besides, to test the difference in model prediction performance from a statistical standpoint, the Diebold–Mariano (DM) test was introduced to determine whether the prediction accuracy of model A is significantly better than that of model B. The null hypothesis of the DM test is that the prediction accuracy of model A is no greater than that of model B, that is, the prediction error of model A is greater than or equal to the prediction error of model B [67]. Accordingly, the DM test statistic is given by Equations (25)–(27).
The one-sided DM test can effectively identify the superiority of model A over model B according to the DM statistic and its p value.
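A minimal sketch of the one-sided DM test, assuming one-step-ahead forecasts and the asymptotic normal approximation (the function name and the choice of a power-2 loss are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

def dm_test(e_a, e_b, power=2):
    """One-sided Diebold-Mariano test for h = 1 forecasts.
    H0: model A is not more accurate than model B.
    e_a, e_b: forecast errors of models A and B on the same sample."""
    # loss differential, e.g. squared errors when power = 2
    d = np.abs(np.asarray(e_a, float)) ** power - np.abs(np.asarray(e_b, float)) ** power
    n = d.size
    # DM statistic is asymptotically N(0, 1) under H0
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)
    # a small p value means A is significantly more accurate than B
    p_value = norm.cdf(dm)
    return dm, p_value
```

For multi-step forecasts, the denominator would need a long-run (HAC) variance estimate rather than the simple sample variance used here.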
4.3. Parameter Settings
To test the superiority of the proposed framework, seven forecasting base models and one simple average ensemble model were established and applied as benchmarks. First, the ARIMA and SES models, which exhibit better performance for small-sample nonlinear time series, were selected. Then, five emerging machine learning models, i.e., the SVR, BPNN, GRNN, RBFNN, and ELM models, were adopted.
For the ARIMA model, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are widely used to determine the orders of the autoregressive and moving-average processes [68]. The ACF measures the correlation between a time series and its lags, while the PACF gives the correlation coefficients between a time series and its lags with the influence of the intermediate lags removed. The order of the autoregressive process is usually determined from the PACF diagram, while the order of the moving-average process is determined from the ACF diagram. In this paper, the parameters p and q were both set to 5 according to the ACF and PACF diagrams. As the second-order difference of the original sequence is stationary, d was set to 2. As for the SES model, the smoothing factor α was set to 0.7.
For artificial neural networks, it is widely accepted that a feed-forward network with one hidden layer and enough neurons in that layer can fit any finite input–output mapping problem [64]. Therefore, the BPNN was set as a standard three-layer neural network, comprising an input layer, one hidden layer, and an output layer [67]. The number of hidden-layer nodes was set to 20 according to Zhao et al. [67], as too few hidden nodes result in an inaccurate fit and too many lead to local optima. The GRNN model has four layers: an input layer, a pattern layer, a summation layer, and an output layer. The number of nodes in the input layer is equal to the dimension of the input vector. The spread of the radial basis function in the RBF model was set to 1, and the number of input neurons was equal to the number of columns in the data matrix.
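A minimal sketch of such a three-layer BPNN with 20 hidden nodes, using scikit-learn on toy data (the data, solver, and activation choice are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy regression data standing in for the influencing-factor matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(0, 0.1, 50)

# Three-layer feed-forward network: input layer, one hidden layer
# of 20 nodes, and an output layer.
bpnn = MLPRegressor(hidden_layer_sizes=(20,), activation="logistic",
                    solver="lbfgs", max_iter=2000, random_state=0)
bpnn.fit(X, y)
pred = bpnn.predict(X)
```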
For the SVR model, also according to Zhao et al. [67], Godarzi et al. [69], and Yu et al. [70], the kernel function was set to the Gaussian kernel, and C and gamma were set to iqr(Y)/1.349 and 1, respectively, where iqr(Y) is the interquartile range of the processed target series. Besides, in the ELM model, the number of neurons in the hidden layer was equal to the number of training samples, and the sigmoid function was selected as the activation function.
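The ELM setup described here (random fixed hidden weights, sigmoid activation, hidden-layer size equal to the number of training samples, and output weights solved in closed form) can be sketched as follows; this is a simplified illustration, not the authors' implementation:

```python
import numpy as np

def elm_train(X, y, seed=0):
    """Extreme learning machine: random input weights and biases are
    fixed, so only the output weights are solved, in closed form.
    The hidden-layer size equals the number of training samples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.uniform(-1, 1, (d, n))           # random input weights (never tuned)
    b = rng.uniform(-1, 1, n)                # random hidden biases (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden activations
    beta = np.linalg.pinv(H) @ y             # output weights via pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because no iterative tuning is involved, training reduces to one matrix pseudoinverse, which is the source of the rapid learning speed noted later in the comparison.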
For the ABC ensemble algorithm, the size of the bee colony was set to 100, and the maximum number of iterations was set to 1000. All of the aforementioned forecasting models were run 20 times using MATLAB R2016a (MathWorks, Natick, MA, USA). The average results are shown in the following sections.
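A simplified sketch of how the ABC algorithm can search for ensemble weights that minimize the absolute error of the weighted combination; the onlooker phase is omitted for brevity, and all names and defaults are illustrative assumptions, not the authors' MATLAB code:

```python
import numpy as np

def abc_ensemble_weights(preds, y, colony=100, iters=200, limit=20, seed=0):
    """Artificial bee colony search (simplified) for ensemble weights.
    preds: (n_samples, n_models) base-model predictions; y: observed values.
    Minimizes the sum of absolute errors of the weighted combination."""
    rng = np.random.default_rng(seed)
    n_models = preds.shape[1]

    def cost(w):
        w = np.abs(w) / np.abs(w).sum()      # normalize to a convex combination
        return np.sum(np.abs(preds @ w - y))

    # one food source (candidate weight vector) per employed bee
    food = rng.uniform(0, 1, (colony // 2, n_models))
    costs = np.array([cost(w) for w in food])
    trials = np.zeros(colony // 2, dtype=int)

    for _ in range(iters):
        # employed-bee phase: perturb one dimension toward a random neighbour
        for i in range(len(food)):
            k = rng.integers(len(food))
            j = rng.integers(n_models)
            cand = food[i].copy()
            cand[j] += rng.uniform(-1, 1) * (food[i][j] - food[k][j])
            c = cost(cand)
            if c < costs[i]:                 # greedy selection
                food[i], costs[i], trials[i] = cand, c, 0
            else:
                trials[i] += 1
        # scout phase: abandon stagnant food sources
        for i in np.where(trials > limit)[0]:
            food[i] = rng.uniform(0, 1, n_models)
            costs[i], trials[i] = cost(food[i]), 0

    best = food[np.argmin(costs)]
    return np.abs(best) / np.abs(best).sum()
```

With base-model predictions stacked as columns of `preds`, the returned weights define the ensemble forecast `preds @ w`.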
4.4. Forecasting Error and Statistical Test
In this section, the out-of-sample forecasting errors of the eight benchmark forecasting models and the established model are reported according to the error metric criteria (i.e., RMSE and MAPE). Then, the DM test was conducted to test the significance of the difference between any two models.
Figure 8a plots the actual values of the energy demand (line) and the forecasted values (bar) of each model, and Figure 8b shows the absolute errors of each forecasting model. The prediction fitting curves (Figure 8a) show that both the benchmark models and the proposed ensemble forecasting model are effective, reflecting that the selected models are rational. However, the out-of-sample prediction performance of the GRNN and RBFNN models is comparatively poor, which is also evident from Figure 8b: the absolute errors of GRNN and RBFNN fluctuate considerably. Compared to all benchmark models, the ensemble model proposed herein exhibits the smallest fluctuation of the absolute prediction error (Figure 8b).
For an in-depth analysis of the prediction performance, two evaluation indexes were calculated. Table 4 summarizes the performance of each model in terms of RMSE and MAPE, as well as the mean values of these statistics.
First, among the base prediction models (SES, ARIMA, SVR, and the neural-network models), the ELM model exhibits the best prediction performance, probably because its parameters do not need to be adjusted iteratively during training and a unique optimal solution can be obtained once the number of hidden-layer neurons is set. In addition, the ELM algorithm offers a rapid learning speed and good generalization performance. Among the remaining base models, ARIMA outperforms SES, SVR, BPNN, GRNN, and RBFNN in terms of both RMSE and MAPE, reflecting that ARIMA can effectively forecast small-sample time series.
Once the ensemble prediction models are considered, the model proposed herein (the ensemble model with the ABC algorithm, E_ABC) exhibits the best prediction performance, with the lowest RMSE and MAPE. The comparison of the two ensemble models reveals that E_ABC outperforms E_AVE with respect to both criteria. For example, the RMSE values for E_ABC and E_AVE are 9.46 and 25.52, respectively; the RMSE of E_ABC is less than half that of E_AVE. Meanwhile, the MAPE values for E_ABC and E_AVE are 0.21% and 0.51%, respectively, so the MAPE of E_AVE is considerably greater than that of E_ABC.
Besides, the prediction performance of the simple average ensemble model (E_AVE) is worse than that of some base prediction models (i.e., the ELM and ARIMA models), possibly because the simple average ensemble method ignores the correlation between base models and the number of base models is relatively small. By contrast, the ensemble model with the ABC algorithm (E_ABC) uses the prediction results of each base model to train the model and minimize the absolute error, so the optimal training results can be obtained without considering the correlation between base models.
Finally, the results revealed that the integrated model E_ABC exhibits the best performance due to its lowest RMSE and MAPE. Hence, the integrated model E_ABC is powerful for energy demand forecasting.
Table 5 summarizes the empirical results of the DM test, as well as the p values of the relevant statistics between any two models. For explanatory purposes, in Table 5, the p value in row 2, column 2 is 0.0081, which is less than 0.01, indicating that the test rejects the null hypothesis (i.e., there is a significant difference between the forecasted results of ARIMA and SES) at the 99% confidence level.
Focusing on the last row of Table 5, the p values represent the statistical test results of the proposed ensemble model (E_ABC) against the other benchmark models. These p values are all less than 0.1, indicating a significant difference between the forecasting results of the integrated model E_ABC and those of the eight benchmark models at the 90% confidence level. For example, the p values in row 9, columns 2, 3, 4, and 7 are less than 0.01; hence, the proposed model is better than SES, ARIMA, SVR, and RBFNN at the 99% confidence level. Considering all models in Table 5, the forecasting accuracy of the integrated model E_ABC is considerably better than that of the above-mentioned eight models from the statistical perspective. Generally, the integrated model E_ABC is confirmed to exhibit good performance for energy demand forecasting according to the error metrics and the DM statistical test.
In addition to the forecasting accuracy comparison with the benchmark models listed in Table 4 and Table 5, this paper compares the experimental results with other related works [30,31,32,33,35]. As presented in Equation (24), the MAPE eliminates the magnitude of the predicted target and can directly reflect the prediction accuracy of a model. Table 6 presents the MAPE values of various energy demand forecasting experiments (covering different forecasting targets and data sets). The MAPE of the E_ABC model is the lowest among them.
4.5. Future Energy Demand Forecasting Results
In this section, the integrated model E_ABC, which exhibits good prediction performance, was applied to forecast the future energy demand of China. To forecast the energy demand from 2018 to 2022, however, out-of-sample values of the influencing factors were required.
According to the 13th Five-Year Plan proposed by the State Council, which is authoritative and representative to a certain extent, the economic growth rate will be greater than 6.5% by 2020. Thus, in this study, a GDP growth rate of 6.5% was used to calculate the GDP from 2018 to 2022. The other influencing factors all constitute small samples, for which the trend extrapolation method typically performs better than alternatives. In view of this, we applied the trend extrapolation and rolling regression methods to predict the data of each influencing factor for the next 5 years (Table 7).
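The two extrapolation steps described above can be sketched as follows: compound GDP growth at the assumed 6.5% rate, and a least-squares linear trend for the other factors. All series and the 2017 GDP figure below are illustrative placeholders, not the paper's data:

```python
import numpy as np

# Hypothetical annual series for one influencing factor;
# a linear trend is fitted and extrapolated five years ahead.
years = np.arange(2000, 2018)
factor = 3.0 + 0.12 * (years - 2000) + np.random.default_rng(3).normal(0, 0.05, years.size)

slope, intercept = np.polyfit(years, factor, 1)   # least-squares linear trend
future_years = np.arange(2018, 2023)
forecast = slope * future_years + intercept

# GDP extrapolated at the assumed 6.5% annual growth rate
gdp_2017 = 82.7                                    # illustrative base value
gdp_forecast = gdp_2017 * 1.065 ** np.arange(1, 6)
```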
Table 8 summarizes the forecasting results and growth rates of the proposed ensemble model (E_ABC), and Figure 9 plots the growth rate and energy demand, including both actual and forecasted data. The forecasting results reveal that the energy demand of China will maintain a steady growth trend. By 2022, the energy demand of China will reach 3429 million tons of standard coal equivalent, corresponding to an increase of 9.48% over the energy demand in 2017 and an average annual growth rate of 1.897%. With respect to the growth rate of the future energy demand, the forecasted results reveal a steady fluctuation: the growth rate rises to 3.02% in 2018 and then declines to 1.38% over 2019 to 2022. From Figure 9, the growth rate of the energy demand is clearly leveling off gradually.
4.6. Discussion
By comparing the forecasting performance of the two categories of models (single models and ensemble models), the ensemble models were verified to perform best in forecasting energy demand, since they achieve the highest forecasting accuracy in terms of MAPE and RMSE. This further supports choosing the ensemble model as a powerful algorithm for forecasting energy demand.
When the two ensemble models were compared, the results indicated that the prediction model proposed herein (E_ABC) exhibits the best prediction performance, with the lowest RMSE and MAPE; for instance, the MAPE values for E_ABC and E_AVE are 0.21% and 0.51%, respectively, so the MAPE of E_AVE is considerably greater than that of E_ABC. The ensemble model with the ABC algorithm (E_ABC) makes use of the prediction results of each base model to train the model and minimize the absolute error.
Furthermore, the forecasting accuracy of the integrated model E_ABC is considerably better than that of the above-mentioned eight models from the statistical perspective. Generally, the integrated model E_ABC is confirmed to exhibit good performance for energy demand forecasting according to the error metrics and the DM statistical test.
In summary, according to the experiments analyzed in Section 4.4 and Section 4.5, three main results can be drawn: (1) the ensemble model established in this study is significantly superior to the benchmark prediction models in terms of both forecasting accuracy and the hypothesis test; (2) the proposed ensemble approach with the ABC algorithm can therefore be employed as a promising framework for energy demand forecasting; and (3) the forecasts of future energy demand produced by the ensemble model reveal that the energy demand of China will maintain a steady growth trend.