Prediction of the Energy Consumption Variation Trend in South Africa based on ARIMA, NGM and NGM-ARIMA Models

Abstract: South Africa's energy consumption accounts for about one-third of that of the whole African continent, ranking first in Africa. However, few studies have addressed the prediction of energy consumption in South Africa. In this study, based on data on South Africa's energy consumption during 1998–2016, the autoregressive integrated moving average (ARIMA) model, the nonlinear grey model (NGM) and the combined nonlinear grey model–autoregressive integrated moving average (NGM-ARIMA) model are adopted to predict South Africa's energy consumption during 2017–2030. The mean absolute percent errors (MAPE) of the ARIMA, NGM and NGM-ARIMA models are 2.827%, 2.655% and 1.772%, respectively, which indicates that the predicted results are highly reliable. The prediction results show that energy consumption in South Africa will keep increasing, with a growth rate of about 7.49%, over the next 14 years. These results provide a scientific basis for adjusting energy supply and demand policy in South Africa, and the prediction techniques used can serve as a reference for energy consumption studies in other African countries.


Introduction
South Africa's energy consumption ranks first in Africa, accounting for one-third of that of the whole continent [1]. South Africa's level of energy development not only reflects its own energy supply-demand status but also strongly influences the overall level of energy development in Africa. Predicting South Africa's future energy consumption helps to understand its future energy supply-demand level. A clear picture of future supply and demand will in turn help the South African government balance the two, thereby improving the country's economic development and people's living standards. Furthermore, such predictions can serve as a reference for energy policy-makers and market participants in other African countries.
Energy, as one of the key driving forces of economic growth, has long attracted the attention of experts and scholars [2]. In recent years, the economy of Africa has developed rapidly, and scholars have paid more attention to energy demand and consumption in Africa [3]. However, few of the existing studies in the energy field focus on South Africa. In terms of the regions and energy types considered, Oyedepo et al. [4] predicted the energy demand of Nigeria; their results show that the demand will grow rapidly, in geometric progression, and will become very large. Based on this result, the authors suggested that the Government of Nigeria vigorously exploit renewable energy. Adom et al. [5] used two econometric methods, the autoregressive distributed lag (ARDL) model and the partial adjustment model (PAM), to forecast the total electricity demand of Ghana. The results show that the positive output, urbanization and income effects far outweigh the negative efficiency effect, leading to growth in electricity consumption in both the short and the long term; from 2012 to 2020, domestic electricity consumption was projected to keep increasing. Bazilian et al. [6] made long-term predictions of electric power demand in sub-Saharan Africa; the results show that the installed electricity capacity in 2030 will have increased by three times relative to 2020 and that electric power demand will increase continuously. Mentis et al. [7] evaluated the wind energy potential in Africa and found that the potential in South Africa, Sudan, Egypt and Nigeria is large, with high annual wind energy output, while that in Equatorial Guinea and Gabon is low. There are also studies specifically on energy in South Africa. Sigauke et al. [8] used a regression-seasonal autoregressive integrated moving average-generalized autoregressive conditional heteroskedastic (Reg-SARIMA-GARCH) model to predict the daily peak electricity demand in South Africa, with a prediction error of 1.42%. Thopil et al. [9] predicted the water consumption of coal-fired power generation, and the results show that it will decrease by 14% by 2021. Tsikata et al. [10] analyzed the challenges facing electric power supply in South Africa and advocated a more diversified energy environment. Ayodele et al. [11] analyzed the wind conditions in 10 coastal regions of South Africa and studied their wind power generation potential. Walwyn et al. [12] analyzed the new energy plan of South Africa.
In addition, to improve the accuracy of energy prediction, scholars have used a variety of research methods [13]. For example, Ayodele et al. [14] used an artificial neural network (ANN) to predict the wind speed in the Western Cape Province of South Africa, with a mean absolute percent error (MAPE) of 6.64%. Sotomane [15] made short-term predictions of electric power in Maputo City, Mozambique, and the results show that higher accuracy can be achieved by using multiple models. Inglesi [16] used the Engle-Granger method to build cointegration and error correction models to forecast South Africa's electricity demand and, on this basis, analyzed the variables that influence electricity demand. The results show that, in the long term, electricity demand is affected by price as well as economic growth/income, whereas in the short term it is affected by population growth. Ahjum et al. [17] used the South African TIMES (SATIM) model to study the hydraulic energy in various areas of South Africa based on economic and commodity prices. Fadare [18] used an ANN to predict solar power generation in Nigeria and found that the correlation between the prediction results and solar radiation intensity is as high as 90%. Bessa et al. [19] proposed a new spatio-temporal prediction method based on the vector autoregressive framework, which can predict the levels of residential solar photovoltaic generation and of medium- and low-voltage (MV/LV) substations six hours in advance. Ceci et al. [20] presented a new artificial neural network method that performs online adaptive training and enriches the entropy measure with the spatial information of the data to account for spatial autocorrelation. Abdel-Nasser et al. [21] used a long short-term memory recurrent neural network (LSTM-RNN) to accurately predict the output power of a photovoltaic system.
A review of the above literature shows that present studies on African energy mainly focus on electric power and clean energy; there are few studies of total energy consumption. In addition, existing studies mostly choose neural networks and similar models as the research method, and few adopt several time series prediction models concurrently to make a comparative prediction of energy in Africa. Against this background, this research makes two contributions. (1) With energy consumption in South Africa as the research object, models are used to predict energy consumption for 2017-2030. The results provide reliable data support for optimizing and adjusting energy policy in South Africa, so that the policies made by energy managers will truly benefit the long-term, sustainable development of South Africa's energy market. (2) Linear, nonlinear and combined time series models are used simultaneously to predict energy consumption in South Africa: the autoregressive integrated moving average (ARIMA) model, the nonlinear grey model (NGM) and the combined nonlinear grey model-autoregressive integrated moving average (NGM-ARIMA) model. First, the ARIMA model predicts the research object from the perspective of linear prediction [22], the NGM model from the perspective of nonlinear prediction, and the NGM-ARIMA model by combining the linear model with the nonlinear one. Using these three methods together reflects the energy consumption trend in South Africa more comprehensively, making the prediction more accurate. Second, NGM-ARIMA is a combined model built on the NGM and ARIMA models; it draws on the advantages of both single models and further improves accuracy in application [23].
The remainder of the article is organized as follows: Section 2 introduces the linear model (ARIMA), the nonlinear model (NGM) and the combined model (NGM-ARIMA); Section 3 presents and discusses the research results; Section 4 summarizes the article.

Research Method and Data Source
In this research, the ARIMA, NGM and NGM-ARIMA models are used together to predict energy consumption in South Africa. This section describes and summarizes the three methods.

ARIMA Prediction Model
The ARIMA model regards the predicted value as a function determined by the time series [24]. Once the model is identified, future data can be predicted from the past and present values of the series [25]. In terms of prediction characteristics, the ARIMA model is built according to the characteristics of the time series itself and already accounts for the nonstationarity of the data during modeling. Therefore, the ARIMA model has the advantages of simple structure, fast modeling and high prediction accuracy [26][27][28]. In terms of the categories it contains, the ARIMA (p, d, q) family covers three types of model: (1) the autoregressive (AR) model; (2) the moving average (MA) model; and (3) the autoregressive moving average (ARMA) model applied to the differenced series [29]. Researchers usually choose among these types according to the characteristics of the data sequence.
The building of an ARIMA model usually takes the following three steps.
Step 1: Make the nonstationary sequence stationary. If the data sequence is nonstationary and shows an increasing or decreasing trend, the data are differenced. Define the raw data sequence as $y_1, y_2, \ldots, y_m$. The difference formula is
$$Y_t^{*} = (1 - B)^{d}\, Y_t,$$
where $Y_t^{*}$ is the sequence after it has been made stationary, $d$ is the order of differencing and $Y_t$ is the nonstationary sequence before differencing. In addition, $B$ is the backward shift operator:
$$B\, Y_t = Y_{t-1}.$$
Step 2: Plot the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the stationary time series and determine p and q from the truncation and trailing behavior of these functions. Truncation means that the ACF or PACF of a time series is 0 beyond some order; trailing means that the ACF or PACF decays slowly towards 0. The specific rules are as follows. If the PACF of the stationary sequence is truncated and the ACF is trailing, then p equals the truncation order, q equals 0, and the sequence fits the AR model. If the PACF is trailing and the ACF is truncated, then q equals the truncation order, p equals 0, and the sequence fits the MA model. If both the ACF and the PACF are trailing, then p is taken as the order after which the PACF becomes negligible, q as the order after which the ACF becomes negligible, and the sequence fits the ARMA model.
The ACF measures the correlation between $y_t$ and $y_{t-k}$, but this correlation is not a pure one. Since $y_t$ is also affected by $y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}$, and these $k-1$ variables are themselves correlated with $y_{t-k}$, the autocorrelation coefficient also carries the effects of these intermediate variables on $y_t$ and $y_{t-k}$ [30]. To measure the direct effect of $y_{t-k}$ on $y_t$, the PACF is introduced; it measures the correlation between $y_t$ and $y_{t-k}$ after removing the effects of $y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}$.
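Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `difference`, `acf` and `pacf` are hypothetical, the PACF is computed with the Durbin-Levinson recursion (one common choice, not necessarily what the statistical software used in this study does), and the demo series is made up rather than taken from the South African data.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing, y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(x * x for x in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result


# Hypothetical series, not the paper's data.
demo = [100.0, 104.0, 109.0, 115.0, 118.0, 126.0, 131.0, 139.0, 144.0, 152.0]
stationary = difference(demo, d=2)
print([round(v, 3) for v in acf(stationary, 3)])
print([round(v, 3) for v in pacf(stationary, 3)])
```

Reading p and q off these two lists (where each one drops to roughly zero) is the judgment described in Step 2; with a series as short as the one in this study, that judgment is necessarily approximate.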
Step 3: Obtain the prediction equation. If the model is an AR process, the interpreted variable can be regarded as a function of (a regression on) its previous-stage values plus the current-stage error. The prediction formula can then be represented as
$$X_t^{*} = \varphi_1 X_{t-1}^{*} + \varphi_2 X_{t-2}^{*} + \cdots + \varphi_p X_{t-p}^{*} + u_t.$$
If the model is an MA process, the interpreted variable can be regarded as a function of the current-stage error and the error terms of previous stages:
$$X_t^{*} = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q},$$
where $u_t$ is the error term, $X_t^{*}$ is the interpreted variable, and $\varphi_i$ and $\theta_j$ are the autoregressive and moving average coefficients.
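For reference, the AR and MA parts described above combine into the general ARMA form, and applying that form to the d-times differenced sequence gives ARIMA (p, d, q). The following is a sketch in standard textbook notation; the coefficient symbols $\varphi_i$ and $\theta_j$ and the sign convention of the MA part are assumptions, since conventions differ between texts and software.

```latex
% ARMA(p, q): autoregressive part plus moving average part
X_t^{*} = \sum_{i=1}^{p} \varphi_i X_{t-i}^{*} + u_t + \sum_{j=1}^{q} \theta_j u_{t-j}

% The same model written with the backward shift operator B (B X_t = X_{t-1})
\Bigl(1 - \sum_{i=1}^{p} \varphi_i B^{i}\Bigr) X_t^{*}
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr) u_t

% ARIMA(p, d, q): X_t^{*} is the d-th difference of the raw sequence Y_t
X_t^{*} = (1 - B)^{d} Y_t
```

Setting q = 0 recovers the pure AR model and setting p = 0 recovers the pure MA model, which matches the case distinction made in Step 2.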

NGM Prediction Model
The nonlinear grey model is an improved model based on the traditional grey model [31]. In real prediction problems, many data sequences are not simply increasing or decreasing. Random factors often interfere with the data and cause the final time series to show local nonlinear characteristics [32]. The nonlinear grey model makes the traditional linear model more adaptive, and the continuous adjustment of its nonlinear coefficient also improves prediction accuracy [33].
The biggest difference between the nonlinear grey model and the linear grey model lies in the core differential equation: the addition of a power coefficient turns the traditional model into a nonlinear one. In recent years, its high accuracy has led to wide application in prediction across numerous fields [34,35].
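The calculation steps of the model are as follows.
Step 1: Preprocess the data. We take the first five values of the raw data and name the sequence
$$X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(5)\}.$$
Then, according to the accumulation principle, we apply a first-order accumulation to the sequence and name the result
$$X^{(1)} = \{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(5)\}, \qquad x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i).$$
For convenience of operation, we introduce an auxiliary sequence, defined as
$$z^{(1)}(k) = \tfrac{1}{2}\left[x^{(1)}(k) + x^{(1)}(k-1)\right], \qquad k = 2, 3, \ldots, 5.$$
Step 2: Obtain the core differential equation of the nonlinear grey model from the three sequences of the preprocessing step:
$$x^{(0)}(k) + a\, z^{(1)}(k) = b \left[z^{(1)}(k)\right]^{\alpha},$$
where $a$ and $b$ are the coefficients of the differential equation and $\alpha$ is the power coefficient.
Step 3: Calculate the coefficient values of the differential equation. The unknown coefficients $a$ and $b$ in the above equation can be obtained by the least squares method:
$$[a, b]^{T} = (B^{T}B)^{-1} B^{T} Y,$$
where
$$B = \begin{bmatrix} -z^{(1)}(2) & \left[z^{(1)}(2)\right]^{\alpha} \\ -z^{(1)}(3) & \left[z^{(1)}(3)\right]^{\alpha} \\ \vdots & \vdots \\ -z^{(1)}(5) & \left[z^{(1)}(5)\right]^{\alpha} \end{bmatrix}, \qquad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(5) \end{bmatrix}.$$
The power coefficient $\alpha$ is obtained by solving for it numerically with Lingo software.
Step 4: Calculate the final prediction value. With the above coefficients, the predicted accumulated sequence can be solved as
$$\hat{x}^{(1)}(k+1) = \left\{ \left[ \left(x^{(0)}(1)\right)^{1-\alpha} - \frac{b}{a} \right] e^{-a(1-\alpha)k} + \frac{b}{a} \right\}^{\frac{1}{1-\alpha}}, \qquad \alpha \neq 1,$$
and the predicted value of the raw sequence follows by inverse accumulation, $\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k)$.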
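The four steps above can be sketched in Python. This is a minimal illustration, assuming the standard power-form grey equation $x^{(0)}(k) + a z^{(1)}(k) = b [z^{(1)}(k)]^{\alpha}$ and its usual closed-form time response; the function name `ngm_fit_predict` is hypothetical, the power coefficient `alpha` is passed in rather than optimized, and the demo numbers are made up rather than taken from the study's data.

```python
import math


def ngm_fit_predict(x0, alpha, horizon=0):
    """Fit the power-form grey model and return (a, b, fitted + forecast values)."""
    if alpha == 1:
        raise ValueError("alpha must differ from 1")
    n = len(x0)
    # Step 1: first-order accumulation and auxiliary (background) sequence.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    # Steps 2-3: least squares for y = a * (-z) + b * z**alpha (2x2 normal equations).
    u = [-z for z in z1]
    w = [z ** alpha for z in z1]
    y = x0[1:]
    suu = sum(p * p for p in u)
    sww = sum(q * q for q in w)
    suw = sum(p * q for p, q in zip(u, w))
    suy = sum(p * r for p, r in zip(u, y))
    swy = sum(q * r for q, r in zip(w, y))
    det = suu * sww - suw * suw
    a = (suy * sww - suw * swy) / det
    b = (suu * swy - suw * suy) / det
    # Step 4: time response of the accumulated sequence, then inverse accumulation.
    c = x0[0] ** (1 - alpha) - b / a

    def x1_hat(k):
        base = c * math.exp(-a * (1 - alpha) * k) + b / a
        if base <= 0:
            raise ValueError("time response undefined for these parameters")
        return base ** (1 / (1 - alpha))

    cum = [x1_hat(k) for k in range(n + horizon)]
    x0_hat = [x0[0]] + [cum[k] - cum[k - 1] for k in range(1, n + horizon)]
    return a, b, x0_hat


# Hypothetical five-point window, not the paper's data; alpha chosen arbitrarily.
a, b, fitted = ngm_fit_predict([100.0, 104.0, 109.0, 113.0, 119.0], alpha=0.05, horizon=3)
print(round(a, 4), round(b, 2), [round(v, 2) for v in fitted])
```

With `alpha=0` the sketch reduces to the traditional linear grey model, which is a convenient sanity check; choosing `alpha` itself (for example by minimizing the fitting error) is a separate optimization step that is not shown here.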

NGM-ARIMA Prediction Model
The NGM-ARIMA model combines the NGM and ARIMA models. It joins the advantages of linear and nonlinear models while offsetting their respective disadvantages. The ARIMA model is simple to build and predicts accurately, but it places high demands on the stationarity of the data and cannot capture nonlinear relations in the data, whereas the NGM model can capture nonlinear relations and adapts better in data processing. As a result, the predicted data remain highly accurate even when the raw data vary widely.
In terms of model innovation, the NGM-ARIMA model proposed in this study is based on the principle of "error correction + secondary modeling". First, following the error correction principle, the NGM-ARIMA model feeds the initial prediction error of the NGM model into the ARIMA model as its input. This is expected to stabilize the error and bring the final fitted curve closer to reality. Second, following the principle of secondary modeling, the NGM-ARIMA model selects two kinds of time series model and links them in sequence. This way of connecting models can be reused in other combined models. Moreover, the combination brings the advantages of both models into play and avoids their corresponding defects, so secondary modeling is expected to make the most of the prediction process through the two models.
The operation of the NGM-ARIMA model can be roughly divided into three steps: step 1 compares the raw data with the results predicted by the NGM model to obtain the residual; step 2 uses the ARIMA model to analyze the residual and obtain a new, predicted residual; step 3 adds the prediction value obtained from the NGM model to the new residual obtained from the ARIMA model to give the final prediction value of the NGM-ARIMA model. The specific steps are shown in Figure 1.
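The three steps can be sketched as a short Python routine. To keep the example self-contained, a least-squares AR(1) model stands in for the ARIMA model of step 2; this substitution, the function names `fit_ar1` and `hybrid_forecast`, and all numbers are assumptions made for illustration only.

```python
def fit_ar1(series):
    """Least-squares AR(1) with intercept: e_t = c + phi * e_{t-1}."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def hybrid_forecast(actual, base_fit, base_future):
    """Error-correction hybrid: base prediction plus predicted residual."""
    # Step 1: residual between the raw data and the base (e.g. NGM) prediction.
    resid = [a - f for a, f in zip(actual, base_fit)]
    # Step 2: model the residual series (AR(1) stand-in for ARIMA).
    c, phi = fit_ar1(resid)
    # Step 3: add the predicted residual back to the base prediction.
    corrected_fit = [base_fit[0]] + [
        base_fit[t] + c + phi * resid[t - 1] for t in range(1, len(resid))
    ]
    corrected_future, last = [], resid[-1]
    for f in base_future:
        last = c + phi * last  # recursive residual forecast
        corrected_future.append(f + last)
    return corrected_fit, corrected_future


# Hypothetical numbers: a base model that under-predicts by a shrinking margin.
actual = [108.0, 114.0, 122.0, 131.0, 140.5]
base_fit = [100.0, 110.0, 120.0, 130.0, 140.0]
fit, future = hybrid_forecast(actual, base_fit, [150.0, 160.0])
print(fit, future)
```

The design point is that the second model never sees the raw data, only the errors of the first one, so any model able to stabilize that error series could take its place.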

Characteristics and Limitations of Each Model
On the basis of the single NGM and ARIMA models, a new combined NGM-ARIMA model is constructed according to the principle of "secondary modeling + error correction". The NGM model predicts the initial data sequence, and the ARIMA model then corrects the error. This modeling idea and process give the combined model the following characteristics.
First, the model forecasts only from the historical trajectory of the data sequence itself. In other words, the model applies only to single-factor time series data. Second, the model constructed in this study combines models by chaining their prediction steps. Compared with the traditional approach of improving a model's internal coefficients, this kind of improvement between models has wide application value: as long as the second model can stabilize the error, the two models can be combined.
Everything has its pros and cons. The above two features form the core of the model construction but also bring corresponding limitations. Since both basic models are time series models, the combined model constructed in this study cannot consider the influence of external factors on the predicted data. That is, it can only predict data from their own historical trend and cannot measure the impact of other external factors on the data. Multifactor prediction models can consider influences from all aspects, but they suffer from the accumulation of errors across all factors [36]. In general, the limitation of the model proposed in this study is that it cannot accurately predict data driven by multiple factors; however, it predicts accurately when the data are clearly governed by their historical pattern.

Data Source
Considering the reliability and availability of research data, the data used in this research are taken from the BP energy statistical yearbook [1]. South Africa's energy consumption in 1998-2016 is selected as the raw data to predict South Africa's energy consumption in 2017-2030.

Operation Process of the Three Models
In this part, the operation of the ARIMA, NGM and NGM-ARIMA models is divided into fitting and forecasting. The fitting part predicts South Africa's energy consumption in 1998-2016 and compares it with the raw data from the same period in order to check the accuracy of the three prediction models. The MAPE is used as the tool for comparing the raw data with the fitted data, so as to determine whether the three models are suitable for the research. The forecasting part predicts South Africa's energy consumption in 2017-2030, that is, over the next 14 years.

ARIMA Model Fitting Process
Prediction with the ARIMA model proceeds in the following steps. Step 1: Apply the unit root test to the raw data. The unit root test checks whether the series has become stationary after differencing, and the differencing order is the d value in ARIMA (p, d, q). Table 1 shows the result after second-order differencing. As shown in the table, the t-statistic lies between the 5% and 10% critical values and the p-value is 0.0665 (<0.1), which means that the sequence can be regarded as stationary at the 90% confidence level; therefore, d = 2. Step 2: With d = 2 determined, the p and q values can be obtained from the autocorrelation coefficients. Figure 2 is the correlation coefficient chart obtained after second-order differencing. It can be seen from the chart that the partial autocorrelation coefficients settle after lag 5 and the autocorrelation coefficients settle after lag 1; therefore, p = 5 and q = 1. Having identified the model as ARIMA (5, 2, 1), we used SPSS software to estimate it; the parameter values obtained are shown in Table 2. It can be seen from Table 2 that the stationary $R^2$ value is 0.523, which meets the goodness-of-fit standard. The final fitting results for 1998-2016 obtained with ARIMA (5, 2, 1) are shown in Figure 3, in which the difference between the fitted and actual values can be clearly seen. From Figure 3, the variation trends of the raw data and the ARIMA fitted data are basically consistent overall, although there are slight differences between the two in 2003 and 2004. In addition, the fitting data show that, except for 2003 and 2004, where the accuracy is 92%, the accuracies of the other years are all higher than 95%, which also means that ARIMA (5, 2, 1) is applicable to the study.
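The idea behind the unit root test in Step 1 can be illustrated with a simplified Dickey-Fuller regression, $\Delta y_t = c + \gamma y_{t-1} + e_t$, whose t-statistic for $\gamma$ is compared with a critical value. The sketch below is an assumption-laden simplification: it omits the lagged difference terms of the augmented test that statistical packages normally run, it does not compute a p-value, and the function name `dickey_fuller_t` and the demo series are hypothetical.

```python
import math


def dickey_fuller_t(series):
    """Return (gamma, t-statistic) for delta_y_t = c + gamma * y_{t-1} + e_t."""
    x = series[:-1]                                   # lagged levels y_{t-1}
    dy = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (d - md) for v, d in zip(x, dy))
    gamma = sxy / sxx
    c = md - gamma * mx
    ssr = sum((d - c - gamma * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)               # standard error of gamma
    return gamma, gamma / se


# Hypothetical oscillating series, not the paper's data.
demo = [0.0, 1.0, 0.0, -1.0] * 3
gamma, t_stat = dickey_fuller_t(demo)
print(round(gamma, 3), round(t_stat, 3))
```

A strongly negative t-statistic speaks against a unit root. Proper critical values and p-values, such as the 0.0665 reported in Table 1, depend on the sample size and the deterministic terms included, so they should be taken from the software's tables rather than from this sketch.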

NGM Model Fitting Process
First, the power coefficient of the nonlinear grey model was calculated from the NGM formulas in Section 2 using MATLAB software (version 9.3, MathWorks, Natick, Massachusetts, USA); the obtained power coefficients are shown in Table 3. Based on the power coefficients in Table 3, and again using MATLAB, the final South African energy consumption predicted by the NGM model was obtained, as shown in Figure 4. From Figure 4, it can be seen that the variation trends of the raw data and the NGM fitted data during 1998-2016 are similar, with the degree of fitting in 2005-2007 and 2011-2014 reaching 100%. The fitting data show that the accuracy of data fitting by the NGM model is high, which also indicates that the NGM model is applicable to the research.

NGM-ARIMA Model Fitting Process
The NGM-ARIMA combined prediction model uses a linear model to correct the residuals of the values predicted by a nonlinear model.
First, we subtracted the NGM prediction values from the primary energy consumption to obtain the residuals.
Then, we applied the unit root test to the residuals for the ARIMA model and found that the data sequence was stationary after first-order differencing; therefore, d = 1.
After determining the d value, we obtained the p and q values by analyzing the autocorrelation and partial autocorrelation coefficients in the correlation coefficient figure.
We then estimated ARIMA (3, 1, 5) in SPSS software; its output is the predicted error of the NGM model. Adding this result to the NGM prediction values gives South Africa's energy consumption as predicted by the NGM-ARIMA model. The specific prediction data are shown in Figure 5. From Figure 5, comparing the primary energy consumption in 1998-2016 with the data predicted by NGM-ARIMA, we can see that the trend lines of the two groups of data are highly similar. The fitting data show that the fitting accuracy of the NGM-ARIMA model is as high as 98%, with the accuracies in 2001 and 2007 reaching 100%. This also indicates that the accuracy obtained with the combined linear and nonlinear model is higher than that of either single model.

Model Goodness Inspection
In this part, the prediction results obtained by the ARIMA, NGM and NGM-ARIMA models are evaluated with three statistical indicators to better test the goodness of fit: the mean absolute percent error (MAPE), the mean squared error (MSE) and the mean squared percent error (MSPE). The calculation formulas of these indicators are as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - x_i}{x_i}\right| \times 100\%$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2$$
$$\mathrm{MSPE} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - x_i}{x_i}\right)^2 \times 100\%$$
where $y_i$ is the prediction value, $x_i$ is the actual value and $n$ is the sample size [37]. Table 4 shows the three error values of the three models obtained through the above formulas; the last row of the table gives the mean value of each error, which can also be used as the final measure of the error term.
Figure 6 is drawn from Table 4 and clearly shows the accuracy of the 1998-2016 data predicted by the ARIMA, NGM and NGM-ARIMA models. As shown in the figure, although the accuracy of the ARIMA model in 2003-2004 and of the NGM model in 2002 is slightly lower than in other years, the mean accuracy of each of the three models is higher than 97%. From these measures of prediction accuracy, we can see that: (1) The MAPE of each of the three models is less than 3%; the MAPEs of the ARIMA, NGM and NGM-ARIMA models are 2.827%, 2.655% and 1.772%, respectively. (2) The spider web chart indicates that, except for an accuracy of 92% for the ARIMA model in 2003 and 2004 and for the NGM model in 2002, the prediction accuracy in all other years exceeds 95%. Based on these two results, we can be confident that the three models are applicable to the research, that the predicted data are credible and that the prediction results have high reference value. Table 5 shows South Africa's energy consumption in 2017-2030 predicted by the ARIMA, NGM and NGM-ARIMA models.
Figure 7 is the variation trend chart of consumption drawn from the prediction results in Table 5, in which the change in South Africa's energy consumption can be seen more clearly. From Figure 7, we can see that the fitted values of the three models agree closely with the primary energy consumption over the fitting period. In addition, South Africa's energy consumption in 2017-2030 as predicted by the three models shows a stable growth trend.
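The error indicators used in this section can be sketched in Python as follows. MAPE and MSE follow their usual definitions; the MSPE is taken here as the mean of the squared relative errors expressed in percent, which is an assumption, since several variants of that indicator are in use. The function names, the per-year `accuracy` helper (100% minus the absolute percent error) and the numbers are illustrative only.

```python
def mape(actual, predicted):
    """Mean absolute percent error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / n


def mse(actual, predicted):
    """Mean squared error, in squared units of the data."""
    n = len(actual)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n


def mspe(actual, predicted):
    """Mean squared percent error (assumed form: mean squared relative error, in percent)."""
    n = len(actual)
    return 100.0 * sum(((p - a) / a) ** 2 for a, p in zip(actual, predicted)) / n


def accuracy(actual, predicted):
    """Per-observation accuracy in percent: 100 minus the absolute percent error."""
    return [100.0 * (1.0 - abs((p - a) / a)) for a, p in zip(actual, predicted)]


# Hypothetical actual and fitted values, not the paper's data.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mape(actual, predicted), mse(actual, predicted), mspe(actual, predicted))
print(accuracy(actual, predicted))
```

Because MAPE and MSPE are relative measures, they allow models to be compared across years with very different consumption levels, whereas MSE is dominated by the years with the largest absolute values.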

Prediction Results and Discussion
We discuss the prediction results from the following two aspects. First, in terms of the prediction models, the energy consumption trends predicted by the three models are consistent, and the MAPE of each is less than 3%, which shows that the prediction results are highly credible. However, since all three models are time series models, the impact of other factors on South Africa's energy consumption was not considered during modeling, so the analysis of energy consumption may be incomplete.
Second, in terms of the prediction results, South Africa's energy consumption in 2030 will be 27.19 Mtoe higher than that in 2016. In addition, it can be seen clearly from Figure 7 that South Africa's energy consumption will keep increasing for a long time to come. If energy reserves remain unchanged, this will undoubtedly cause an imbalance between energy supply and demand. Therefore, South Africa should secure its energy supply as its energy consumption increases.

Conclusion
This paper aimed to accurately predict South Africa's energy consumption in 2017-2030 by using a linear model (ARIMA), a nonlinear model (NGM) and a combined linear-nonlinear model (NGM-ARIMA) for time series modeling. The research results show that the MAPEs of the three models are 2.827%, 2.655% and 1.772%, respectively, which indicates that all three are applicable to the research. The accuracy of the NGM-ARIMA model is the highest, and its prediction is the closest to the actual energy consumption. This means that the research can provide scientific and reliable data support for the prediction of South Africa's energy consumption.
The prediction results show that South Africa's energy consumption in 2017-2030 will keep increasing, with a growth rate of about 7.49%, and that consumption in 2030 will be 17.8% higher than that in 2016. In other words, South Africa's energy consumption over the next 14 years will continue to grow in a sustained and stable way; this result is consistent with the rapid economic development of South Africa [38].
This prediction means that, in the future, the South African government needs to pay more attention to changes in energy supply and demand so as to keep the two in balance and thus lay a foundation for rapid and stable economic development and higher living standards in South Africa. In addition, the prediction techniques used in this research can serve as a reference for research on energy consumption in other African countries and regions.