Modeling and Forecasting Medium-Term Electricity Consumption Using Component Estimation Technique

The increasing shortage of electricity in Pakistan disturbs almost all sectors of its economy. Because precise and efficient forecasts of electricity consumption are vital for accurate policy formulation, this paper implements a forecasting procedure based on a component estimation technique to forecast medium-term electricity consumption. To this end, the electricity consumption series is divided into two major components: deterministic and stochastic. For the estimation of the deterministic component, we use parametric and nonparametric models. The stochastic component is modeled using four different univariate time series models: parametric AutoRegressive (AR), nonparametric AutoRegressive (NPAR), Smooth Transition AutoRegressive (STAR), and AutoRegressive Moving Average (ARMA) models. The proposed methodology was applied to Pakistan electricity consumption data ranging from January 1990 to December 2015. To assess one-month-ahead post-sample forecasting accuracy, three standard error measures, namely Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE), were calculated. The results show that the proposed component-based estimation procedure is very effective at predicting electricity consumption. Moreover, the ARMA models outperform the other models, while the NPAR model is competitive. Finally, our forecasting results are comparatively better than those cited in other works.


Introduction
Electricity is a key component for the growth and development of any country's economy. It is a highly flexible form of energy that practically fuels the performance of each sector of an economy. It is a basic requirement of modern human life, bringing benefits and development in different sectors including healthcare, transportation, industries, mining, broadcasting, etc. [1]. Generally, electricity demand is an indication of the performance of a country's economy, as electricity demand is integrated with all phases of development. Therefore, electricity demand forecasting is essential for power system management, scheduling, operations, and capability evaluation of networks. In practice, however, electricity demand forecasting remains challenging for researchers, as many factors directly or indirectly influence electricity consumption over time [2][3][4].
Generally, electricity load or price forecasting is divided into three categories with respect to time scale: short-term forecasts cover a few hours to a few days ahead; medium-term forecasts cover a few weeks to a few months ahead; and long-term forecasts cover a few months to years ahead [5]. Short-term electricity load forecasting is essential for the control and programming of electric power systems and is also required by transmission companies when a self-dispatching market is in operation [6]. Medium- and long-term forecasts are also important for energy systems. For example, the medium-term electricity demand forecast is required for electric power system operation and scheduling [7][8][9], whereas the long-term electricity demand forecast is crucial for capacity scheduling and maintenance planning [10].
It is well known that electricity demand time series exhibit specific features. The monthly electricity demand time series may have a more or less cyclic behavior and a long-term trend. Electricity consumption is strongly affected by weather and social factors, which are generally reflected in the demand time series [9,11]. Economic indicators commonly influence the trend of the consumption series, while climate changes introduce a periodic behavior in the series. Medium-term electricity demand forecasting generally deals with monthly data points, which often include a long-run (trend) component, as well as yearly and seasonal periodicities. For an example, see Figure 1.
Previously, many researchers have worked on medium-term electricity demand forecasting, which generally ranges from one month to a few months ahead, using different methods, including time series, regression, artificial intelligence, genetic algorithm, fuzzy logic, and support vector machine [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. Time series models are easy to implement and have been commonly used for electricity load forecasting in the past. For example, Yasmeen and Sharif [32] used different linear and non-linear time series models, namely AutoRegressive Integrated Moving Average (ARIMA), Seasonal-ARIMA, AutoRegressive Conditional Heteroscedasticity (ARCH), and its generalized form, GARCH, to forecast medium-term electricity demand. Electricity demand series often contain non-linearity and, hence, non-linear models can produce better forecasts. For example, Al-Saba and El-Amin [33] forecasted one year ahead electricity consumption for Pakistan using classical time series models, namely AutoRegressive (AR) and ARMA models, as well as an Artificial Neural Network (ANN). Some authors compared the classical time series and regression models. For example, Abdel-Aal and Al-Garni [34] used a multiple regression model and compared it with seasonal and non-seasonal ARIMA models.
Economic and weather variables strongly influence electricity demand. To account for these effects, Nawaz et al. [21] studied Pakistan's annual electricity consumption with the help of economic variables. They forecasted electricity demand up to 10 years ahead using the Smooth Transition AutoRegressive (STAR) model. Many researchers compared time series, regression, and computational intelligence models [16,18]. Electricity load can also be affected by temperature. To examine this, Ali et al. [35] studied the effect of monthly temperature on electricity demand in Pakistan.
The results indicate that there was a moderate linear correlation (r = 0.412) between mean temperature and electricity demand. On the other hand, several authors combined the features of two or more models and proposed a new model, which is often referred to as a hybrid model [27,30,36,37]. For example, Alamaniotis et al. [13] proposed a hybrid model by combining the features of machine learning tools (kernels) with a vector regression model. For medium-term demand forecasting, Ghiassi et al. [38] proposed a hybrid model that combines a neural network model with expert systems. Several other techniques have also been used to forecast electricity demand [39][40][41].
The purpose of this study was to develop and evaluate model(s) for forecasting medium-term electricity consumption time series. The model(s) are intended to support operational planning and trading decisions. Following the authors of [42,43], in the proposed forecasting methodology, the electricity consumption series is divided into two parts: deterministic and stochastic. Each component is estimated by parametric and nonparametric regression and time series methods. At the end, the forecasts from both components are combined to obtain the final forecast. Thus, the main contribution of this paper is the thorough investigation of the parametric and nonparametric approaches used for medium-term electricity consumption out-of-sample forecasting. Within the framework of the component estimation method, we compare models in terms of forecasting ability, considering univariate, parametric, and nonparametric models. Moreover, for the considered models, a significance analysis of the differences in prediction accuracy is also conducted.
The rest of the article is organized as follows. Section 2 contains an overview of Pakistan's electricity sector. Section 3 describes the proposed forecasting framework and information on the models used for forecasting. An application of the proposed forecasting framework is provided in Section 4. Section 5 concludes the study.

An Overview of Pakistan Electricity Sector
Pakistan has been facing an electricity shortage crisis since its inception. In 1947, Pakistan had the capacity to produce only 60 megawatts (MW) of electricity for its thirty-two million inhabitants. To address the electricity shortage through recognized interventions, the Water and Power Development Authority (WAPDA) was established in 1958. WAPDA built two dams, each with a capacity of about 4478 MW, in the late 1970s to overcome the electricity crisis. Pakistan continued facing electricity shortages in the 1980s, even though some haphazard efforts were made towards improving the situation [44]. With each passing year, the demand for electricity continued rising because of developmental activities such as urbanization, rural electrification, and industrialization [45]. In the 1990s, the private sector was given licenses to build new thermal energy plants. This was a strategy shift in the electricity generation mix from hydro to thermal, which increased the cost of electricity generation significantly [46]. Until 2005, the total supply of electricity exceeded the required demand by approximately 450 MW. During 2007, Pakistan was hit by the worst power crisis in its history. Production fell by 6000 MW, resulting from huge shutdowns all over the country. In 2008, supply fell short of the required electricity demand by 15%, and power outages became more frequent. Furthermore, the existing power stations and electricity distribution networks were also damaged during the 2005 earthquake and the 2010 flood [47]. At the same time, the demand for electricity was increasing continuously. For example, from 2001 to 2008, the electricity demand rose by almost 6% per year. In June 2013, the electricity shortage reached 4250 MW per day, with demand standing at 16,400 MW per day and generation at 12,150 MW per day [48]. These crises strongly affected economic growth and services, despite regular interventions being made to increase electricity production.
Pakistan is a developing country situated in South Asia with a population of over 200 million people. The demand for electricity is increasing exponentially due to increased demand in both the household and manufacturing sectors. The failure of Pakistan's power policy over the last few decades has left the country with an acute electricity crisis that has increased its economic deficit. There are some country-specific issues that turn its electricity shortfall into a crisis. These include theft, misuse, and overuse of electricity in the household and industrial sectors; unjustifiably huge line losses; and low institutional capacity, corruption, mismanagement, and political controversies over mega power projects [49].
Pakistan fulfills its electricity requirements from different sources, including coal, natural gas, oil, wind, solar, and nuclear [50]. The electricity sector in Pakistan comprises WAPDA, the National Electric Power Regulatory Authority (NEPRA), and a few independent power producers (IPPs). WAPDA and NEPRA are responsible for electric power maintenance, scheduling, transmission, and distribution throughout Pakistan, with the exclusion of Karachi city, which is supplied by the Karachi Electric Supply Company (KESC). The four main electricity producers in Pakistan are WAPDA, KESC, IPPs, and the Pakistan Atomic Energy Commission (PAEC). The total power generation volume of Pakistan as of 30 June 2015 was 24,823,000 kW, of which thermal was 16,814,000 kW (67.74%), hydro-electric was 7,116,000 kW (28.67%), nuclear was 787,000 kW (3.17%), and wind was 106,000 kW (0.43%) [51]. Table 1 describes the installed electricity generating volumes of Pakistan during 2011-2015.

Proposed Forecasting Model
The main objective of this study was to forecast one-month-ahead electricity consumption for Pakistan. Let $C_m$ be the electricity consumption for the $m$th month. To account for the dynamics of the electricity consumption time series, we propose that $C_m$ be modeled as
$$C_m = D_m + S_m,$$
i.e., the electricity consumption series $C_m$ is divided into two major components: $D_m$, a deterministic component, and $S_m$, a stochastic component. The deterministic component includes the trend (long-run) and the yearly periodicity. Mathematically, $D_m$ is defined as
$$D_m = T_m + Y_m,$$
where $T_m$ represents the trend (long-term) component and $Y_m$ represents the yearly periodicity component.

Parametric Case
This section describes the estimation of the deterministic component using the parametric regression method. The response variable $C_m$ is modeled parametrically by estimating the trend (long-run) component $T_m$ using a cubic polynomial regression on time $m$, while the yearly periodicity is described by dummies as
$$Y_m = \sum_{i=1}^{12} \gamma_i I_{i,m},$$
with $I_{i,m} = 1$ if $m$ refers to the $i$th month of the year and 0 otherwise, and $\gamma_i$ the corresponding regression coefficients. All regression coefficients related to these components are estimated using the Ordinary Least Squares (OLS) method. Once all regression coefficients are obtained, the estimated equation is given by
$$\hat{D}_m = \hat{T}_m + \hat{Y}_m.$$
In the past, many researchers used this method for the estimation of trend and yearly cycle components [52][53][54][55].
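The parametric estimation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the synthetic series, the rescaling of time, and the choice to let the 12 monthly dummies absorb the intercept are all assumptions made here for the example.

```python
import numpy as np

def parametric_design(n_months):
    """Columns: m, m^2, m^3 (time rescaled) followed by 12 monthly dummies I_{i,m}."""
    t = np.arange(1, n_months + 1) / n_months   # rescaling only improves conditioning
    trend = np.column_stack([t, t**2, t**3])
    month = np.arange(n_months) % 12            # assumes the series starts in January
    dummies = np.eye(12)[month]                 # one-hot row per observation
    return np.hstack([trend, dummies])          # dummies absorb the intercept (assumption)

def fit_parametric(c):
    """OLS fit; returns the estimated deterministic part and the residual series."""
    X = parametric_design(len(c))
    coef, *_ = np.linalg.lstsq(X, c, rcond=None)
    d_hat = X @ coef
    return d_hat, c - d_hat

# Synthetic monthly series (illustrative only, not the PBS data)
rng = np.random.default_rng(0)
m = np.arange(1, 241)
c = 3000 + 10 * m + 400 * np.sin(2 * np.pi * m / 12) + rng.normal(0, 50, m.size)
d_hat, s_hat = fit_parametric(c)
print("std of series:", float(np.std(c)), "std of residuals:", float(np.std(s_hat)))
```

Because the monthly dummies sum to a constant column, a separate intercept would be collinear with them, which is why the sketch omits it.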

Nonparametric Case
In the literature, many authors captured the trend and yearly cycle in a time series using nonparametric regression methods. For example, some authors used smoothing splines [43,56,57], kernel regression [58][59][60][61], and regression splines [43,62]. In our case, the deterministic component can be modeled nonparametrically as follows:
$$D_m = h_1(T_m) + h_2(Y_m).$$
Here, each $h_i$ is a smoothing function of $T_m$ and $Y_m$, respectively. For yearly cycles, the smooth function is estimated from the series 1, 2, 3, . . . , 12, 1, 2, 3, . . . , 12, . . ., whereas the long-term (trend) component $T_m$ is estimated as a function of time $m$. For the smoothing functions, cubic regression splines are used to estimate the deterministic component. In the regression spline approach, the most important choice is the number of knots and their location, as they define the smoothness of the approximation. For this choice, we use the cross validation (CV) technique. Regression coefficients are estimated using the OLS method, and the estimated equation is given by
$$\hat{D}_m = \hat{h}_1(T_m) + \hat{h}_2(Y_m).$$
Once the deterministic component is estimated, the residual (stochastic) component can be obtained as
$$S_m = C_m - \hat{D}_m. \quad (6)$$
To assess graphically the performance of the above-described methods for estimating the deterministic component $D_m$ (both parametric and nonparametric), the observed electricity consumption and the estimated deterministic component are depicted in Figure 2, with the parametric estimation of $D_m$ in Figure 2a and the nonparametric estimation of $D_m$ in Figure 2b. The figure shows that both models used for the estimation of $D_m$ adequately capture both dynamics of the electricity consumption series, i.e., the long-run trend and the yearly seasonality, as the increasing (upward) trend and the yearly cycles can be seen clearly. Using Equation (6), the stochastic (residual) components obtained from both methods are also plotted in Figure 2.
Here, it is worth mentioning that, in general, the stationarity of a time series is inspected using the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests [63,64]. However, several researchers showed that the ADF and PP tests may produce biased and misleading results owing to the possibility of structural breaks in the time series data [65]. Additionally, for electricity market variables, i.e., price or demand time series, the unit-root test results are weaker due to the presence of periodicities and exceptionally heavy-tailed data, which affect the size and power of standard unit-root tests [66,67]. In our case, we did not apply these tests because, once the consumption series is filtered for the deterministic component, the stochastic component is almost always stationary.
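The regression-spline estimation with cross-validated knots described in this section can be sketched as follows. This is only an illustration under stated assumptions: a truncated-power cubic basis, equally spaced knots, a fixed number of knots for the yearly cycle, and interleaved 5-fold CV over the number of trend knots only. The paper does not specify these details, and all function names are hypothetical.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated-power basis of a cubic regression spline (no intercept column)."""
    return np.column_stack([x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots])

def interior_knots(k):
    """k equally spaced interior knots on (0, 1) (an assumption of this sketch)."""
    return np.linspace(0.0, 1.0, k + 2)[1:-1]

def additive_design(n_months, trend_knots, cycle_knots=3):
    t = np.arange(1, n_months + 1) / n_months    # time index m, rescaled to (0, 1]
    y = (np.arange(n_months) % 12 + 1) / 12.0    # series 1, ..., 12, 1, ..., 12, rescaled
    return np.hstack([np.ones((n_months, 1)),
                      spline_basis(t, interior_knots(trend_knots)),
                      spline_basis(y, interior_knots(cycle_knots))])

def cv_error(c, trend_knots, n_folds=5):
    """Mean squared prediction error from interleaved K-fold cross validation."""
    X = additive_design(len(c), trend_knots)
    fold = np.arange(len(c)) % n_folds
    sse = 0.0
    for f in range(n_folds):
        train, test = fold != f, fold == f
        coef, *_ = np.linalg.lstsq(X[train], c[train], rcond=None)
        sse += float(np.sum((c[test] - X[test] @ coef) ** 2))
    return sse / len(c)

def fit_nonparametric(c, candidates=(1, 2, 3, 4, 5)):
    """Pick the number of trend knots by CV, then refit by OLS on all data."""
    best = min(candidates, key=lambda k: cv_error(c, k))
    X = additive_design(len(c), best)
    coef, *_ = np.linalg.lstsq(X, c, rcond=None)
    d_hat = X @ coef
    return best, d_hat, c - d_hat                # residuals play the role of S_m

# Synthetic monthly series (illustrative only)
rng = np.random.default_rng(0)
m = np.arange(1, 241)
c = 3000 + 0.04 * m**2 + 400 * np.sin(2 * np.pi * m / 12) + rng.normal(0, 50, m.size)
best_k, d_hat, s_hat = fit_nonparametric(c)
print("selected trend knots:", best_k)
```

Selecting knot locations as well as their number, as the text describes, would require a richer search than this sketch performs.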

Modeling the Stochastic Component
After the estimation of the deterministic component using parametric and nonparametric techniques, the remaining part (residuals), considered as the stochastic component, is obtained through Equation (6). The residual series obtained from both models are plotted in Figure 3. To model and forecast the stochastic component, this work considers four different univariate time series models: parametric AutoRegressive (AR), nonparametric AutoRegressive (NPAR), Smooth Transition AutoRegressive (STAR), and AutoRegressive Moving Average (ARMA) models. Details about these models are given in the following.

AutoRegressive Model
The AutoRegressive (AR) model is widely used in the time series literature. An AR model describes the response variable as linearly dependent on its own past (lagged) values and on a stochastic term. The general form of an AR(n) model is given by
$$S_m = \mu + \sum_{i=1}^{n} \alpha_i S_{m-i} + \varepsilon_m,$$
where $\mu$ indicates the intercept, $\alpha_i$ ($i = 1, 2, \ldots, n$) are the parameters of the AR(n) model, and $\varepsilon_m$ is a white noise process with mean zero and variance $\sigma^2$. After plotting the ACF and PACF of the series, we concluded that lags 1, 2, and 12 are significant and, hence, they are included in the model. In this work, the parameters are estimated using the Maximum Likelihood Estimation (MLE) method.
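A subset AR model with lags 1, 2, and 12 can be sketched as below. Note the swapped-in technique: the paper estimates the parameters by MLE, whereas this sketch uses conditional least squares, which coincides with conditional MLE under Gaussian errors. The function names and the simulated series are hypothetical.

```python
import numpy as np

LAGS = (1, 2, 12)

def fit_ar(s, lags=LAGS):
    """Conditional least squares for S_m = mu + sum_i alpha_i * S_{m-i} + eps_m."""
    s = np.asarray(s, dtype=float)
    p = max(lags)
    # Row for target S_t (t = p, ..., N-1) holds 1 and the lagged values S_{t-l}
    X = np.column_stack([np.ones(len(s) - p)] + [s[p - l:len(s) - l] for l in lags])
    coef, *_ = np.linalg.lstsq(X, s[p:], rcond=None)
    return coef                                  # [mu, alpha_1, alpha_2, alpha_12]

def forecast_one_step(s, coef, lags=LAGS):
    """One-step-ahead forecast from the last observed values."""
    return float(coef[0] + sum(a * s[-l] for a, l in zip(coef[1:], lags)))

# Simulated stationary series with known coefficients (illustrative only)
rng = np.random.default_rng(1)
true = {1: 0.4, 2: -0.2, 12: 0.3}
s = np.zeros(2000)
for t in range(12, s.size):
    s[t] = sum(a * s[t - l] for l, a in true.items()) + rng.normal()
coef = fit_ar(s)
s_next = forecast_one_step(s, coef)
print("estimated [mu, a1, a2, a12]:", np.round(coef, 3))
```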

Nonparametric AutoRegressive Model
The linear AutoRegressive model can be generalized by removing the linearity assumption. We refer to the resulting model as the Nonparametric AutoRegressive (NPAR) model. In this case, the relation between the present and past values does not have a particular parametric form and thus can account for any potential type of nonlinearity in the data. Mathematically, NPAR is given by
$$S_m = h_1(S_{m-1}) + h_2(S_{m-2}) + \cdots + h_n(S_{m-n}) + \varepsilon_m,$$
where $h_i$ are smoothing functions describing the relation between each past value and $S_m$. In this work, the functions $h_i$ are cubic regression spline functions. As in the parametric case, we used lags 1, 2, and 12 to estimate NPAR. The additive form is adopted to overcome the curse of dimensionality, i.e., the exponential decline of data points within a smoothing window as the dimension of the regressors increases; it assumes no interactions among the explanatory variables [68].
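An additive spline autoregression of this kind can be sketched as follows. This is a rough illustration under assumptions that are not in the paper: a truncated-power cubic basis, two knots placed at quantiles of the series and shared by all lags, and plain least squares. The names are hypothetical.

```python
import numpy as np

LAGS = (1, 2, 12)

def spline_basis(x, knots):
    """Truncated-power cubic spline basis (no intercept column)."""
    return np.column_stack([x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots])

def lag_matrix(s, lags):
    p = max(lags)
    return s[p:], [s[p - l:len(s) - l] for l in lags]

def fit_additive_ar(s, lags=LAGS, n_knots=2, linear=False):
    """Least-squares fit of S_m = sum_i h_i(S_{m-i}) + eps_m; linear=True gives a plain AR."""
    s = np.asarray(s, dtype=float)
    y, cols = lag_matrix(s, lags)
    knots = np.quantile(s, np.linspace(0, 1, n_knots + 2)[1:-1])   # assumed knot rule
    blocks = [c[:, None] if linear else spline_basis(c, knots) for c in cols]
    X = np.hstack([np.ones((len(y), 1))] + blocks)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, knots, rss

def predict_next(s, coef, knots, lags=LAGS):
    """One-step-ahead NPAR prediction from the last observed values."""
    s = np.asarray(s, dtype=float)
    row = np.hstack([np.ones((1, 1))] + [spline_basis(np.array([s[-l]]), knots) for l in lags])
    return float((row @ coef)[0])

# Simulated nonlinear autoregression (illustrative only)
rng = np.random.default_rng(2)
s = np.zeros(600)
for t in range(2, s.size):
    s[t] = 1.5 * np.tanh(s[t - 1]) - 0.3 * s[t - 2] + rng.normal(0, 0.5)
coef_np, knots, rss_np = fit_additive_ar(s)
_, _, rss_lin = fit_additive_ar(s, linear=True)
s_next = predict_next(s, coef_np, knots)
print("in-sample RSS, linear AR vs NPAR:", rss_lin, rss_np)
```

Since the spline basis contains the linear term, the in-sample fit of the additive model can never be worse than that of the linear AR on the same lags; whether it forecasts better out of sample is an empirical question.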

Smooth Transition AutoRegressive (STAR) Model
The Smooth Transition AutoRegressive (STAR) model is an extension of the AR model that allows a smooth transition in regime switching models. To control the regime switching process, the STAR model makes use of logistic and exponential functions instead of the indicator function used in threshold AR models. Mathematically, the STAR model is defined as
$$S_m = \boldsymbol{\theta}_1^{\top} Z_m \left(1 - R(\omega_m, \eta, \mu)\right) + \boldsymbol{\theta}_2^{\top} Z_m R(\omega_m, \eta, \mu) + \varepsilon_m,$$
where $Z_m = (1, S_{m-1}, S_{m-2}, \ldots, S_{m-n})$, $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are the coefficient vectors of the two regimes, $R(\omega_m, \eta, \mu)$ is the transition function bounded between 0 and 1, and $\omega_m$ is a transition variable. The parameter $\eta$ represents the speed and smoothness of the transition, while $\mu$ can be interpreted as the threshold between the two regimes. Finally, $\varepsilon_m$ is a white noise process that is assumed to be normally distributed with mean zero and variance $\sigma^2$. This model is defined as a two-regime switching model, in which the transition function $R$ allows the dynamics of the model to switch between regimes smoothly. A common specification of the generalized version of smooth transition functions is given by
$$R(\omega_m, \eta, \mu) = \left[1 + \exp\left(-\eta \, \frac{\omega_m - \mu}{\sigma_{\omega_m}}\right)\right]^{-1},$$
where $\sigma_{\omega_m}$ is the standard deviation of the transition variable. The STAR model is identified and estimated using the iterative building strategy described in [69].
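To make the role of the transition function concrete, the sketch below evaluates a logistic transition function and the resulting two-regime conditional mean. It assumes the logistic form scaled by the standard deviation of the transition variable; the coefficient values and function names are hypothetical, and no estimation is performed.

```python
import math

def logistic_transition(omega, eta, mu, sigma_omega):
    """R = 1 / (1 + exp(-eta * (omega - mu) / sigma_omega)), bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta * (omega - mu) / sigma_omega))

def star_conditional_mean(z, theta1, theta2, r):
    """Mix two linear AR regimes, with weight r on the second regime."""
    regime1 = sum(a * b for a, b in zip(theta1, z))
    regime2 = sum(a * b for a, b in zip(theta2, z))
    return regime1 * (1.0 - r) + regime2 * r

# Hypothetical values, for illustration only
z = [1.0, 0.8, -0.3]          # (1, S_{m-1}, S_{m-2})
theta1 = [0.0, 0.6, 0.1]      # regime-1 coefficients
theta2 = [0.5, -0.2, 0.0]     # regime-2 coefficients
for omega in (-2.0, 0.0, 2.0):
    r = logistic_transition(omega, eta=5.0, mu=0.0, sigma_omega=1.0)
    print(omega, r, star_conditional_mean(z, theta1, theta2, r))
```

Larger values of eta make the switch between regimes sharper; as eta grows, the logistic function approaches the indicator function of a threshold AR model.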

AutoRegressive Moving Average Model
The AutoRegressive Moving Average (ARMA) model includes not only the past lagged values of the variable of interest but also the past lags of the error term. In our case, the response variable $S_m$ is modeled linearly using its past values as well as past white noise terms, i.e.,
$$S_m = \mu + \sum_{i=1}^{n} \alpha_i S_{m-i} + \sum_{j=1}^{s} \phi_j \varepsilon_{m-j} + \varepsilon_m,$$
where $\mu$ indicates the intercept; $\alpha_i$ ($i = 1, 2, \ldots, n$) and $\phi_j$ ($j = 1, 2, \ldots, s$) are the parameters of the AR and MA parts, respectively; and $\varepsilon_m$ is a Gaussian white noise series with mean zero and variance $\sigma^2$. Inspection of the ACF and PACF suggests that, for the AR part, lags 1, 2, and 12 are significant, while the first two lags are significant for the MA part. Thus, a constrained ARMA(12,2) model with $\alpha_3 = \cdots = \alpha_{11} = 0$ is fitted to $S_m$ using the MLE method. Once both components, deterministic and stochastic, are estimated, the final one-month-ahead forecast is obtained as
$$\hat{C}_{m+1} = \hat{D}_{m+1} + \hat{S}_{m+1}.$$
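The one-step-ahead forecast from a constrained ARMA(12,2) model and its combination with the deterministic forecast can be sketched as below. The sketch takes the parameters as given rather than estimating them by MLE, and it sets pre-sample innovations to zero, which is an approximation chosen here for brevity. All names and numerical values are hypothetical.

```python
def arma_innovations(s, mu, alpha, phi):
    """Recursively recover eps_t; alpha maps lag -> AR coefficient, phi lists MA coefficients."""
    p = max(alpha)
    eps = [0.0] * len(s)                      # pre-sample innovations set to zero (assumption)
    for t in range(p, len(s)):
        ar = sum(a * s[t - l] for l, a in alpha.items())
        ma = sum(f * eps[t - j] for j, f in enumerate(phi, start=1))
        eps[t] = s[t] - (mu + ar + ma)
    return eps

def arma_forecast(s, mu, alpha, phi):
    """One-step-ahead forecast of the stochastic component."""
    eps = arma_innovations(s, mu, alpha, phi)
    n = len(s)
    ar = sum(a * s[n - l] for l, a in alpha.items())
    ma = sum(f * eps[n - j] for j, f in enumerate(phi, start=1))
    return mu + ar + ma

def combined_forecast(d_hat_next, s_hat_next):
    """Final forecast: deterministic forecast plus stochastic forecast."""
    return d_hat_next + s_hat_next

# Hypothetical residual series and parameters (illustrative only)
s = [float(v) for v in range(20)]
alpha = {1: 0.5, 2: 0.0, 12: 0.25}            # lags 3..11 are constrained to zero by omission
s_next = arma_forecast(s, mu=1.0, alpha=alpha, phi=[0.0, 0.0])
c_next = combined_forecast(100.0, s_next)
print(s_next, c_next)
```

With both MA coefficients set to zero, as in the usage lines, the forecast reduces to that of the subset AR model, which gives a simple way to check the recursion.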

Out-of-Sample Forecasting
In this study, we used monthly aggregated electricity consumption data of Pakistan. The dataset was obtained from the Pakistan Bureau of Statistics (PBS). The monthly series ranges from January 1990 to December 2014 and is measured in kilowatt hours (kWh). The whole dataset contains 288 data points, of which data from January 1990 to December 2009 (240 data points) were used for model estimation and from January 2010 to December 2014 (48 data points) for one-month-ahead out-of-sample forecasts. The monthly electricity consumption series is represented by $C_m$, where $m = 1, 2, \ldots, 288$. To assess forecasting accuracy, three standard accuracy measures, namely Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE), were calculated for each model as follows:
$$\mathrm{MAE} = \frac{1}{48} \sum_{m=241}^{288} \left| C_m - \hat{C}_m \right|, \qquad \mathrm{MAPE} = \frac{100}{48} \sum_{m=241}^{288} \frac{\left| C_m - \hat{C}_m \right|}{C_m}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{48} \sum_{m=241}^{288} \left( C_m - \hat{C}_m \right)^2},$$
where $C_m$ denotes the observed and $\hat{C}_m$ the forecasted consumption for the $m$th month.
Combining the models for both components, deterministic and stochastic, led us to compare eight different possible combinations, namely P-AR, P-NPAR, P-STAR, P-ARMA, NP-AR, NP-NPAR, NP-STAR, and NP-ARMA, where the first part of each name refers to the deterministic component, with 'P' standing for parametric and 'NP' for nonparametric estimation, and the second part refers to the stochastic model. To assess the best combinations of these models, we calculated the accuracy measures and tabulated the results in Table 2. The table shows that both ARMA-based models outperform all competitors. The MAPE values for P-ARMA and NP-ARMA are 4.84 and 4.83, respectively. The second best model is NP-NPAR, for which the MAPE value is 5.18. The MAPE values for all combinations are also plotted in Figure 4, where the superiority of the models involving ARMA can be clearly seen.
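The three accuracy measures are straightforward to compute. The following is a minimal sketch with hypothetical function names and toy numbers; MAPE is expressed in percent and assumes strictly positive observed values.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Toy numbers, for illustration only
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mae(observed, predicted), mape(observed, predicted), rmse(observed, predicted))
```

MAE and RMSE are in the units of the series, while MAPE is scale-free, which is why MAPE is the measure used when comparing against results reported in other works.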
The season-specific errors are listed in Table 3. The table shows that the season-specific MAPEs are comparatively low in autumn and high in the remaining three seasons. Except in spring, the season-specific MAPE values for P-ARMA and NP-ARMA are considerably lower than those of the other models. The season-specific MAPE values are also plotted in Figure 5. The ACF and PACF of the final errors $\varepsilon_m$ are plotted in Figures 6 and 7. These figures show that no meaningful autocorrelation structure remains in the series. Overall, the residuals from all models have been whitened and can be considered satisfactory.
To verify the superiority of the results listed in Table 2, we performed the Diebold and Mariano (DM) test for each pair of models [70]. The results (p-values) of the DM test are listed in Table 4. Each entry of the table is the p-value of a test in which the null hypothesis assumes no difference in accuracy between the predictors in the column and in the row, against the alternative hypothesis that the predictor in the column is more accurate than the predictor in the row. The table shows that, among all possible combination models, P-ARMA and NP-ARMA are statistically better than the rest at the 5% level of significance, except when compared with NP-NPAR.
Table 4. P-values of the Diebold and Mariano test of equal forecast accuracy against the alternative hypothesis that the model in the column is more accurate than the model in the row (using a squared loss function).
Forecasted values from the four best combination models are plotted in Figure 8. The plot shows that the forecasted values follow the observed values of electricity consumption very well. Finally, it is worth mentioning that our best MAPE values are comparatively better than those cited in other works. For example, using four different models for Pakistan electricity consumption forecasting, Yasmeen and Sharif [32] reported a minimum MAPE value of 5.99, a value 24% greater than our minimum MAPE value of 4.83. For the total consumption forecast of Pakistan, Hussain et al. [71] reported an RMSE value of 1796.9, which is considerably higher than our value of 460.80.
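A one-sided DM test for one-step-ahead forecasts under squared loss can be sketched as follows. The sketch assumes that the loss differential is serially uncorrelated, so that its sample variance is used without autocovariance terms, and that the statistic is approximately standard normal. The function name and the error values are hypothetical.

```python
import math

def dm_test(errors_row, errors_col):
    """H0: equal accuracy; H1: the 'column' model is more accurate than the 'row' model."""
    d = [er ** 2 - ec ** 2 for er, ec in zip(errors_row, errors_col)]   # loss differential
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n       # lag-0 variance only (h = 1)
    stat = d_bar / math.sqrt(var_d / n)                # undefined if d is constant
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))   # 1 - Phi(stat)
    return stat, p_value

# Hypothetical forecast errors of two models over the same months
errors_row = [2.0, -3.0, 2.5, -2.0, 3.0, -2.5, 2.0, -3.0]
errors_col = [1.0, -1.0, 0.5, -1.0, 1.0, -0.5, 1.0, -1.0]
stat, p_value = dm_test(errors_row, errors_col)
stat_rev, p_rev = dm_test(errors_col, errors_row)
print(stat, p_value)
```

A small p-value supports the claim that the column model has the lower expected squared error; swapping the two arguments flips the sign of the statistic, so the two one-sided p-values sum to one.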

Conclusions
The main aim of this study was to forecast one-month-ahead electricity consumption for Pakistan using the component estimation technique. To this end, the electricity consumption time series was divided into two major components, i.e., deterministic and stochastic. The deterministic component consists of the trend (long-run) and the yearly periodicity and was modeled by both parametric and nonparametric approaches. For the stochastic component, we used four univariate time series models: AutoRegressive (AR), Nonparametric AutoRegressive (NPAR), Smooth Transition AutoRegressive (STAR), and AutoRegressive Moving Average (ARMA) models. The estimation of both deterministic and stochastic components led us to compare eight different combinations of these models. To check the forecasting performance of all models, consumption data from Pakistan were used, and one-month-ahead post-sample forecasts were obtained for four years. The predictive accuracy of the models was evaluated through MAE, MAPE, and RMSE. To evaluate the significance of the differences in the forecasting performance of the models, the Diebold and Mariano test was performed. The results show that the component-based estimation approach is highly effective for modeling and forecasting electricity consumption. Among all possible models, P-ARMA and NP-ARMA produced the best results, while the NP-NPAR model remained competitive with the best. Finally, our forecasting results are comparatively better than those cited in other works. In the future, this study can be extended by exploring the effects on out-of-sample forecasting when exogenous variables are included in the models.

Conflicts of Interest:
The authors declare no conflict of interest.