Equipping Seasonal Exponential Smoothing Models with Particle Swarm Optimization Algorithm for Electricity Consumption Forecasting

Abstract: Electricity consumption forecasting plays an important role in investment planning of electricity infrastructure, and in electricity production/generation and distribution. Accurate electricity consumption prediction over the mid/long term is of great interest to both practitioners and academics. Considering that monthly electricity consumption series usually show an obvious seasonal variation due to their inherent nature subject to temperature during the year, in this paper, seasonal exponential smoothing (SES) models were employed as the modeling technique, and the particle swarm optimization (PSO) algorithm was applied to find a set of near-optimal smoothing parameters. Quantitative and comprehensive assessments were performed with two real-world electricity consumption datasets on the basis of prediction accuracy and computational cost. The experimental results indicated that (1) whether the accuracy measure or the elapsed time was considered, PSO performed better than grid search (GS) or the genetic algorithm (GA); (2) the proposed PSO-based SES model with a non-trend component and additive seasonality term significantly outperformed the other competitors for the majority of prediction horizons, which indicates that the model could be a promising alternative for electricity consumption forecasting.


Introduction
Ensuring an adequate supply of energy is one of the national priorities for every nation in the world, because all the economic activities of a nation rely on energy in general and on electricity in particular. Electricity is also an essential part of people's daily activities [1]. In addition, electricity consumption affects public/private investment planning of electricity infrastructure, and electricity production/generation and distribution [2]. Given these facts, it is crucial to have an accurate and reliable forecasting model for electricity consumption.
In recent decades, numerous approaches have been proposed, and the most commonly used methods for electricity consumption forecasting mainly consist of grey models [3,4], multiple regression models [5,6], artificial neural network (ANN) models [7,8], support vector machine (SVM) models [9], exponential smoothing models [10,11], and auto-regressive integrated moving average (ARIMA) models [12]. These models can be generally classified into three categories: artificial intelligence models, uncertainty models, and time series models. A trend has emerged in that more and more complex models using artificial intelligence techniques, such as ANN and SVM [13,14], have been proposed to discover the potentially predictive relationships between electricity consumption and input variables. However, there are many factors that affect electricity consumption, such as temperature [15], population [12], economic growth [1], and power facilities [16], and a major disadvantage of SVM and ANN models is that they depend on such input variables, whose future values are themselves difficult to obtain.

Methodologies
In this section, the SES models and the PSO algorithm are briefly introduced in Sections 2.1 and 2.2, respectively.

Seasonal Exponential Smoothing Models
The SES models are classified as additive or multiplicative seasonal models. In this section, we introduce the additive SES models, which can be sorted into three major categories according to the trend component: non-trend, additive trend, and multiplicative trend. Detailed information about multiplicative seasonal models can be found in [33].
(i) The SES model without a trend term but with the additive seasonal term, which is abbreviated as NA for convenience of description, is presented using the following expressions.
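In the standard additive exponential-smoothing notation, with the symbols defined below, the NA recursions are commonly written in the following form (a standard formulation, which may differ in minor details from the paper's original equations):

```latex
\begin{aligned}
L_t &= \alpha\,(y_t - S_{t-m}) + (1-\alpha)\,L_{t-1},\\
S_t &= \gamma\,(y_t - L_t) + (1-\gamma)\,S_{t-m},\\
\hat{y}_{t+h|t} &= L_t + S_{t-m+h_m^{+}}.
\end{aligned}
```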
(ii) The SES model with an additive trend term and additive seasonal term, which is abbreviated as AA for convenience of description, is presented using the following expressions.
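The AA model corresponds to the additive Holt-Winters method; in the standard formulation (again possibly differing in minor details from the paper's original equations), its recursions are

```latex
\begin{aligned}
L_t &= \alpha\,(y_t - S_{t-m}) + (1-\alpha)\,(L_{t-1} + T_{t-1}),\\
T_t &= \beta\,(L_t - L_{t-1}) + (1-\beta)\,T_{t-1},\\
S_t &= \gamma\,(y_t - L_t) + (1-\gamma)\,S_{t-m},\\
\hat{y}_{t+h|t} &= L_t + h\,T_t + S_{t-m+h_m^{+}}.
\end{aligned}
```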
where 0 < α, β, γ < 1 are hyperparameters; L_t, T_t, and S_t represent the level term, trend term, and seasonality term at period t, respectively; ŷ_{t+h|t} is the forecast value for h steps ahead based on all of the data up to time t; and h_m^+ = ⌊(h − 1) mod m⌋ + 1, where m = 12 is the length of a cycle.
In this paper, the initial smoothing values for the level and trend components were estimated by averaging the first two full cycles of electricity consumption data. The initial seasonal factors were determined using the expressions S_1 = y_1 − L_0, . . . , S_{2m} = y_{2m} − L_0.
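One common choice consistent with averaging the first two full cycles (a sketch of the standard initialization, not necessarily the paper's exact expressions) is

```latex
L_0 = \frac{1}{2m}\sum_{t=1}^{2m} y_t, \qquad
T_0 = \frac{1}{m}\left(\frac{1}{m}\sum_{t=m+1}^{2m} y_t \;-\; \frac{1}{m}\sum_{t=1}^{m} y_t\right),
```

i.e., the initial level is the mean over the first two cycles and the initial trend is the per-period change between the cycle averages.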
In the following Section 2.2, the PSO algorithm is described, in order to enable understanding of the PSO-based SES strategy proposed in Section 3.

PSO Algorithm
In PSO, P_i = (P_i^1, P_i^2, . . . , P_i^D) and V_i = (V_i^1, V_i^2, . . . , V_i^D) denote the position vector and the velocity vector of the i-th particle, where D is the number of parameters in the SES model. The best positions visited by the i-th particle and by the entire particle swarm are denoted pbest_i and gbest, respectively. Specifically, the velocity V_i^d and position P_i^d of the d-th dimension of the i-th particle are updated according to Formulas (4) and (5), respectively, where c_1 and c_2 are the acceleration constants; rand_1^d and rand_2^d are two real random numbers uniformly distributed between 0 and 1; w is an inertia weight; w_max and w_min are the initial and final weight coefficients, respectively; T is the maximum number of iterations; and t is the current iteration number. Considering that a larger inertia weight favors global exploration while a smaller inertia weight is conducive to fine-tuning in the current search area, PSO can strike a good balance between global and local search over the entire process by adjusting w along with the iteration step [34].
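Formulas (4) and (5), written in the form standard for PSO with a linearly decreasing inertia weight (consistent with the symbols defined above), are

```latex
\begin{aligned}
V_i^d &\leftarrow w\,V_i^d + c_1\,\mathrm{rand}_1^d\,(pbest_i^d - P_i^d) + c_2\,\mathrm{rand}_2^d\,(gbest^d - P_i^d), &(4)\\
P_i^d &\leftarrow P_i^d + V_i^d, &(5)
\end{aligned}
\qquad
w = w_{\max} - (w_{\max}-w_{\min})\,\frac{t}{T}.
```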
Additionally, the velocities and positions of all the particles are constrained to the intervals [V_min, V_max] and [P_min, P_max] by the relationships V_i^d = min(V_max, max(V_min, V_i^d)) and P_i^d = min(P_max, max(P_min, P_i^d)). The flowchart of the PSO algorithm is given in Figure 1.

Figure 1. Flowchart of the PSO algorithm.

Proposed PSO-Based SES Modeling Strategy
In this section, the proposed PSO-based SES modeling framework is formulated and the steps involved are presented in detail; the specific flowchart is shown in Figure 2.

As shown in Figure 2, the detailed PSO-based SES modeling strategy includes four major operations: initialization, evaluation, update, and prediction. The overall learning process in Figure 2 is elaborated step by step below.

Initialization
In PSO, the initial positions of all the particles are randomly distributed across a designed D-dimensional space, and each dimension corresponds to a specific smoothing parameter. The initial velocity of each particle in the swarm is randomly assigned as the product of a real random number uniformly distributed between 0 and 1 and the maximum velocity V_max; here, V_max is usually set at 10-20% of the dynamic range of the variable in each dimension [35].
As mentioned in Section 2.1, the estimation of the trend component at the initial time t_0 involves the first two full cycles of electricity consumption data; we let t_0 = 2m, and all of the data up to time t_0 were used to calculate the initial level component L(t_0), the initial trend component T(t_0), and the initial seasonal factors S(1), S(2), . . . , S(t_0).

Evaluation
In order to elaborate the computational process of the initial fitness value of each particle, we take the i-th particle and the h-steps-ahead forecast as an example, and assume that the data in the training set range from t_0 + 1 to t_0 + l_1. Firstly, the position of the i-th particle is transformed into the corresponding smoothing parameters of the SES models. Secondly, the corresponding L_j, T_j, S_j and the prediction value ŷ_j at each period j ∈ [t_0 + h, t_0 + l_1] are calculated using the equations of the SES models and the corresponding forecasting formula, respectively. Thirdly, the fitness value RMSE is evaluated as defined in Equation (8). Finally, the current positions of all the particles are used to initialize their pbest, and the best position visited by the entire swarm is used to initialize gbest.
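For the NA model, this evaluation step can be sketched in Python as follows (an illustrative sketch under the standard NA recursions, simplified to one-step-ahead forecasts; the function name and argument layout are ours, not the paper's):

```python
import math

def na_fitness(alpha, gamma, y, m=12):
    """One-step-ahead RMSE fitness of the NA model on the training window.

    The first 2*m observations initialize the level and seasonal factors
    (Section 2.1); the remainder serves as the training set. (alpha, gamma)
    is a particle position mapped to the NA smoothing parameters.
    """
    t0 = 2 * m
    level = sum(y[:t0]) / t0                    # initial level L0
    season = [y[t] - level for t in range(t0)]  # S_t = y_t - L0, t = 1..2m
    sq_errors = []
    for t in range(t0, len(y)):
        forecast = level + season[t - m]        # yhat_t = L_{t-1} + S_{t-m}
        sq_errors.append((y[t] - forecast) ** 2)
        new_level = alpha * (y[t] - season[t - m]) + (1 - alpha) * level
        season.append(gamma * (y[t] - new_level) + (1 - gamma) * season[t - m])
        level = new_level
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

On a perfectly periodic series the forecasts are exact, so the fitness is zero regardless of the smoothing parameters, which makes for a convenient sanity check.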

Update
The position and velocity of each particle are updated using Equations (4) and (5), respectively, and their fitness values are recalculated as per Step 2. Once the fitness values of all the particles are obtained, pbest and gbest are updated. A judgement is made as to whether the termination condition is satisfied and, if so, the best particle with the near-optimal smoothing parameters is returned; otherwise, the previous step is repeated.

Prediction
The forecast values on the test set are evaluated with the near-optimal smoothing parameters. Assuming that the data in the test set range from t_0 + l_1 + 1 to t_0 + l_1 + l_2, the length of the test set is l_2. Consistent with the prediction operation on the training set, forecasts on the test set are still made h periods ahead. The corresponding L_j, T_j, S_j and the prediction value ŷ_j at each period j ∈ [t_0 + l_1 + h, t_0 + l_1 + l_2] are then calculated using the equations of the SES models and the corresponding forecasting formula, respectively.

Data Description
To examine the performances of the proposed PSO-based SES methods in terms of their forecast accuracy, two real-world datasets, i.e., monthly electricity consumption data from the United States and China, were used in the present study. The first dataset ranged from January 2002 to September 2019, and the second from January 2010 to December 2015.
As there are 12 months in a full period, the estimation of the trend component at initial time t 0 involved the first two full cycles of historical electricity consumption data. Therefore, the first dataset was divided into the calculation set (the first two years) used to evaluate the initialized relevant parameters in the SES models, the training set (January 2004 to December 2015), and the testing set (the last 45 months). Analogously, the second dataset was divided into the calculation set (January 2010 to December 2011), the training set (January 2012 to December 2014), and the testing set (the last 12 months in 2015). Here, the testing sets were completely independent of the training sets and were not involved in the learning procedure. Figure 3 depicts the two datasets.
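The U.S. split can be sanity-checked in code (a sketch; `y_us` is a hypothetical placeholder list standing in for the 213 monthly observations from January 2002 to September 2019):

```python
# Monthly U.S. series, Jan 2002 - Sep 2019: 17 years 9 months = 213 observations.
y_us = list(range(213))        # placeholder values; the real data are not reproduced here
calc  = y_us[:24]              # Jan 2002 - Dec 2003: calculation (initialization) set
train = y_us[24:24 + 144]      # Jan 2004 - Dec 2015: training set
test  = y_us[24 + 144:]        # Jan 2016 - Sep 2019: last 45 months, testing set
assert (len(calc), len(train), len(test)) == (24, 144, 45)
```

The three pieces add up to the full series (24 + 144 + 45 = 213), and the test slice never enters the learning procedure.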

Experimental Design
There were two goals in the experimental study. The first goal was to verify the benefits of the PSO algorithm for parameter optimization in SES models. To achieve the goal, the Holt-Winters additive model, as a popular seasonal exponential smoothing model, was employed to forecast the electricity consumption in the above two datasets, and the commonly used GS and GA were also used to optimize the parameters of the Holt-Winters additive model for comparison with PSO. The second goal was to examine the performance of the proposed PSO-based SES models and to determine which of the three different additive SES models was the most effective in terms of accuracy. In this regard, seasonal ARIMA (SARIMA) from time series forecasting models as well as SVR and a back propagation neural network model (BPNN) based on machine learning techniques were used for comparative analysis with the PSO-based SES models.

Performance Measure
To assess the forecasting performance of a model, three alternative forecasting accuracy measures were employed in this study, namely the mean absolute percentage error (MAPE) [19], the root mean square error (RMSE) [36], and the normalized root mean square error (NRMSE) [37]. Definitions of these measures are presented in the following expressions (7)-(9).
MAPE = (1/N) Σ_{j=1}^{N} |y_{t+j} − ŷ_{t+j}| / y_{t+j} × 100% (7)
RMSE = √[ (1/N) Σ_{j=1}^{N} (y_{t+j} − ŷ_{t+j})² ] (8)
NRMSE = √[ (1/N) Σ_{j=1}^{N} (y_{t+j} − ŷ_{t+j})² ] / ȳ (9)
where N is the number of samples in the test set, y_{t+j} is the actual value at period t + j, ŷ_{t+j} is the predicted value at period t + j, and ȳ is the mean of the actual values. The smaller these three measures are, the better the prediction performance of the model.
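The three measures can be transcribed directly into Python (a straightforward sketch of Expressions (7)-(9); we assume the NRMSE normalizes the RMSE by the mean of the actual values):

```python
import math

def mape(actual, pred):
    """Mean absolute percentage error, in percent (Expression (7))."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean square error (Expression (8))."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def nrmse(actual, pred):
    """RMSE normalized by the mean of the actual values (Expression (9))."""
    return rmse(actual, pred) / (sum(actual) / len(actual))
```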

Study 1: Examination of the PSO Algorithm for Parameter Optimization in SES Models
To confirm the benefit of the PSO algorithm for parameter selection in SES models, the GS method from the classical domain and the genetic algorithm (GA) from the class of evolutionary computation were used for comparative analysis, and quantitative, comprehensive assessments were performed with the two real-world electricity consumption datasets on the basis of prediction accuracy and computational cost for GS, GA, and PSO. The experiments were implemented in MATLAB 2016a using in-house software executed on a computer with an Intel Core i5-3230M 2.60 GHz CPU and 4 GB of RAM. In general, the experiments did not require a high-end computer configuration, were easy to operate, and were simple, efficient, and repeatable.
In these experiments, the MATLAB code for GS and PSO was written from scratch, while the implementation of GA used the GA toolbox included with MATLAB 2016a. It should be noted that the selection of parameters in the PSO algorithm (i.e., inertia weight, acceleration coefficients, swarm size, and number of iterations) is itself a challenging model selection task. The final parameters, chosen according to relevant empirical and theoretical studies [32,38] and a trial-and-error approach based on prediction performance and computational time, are summarized in Table 1. To account for the randomness of the PSO algorithm, the modeling process for each prediction horizon was repeated 10 times, and the performance of the PSO-based additive Holt-Winters model was judged by the mean of each accuracy measure. Table 2 shows the prediction performance of the additive Holt-Winters model with GS, PSO, and GA in terms of the three accuracy measures (i.e., MAPE, RMSE, NRMSE) for the U.S. and China datasets. The columns labeled "Average 1-h" show the accuracy measures over prediction horizons 1 to h, and the last column shows the average ranking of each method over all forecast horizons of the out-of-sample prediction performance. From the results presented in Table 2, the following observations can be made. For the U.S. dataset, GA performed the worst across the three measures and its computation time was the longest. GS(0.05) and PSO had the same ranking in terms of RMSE, with average RMSE measures over all forecast horizons of 8.333 and 8.339, respectively. The ranking with respect to MAPE was GS(0.05) then PSO, while the ranking with respect to NRMSE was PSO then GS(0.05). Additionally, the elapsed time of GS(0.05) for a single replicate was more than twice that of PSO. For the China dataset, GS(0.05) performed the worst and GA performed the second worst across the three measures according to the average rank.
The ranking with respect to MAPE was GS(0.01) then PSO, while the ranking with respect to RMSE and NRMSE was PSO then GS(0.01). In terms of elapsed time, GA was computationally much more expensive than the other methods: GS(0.05) was tens of times faster than GS(0.01), and PSO was hundreds of times faster than GS(0.01).
In conclusion, PSO was significantly superior to GA, GS(0.01), and GS(0.05) in terms of elapsed time. PSO outperformed GA across the three measures for the majority of prediction horizons in the U.S. and China datasets, and was consistently better than GS(0.05) across the three measures at all forecast horizons for the China dataset. Moreover, there was no significant difference between PSO and GS(0.01) for the China dataset. Thus, the PSO algorithm is well suited to parameter optimization in SES models. The results of the PSO-based SES models are reported in Table 3, where "NA" denotes the SES model without a trend term but with an additive seasonal term, "AA" denotes the SES model with an additive trend term and additive seasonal term, and "MA" denotes the SES model with a multiplicative trend term and an additive seasonal term; PSO was used to optimize the smoothing parameters of these SES models.

Study 2: Comparison with Well-Known Forecasting Models
The experiments of the SARIMA model were implemented in Eviews 8. The specific process was as follows:
• Stationarity test: a seasonal difference and a nonseasonal difference were performed on the original series. The augmented Dickey-Fuller test showed that the differenced series was stationary, so d = 1 and D = 1;
• SARIMA(p, 1, q)(P, 1, Q)_12 model identification: the correlogram of the differenced sequence was analyzed, and the value ranges of p, q, P, and Q were determined;
• SARIMA(p, 1, q)(P, 1, Q)_12 model selection: a trial-and-error method based on BIC was used to determine the order of the model. In this paper, the models established for the U.S. and China electricity consumption datasets were SARIMA(1, 1, 1)(0, 1, 1)_12 and SARIMA(0, 1, 1)(0, 1, 1)_12, respectively, and all of the estimated parameters in these two models passed the significance test;
• Residual error test: the serial correlation LM test was performed on the fitted residual sequences of the above two models. The results showed that the two residual sequences were uncorrelated, which indicates that the established models were credible;
• Prediction: Eviews 8 provides a dynamic forecast function for SARIMA's multi-step-ahead prediction. We obtained the predicted values of each sample in the test set from one step ahead to twelve steps ahead by changing the range of the prediction sample, then rearranged these predictions according to the forecast horizon and calculated the prediction accuracy of SARIMA at each horizon.
The experiments of SVR and BPNN models were implemented using LibSVM toolbox (version 2.86) and BPNN toolbox in MATLAB 2016a. The parameters of SVR took default values and the embedding dimensions of SVR and BPNN were determined to be 6 and 12 by trial and error.
According to the results presented in Figure 4 and Table 3, several observations can be made about the U.S. dataset:
• Among all the examined models, the two models with the worst prediction results were MA-PSO and BPNN, which is obvious in Figure 4b.
• According to the average rank in terms of the MAPE measure, the top three models were NA-PSO, then AA-PSO, and then SARIMA. The ranking with respect to the NRMSE measure was SARIMA, then NA-PSO, and then AA-PSO. However, the average NRMSE of NA-PSO over all forecast horizons was less than that of SARIMA, which indicates that NA-PSO outperformed SARIMA to a certain extent.
• The proposed NA-PSO and AA-PSO consistently achieved more accurate forecasts than MA-PSO regardless of the accuracy measure and forecast horizon considered. A possible reason is that the multiplicative form was not suitable for fitting the trend of the current electricity consumption data.
• As for the comparison among NA-PSO, AA-PSO, and SARIMA, the results were mixed across the horizons examined; interestingly, whatever accuracy measure was considered, SARIMA won for one-step-ahead and two-step-ahead predictions. The main reason could be that SARIMA performs multi-step-ahead prediction recursively, which causes errors to accumulate as the forecast horizon increases.
From Figure 5 and Table 3, several observations about the China dataset can be made:
• Among all the examined models, the two models with the worst prediction results were SVR and BPNN, which is obvious in Figure 5a,b. A possible reason is that machine learning methods require sufficiently large training samples, while there were only 72 samples in the China dataset.
• According to the MAPE, the top three models were SARIMA, then NA-PSO, and then AA-PSO. The ranking with respect to the NRMSE measure was NA-PSO first, with AA-PSO and SARIMA tied for second. It should be noted that MA-PSO is not presented in Table 3 and Figure 5 for the China dataset because it performed the worst regardless of the accuracy measure and forecast horizon considered.
• As for the comparison among NA-PSO, AA-PSO, and SARIMA, the results were mixed across the horizons examined, but SARIMA won for one-step-ahead prediction regardless of the accuracy measure considered, which indicates that SARIMA is suitable for short-term forecasting.
• Overall, the proposed NA-PSO clearly held one of the top positions. The main reason could be that the electricity consumption associated with people's daily activities and various economic activities has basically reached saturation, which is reflected intuitively in Figure 3: the growth trends of electricity consumption in the U.S. and China were no longer obvious after 2005 and 2012, respectively.
In order to determine whether there was a statistically significant difference among the five models on the hold-out sample, ANOVA procedures were performed for each performance measure, prediction horizon, and dataset. All ANOVA results were significant at the 0.05 level, which suggests that there were significant differences among the five models; the detailed results are omitted here to save space. Tukey's HSD test was then used to identify the significant differences between each pair of models. The results of these multiple comparison tests for the U.S. and China datasets are shown in Table 4, where, for each accuracy measure, prediction horizon, and dataset, the models are ranked from 1 (the best) to 5 (the worst), and * denotes that the mean difference between two adjacent methods is significant at the 0.05 level.