A Hybrid BA-ELM Model Based on Factor Analysis and Similar-Day Approach for Short-Term Load Forecasting

Accurate power-load forecasting is of great significance for the safe and stable operation of a power system. However, the electric-load time series is random and non-stationary and is affected by many factors, which hinders the improvement of prediction accuracy. In light of this, this paper combines factor analysis and the similar-day approach into a prediction model for short-term load forecasting. Through factor analysis, the latent factors that essentially affect load are extracted from 22 original influence factors. Then, considering the contribution rate of historical load data, the partial auto correlation function (PACF) is employed to further analyse their impact. In addition, ant colony clustering (ACC) is adopted to find the similar days that have common factors with the forecast day. Finally, an extreme learning machine (ELM), whose input weights and bias thresholds are optimized by a bat algorithm (BA), hereafter referred to as BA-ELM, is established to predict the electric load. A simulation experiment using data from Yangquan City shows its effectiveness and applicability, and the results demonstrate that the hybrid model can meet the needs of short-term electric load prediction.


Introduction
Short-term load forecasting is an important component of smart grids, which can not only save cost but also ensure a continuous flow of electricity supply [1]. Moreover, against the background of energy saving and emission reduction, accurate short-term load prediction plays an important role in avoiding a waste of resources in the process of power dispatch. Nevertheless, it should be noted that the inherent irregularity and linear independence of the load data have a negative effect on exact power load prediction.
Since the 1950s, short-term load forecasting has attracted considerable attention from scholars. Generally speaking, the methods for load forecasting can be classified into two categories: traditional mathematical statistical methods and approaches based on artificial intelligence. Conventional methods such as regression analysis [2,3] and time series [4] are mainly based on mathematical statistical models such as the vector auto-regression model (VAR) and the auto-regressive moving average model (ARMA). With the development of science and technology, the shortcomings of statistical models have begun to appear: the effect of regression analysis based on historical data weakens as the time horizon extends, and the results of time-series prediction are not ideal when the stochastic factors are large. Researchers have therefore criticized these models for their low non-linear fitting capability. Owing to their strong self-learning and self-adapting ability and their non-linearity, artificial intelligence methods such as back propagation neural networks (BPNN), the support vector machine (SVM) and the least squares support vector machine (LSSVM) have received greater attention and have been widely applied in the field of power load forecasting during the last decades [5,6]. Park et al. [7] first used the artificial neural network in electricity forecasting. The experimental results demonstrated the higher fitting accuracy of artificial neural networks (ANNs) compared with the fundamental methods. Hernandez et al. [8] presented a short-term electric load forecast architectural model based on ANNs, and the results highlighted the simplicity of the proposed model. Yu and Xu [9] proposed a combinational approach for short-term gas-load forecasting that includes an improved BPNN and a real-coded genetic algorithm, the latter being employed to optimize the parameters of the prediction model; the simulation illustrated its superiority through comparisons with several different combinational algorithms. Hu et al. [10] put forward a generalized regression neural network (GRNN) optimized by the decreasing step size fruit fly optimization algorithm to predict the short-term power load, and the proposed model showed a stronger fitting ability and higher accuracy than the traditional BPNN.
Yet, the inherent features of BPNN may cause low efficiency and convergence to local optima. Furthermore, the selection of the number of BPNN hidden nodes depends on trial and error. As a consequence, it is difficult to obtain the optimal network. On the basis of the structural risk, empirical risk and Vapnik-Chervonenkis (VC) dimension bound minimization principle, the support vector machine (SVM) showed a smaller practical risk and presented a better performance in general [11]. Zhao and Wang [12] applied an SVM to short-term load forecasting, and the results demonstrated excellent forecasting accuracy as well as computing speed. Considering the difficulty of parameter determination in SVM, the least squares support vector machine (LSSVM) was put forward as an extension. Through non-linear mapping, it transforms the quadratic optimization problem with inequality constraints in the original space into a linear system with equality constraints in the feature space, which further improves the speed and accuracy of the prediction [13]. Nevertheless, how to set the kernel parameter and penalty factor of LSSVM scientifically is still a problem to be solved.
Huang et al. [14] proposed a new single-hidden-layer feed-forward neural network in 2009 and named it the extreme learning machine (ELM), in which the hidden nodes are chosen randomly and the output weights of the single-hidden-layer feed-forward neural network (SLFN) are then determined analytically. The ELM tends to have better scalability and achieves similar (for regression and binary-class cases) or much better (for multi-class cases) generalization performance at a much faster learning speed (up to thousands of times) than the traditional SVM and LSSVM [15]. However, it is worth noting that the randomly assigned input weight matrix and hidden layer biases may affect the generalization ability of the ELM. Consequently, it is necessary to employ an optimization algorithm to obtain the best values of both the input layer weights and the hidden layer biases. The bat algorithm (BA), acknowledged as a new meta-heuristic method, can dynamically control the mutual conversion between local search and global search and shows better convergence [16]. Because of its excellent local and global search performance in comparison with existing algorithms such as the genetic algorithm (GA) and the particle swarm optimization algorithm (PSO), researchers have applied BA extensively to diverse optimization problems [17][18][19]. Thus, this paper adopts the bat algorithm to obtain the input weight matrix and the hidden layer bias matrix of the ELM that correspond to the minimum training error. This not only exploits BA's global and local search capability and ELM's fast learning speed, but also overcomes the inherent instability of ELM.
The importance of forecasting methods is self-evident, yet the analysis and processing of the original load data cannot be ignored either. Some previous studies have taken historical load and weather as the most influential factors [20][21][22]. However, whether the historical load data are selected scientifically has a strong impact on the accuracy of prediction. In addition, there are many other external weather factors that may influence the power load. Considering only the temperature as the input variable may not be enough [23][24][25]; other meteorological factors such as humidity, visibility and air pressure should also be taken into consideration. Besides, once the influence factors are considered comprehensively, it is necessary to analyze and pre-treat them so as to improve the generalization ability and the precision of the prediction model. Therefore, this paper applies factor analysis (FA) and the similar-day approach (SDA) for input data pre-processing, where the former is used to extract the latent factors that essentially affect the load and the latter is adopted to find the similar days that have common factors with the forecast day.
To sum up, the load forecasting process of the ELM optimized by the bat algorithm can be elaborated in four steps. Firstly, based on 22 original influence factors, factor analysis is adopted to extract the latent factors which essentially affect load. Secondly, to further explore the relationship between historical load and current load, the partial auto correlation function (PACF) is applied to demonstrate the significance of previous data. Thirdly, in accordance with the latent factors and the loads of each day, ant colony clustering is used to divide the load into different clusters. Finally, the ELM optimized by the bat algorithm is established to predict the electric load.
The rest of the paper is organized as follows: Section 2 gives a brief description of the material and methods, including the bat algorithm (BA), the extreme learning machine (ELM), the ant colony clustering algorithm (ACC) and the framework of the whole model. Data analysis and processing are considered in Section 3, and Section 4 presents an empirical analysis of the power load forecasting. Finally, conclusions are drawn in Section 5.

Bat Algorithm
Based on the echolocation of micro-bats, Yang [26] proposed a new meta-heuristic method and called it the bat algorithm. It combines the advantages of both the genetic algorithm and particle swarm optimization, and offers parallelism, quick convergence, distribution and fewer parameters to adjust. In the d-dimensional search space, during the global search, bat i has position $x_i^t$ and velocity $v_i^t$ at time t, which are updated as Equations (1) and (2), respectively:

$$x_i^{t+1} = x_i^t + v_i^{t+1} \tag{1}$$

$$v_i^{t+1} = v_i^t + (x_i^t - \hat{x}) F_i \tag{2}$$

where $\hat{x}$ is the current global optimal solution and $F_i$ is the sonic wave frequency, given by Equation (3):

$$F_i = F_{min} + (F_{max} - F_{min}) \beta \tag{3}$$

where $\beta$ is a random number within [0, 1], and $F_{max}$ and $F_{min}$ are the maximum and minimum sonic wave frequencies of bat i. In the process of flying, each bat is initially assigned a random frequency within $[F_{min}, F_{max}]$.

In local search, once a solution is selected among the current best solutions, each bat produces a new alternative solution by a random walk according to Equation (4):

$$x_{new}(i) = x_0 + \mu A^t \tag{4}$$

where $x_0$ is a solution chosen randomly from the current set of best solutions; $A^t$ is the average volume of the current bat population; and $\mu$ is a d-dimensional vector with components within [−1, 1]. The balance of bats is controlled by the impulse volume A(i) and the impulse emission rate R(i). Once a bat locks onto the prey, the volume A(i) is reduced and the emission rate R(i) is increased at the same time. The updates of A(i) and R(i) are expressed as Equations (5) and (6), respectively:

$$A^{t+1}(i) = \gamma A^t(i) \tag{5}$$

$$R^{t+1}(i) = R^0(i) \left[1 - e^{-\theta t}\right] \tag{6}$$

where γ and θ are both constants, with γ within [0, 1] and θ > 0. This paper sets the two parameters as γ = θ = 0.9. The basic steps of the standard bat algorithm can be summarized as the following pseudo code:

Bat algorithm.
Initialize the locations x_i (i = 1, 2, 3, …, n) and velocities v_i of the bat population
Initialize frequency F_i, pulse emission rate R_i and loudness A_i
While (t < the maximum number of iterations)
    Generate new solutions by adjusting the frequency
    Generate new velocity and location
    If (rand > R_i)
        Select a solution among the best solutions
        Generate a new local solution around the selected best solution
    End if
    Get a new solution through flying randomly
    If (rand < A_i and f(x_i) < f(x*))
        Accept the new solution
        Increase R_i and decrease A_i
    End if
    Rank the bats and find the current best x*
End
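The pseudo code can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function name `bat_algorithm`, the population size, the frequency range, the initial loudness, the limiting emission rate and the search bounds are all assumptions chosen for illustration, and the candidate is compared with the bat's own previous fitness, which is a common implementation choice rather than something stated in the paper. Only γ = θ = 0.9 is taken from the text. The sphere function stands in for the real objective.

```python
import math
import random


def bat_algorithm(objective, dim, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                  gamma=0.9, theta=0.9, lower=-5.0, upper=5.0, seed=0):
    """Minimise `objective` over [lower, upper]^dim (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(value):
        return max(lower, min(upper, value))

    # Initialise positions, velocities, loudness A(i) and emission rate R(i).
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loudness = [1.0] * n_bats      # assumed initial loudness
    rate0 = [0.5] * n_bats         # assumed limiting emission rate R0(i)
    rate = [0.0] * n_bats
    fitness = [objective(p) for p in x]
    best_index = min(range(n_bats), key=fitness.__getitem__)
    best, best_fit = x[best_index][:], fitness[best_index]

    for t in range(1, n_iter + 1):
        mean_loudness = sum(loudness) / n_bats
        for i in range(n_bats):
            # Global move: random frequency, then velocity and position update.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [v[i][d] + (x[i][d] - best[d]) * freq for d in range(dim)]
            candidate = [clip(x[i][d] + v[i][d]) for d in range(dim)]
            # Local move: random walk around the best solution found so far.
            if rng.random() > rate[i]:
                candidate = [clip(best[d] + rng.uniform(-1.0, 1.0) * mean_loudness)
                             for d in range(dim)]
            cand_fit = objective(candidate)
            # Accept an improving candidate with probability given by loudness,
            # then lower the loudness and raise the emission rate.
            if rng.random() < loudness[i] and cand_fit < fitness[i]:
                x[i], fitness[i] = candidate, cand_fit
                loudness[i] *= gamma
                rate[i] = rate0[i] * (1.0 - math.exp(-theta * t))
            if fitness[i] < best_fit:
                best, best_fit = x[i][:], fitness[i]
    return best, best_fit


def sphere(p):
    """Toy objective with its minimum at the origin."""
    return sum(c * c for c in p)


best, best_fit = bat_algorithm(sphere, dim=2)
```

For BA-ELM, the objective would be the ELM training error rather than the toy sphere function.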

Extreme Learning Machine
After the input weights and hidden layer biases are set randomly, the output weights of the ELM can be determined analytically by solving a linear system with the Moore-Penrose (MP) generalized inverse. Only two parameters need to be assigned, and the ELM generates the input weight matrix and hidden layer biases automatically, which gives it a fast running speed. Consequently, compared with traditional neural networks, the extreme learning machine offers a fast learning speed, a small training error and a strong generalization ability in solving non-linear problems [27]. The concrete framework of the ELM is shown in Figure 1, and the computational steps of the standard ELM are as follows. The connection weights between the input layer and the hidden layer and between the hidden layer and the output layer, as well as the hidden layer neuron thresholds, are given by:

$$\omega = [\omega_{i1}, \omega_{i2}, \cdots, \omega_{in}]_{L \times n} \tag{7}$$

where ω is the connection weights between the input layer and the hidden layer, n is the input layer neuron number and L is the hidden layer neuron number, and,

$$\beta = [\beta_{i1}, \beta_{i2}, \cdots, \beta_{im}]_{L \times m} \tag{8}$$

where β is the connection weights between the hidden layer and the output layer and m is the output layer neuron number, and,

$$X = [x_{i1}, x_{i2}, \cdots, x_{iQ}]_{n \times Q} \tag{9}$$

$$Y = [y_{i1}, y_{i2}, \cdots, y_{iQ}]_{m \times Q} \tag{10}$$

where X is the input vector and Y is the corresponding output vector, and,

$$H = g(\omega X + b) \tag{11}$$

where H is the hidden layer output matrix, b is the bias which is generated randomly in the process of network initialization, and g(x) is the activation function of the ELM.

Energies 2018, 11, 1282
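As a rough illustration of these steps, the following NumPy sketch trains an ELM on a toy regression task. It is a sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, the activation g is assumed to be a sigmoid, weights and biases are assumed to be drawn uniformly from [−1, 1], and samples are stored as rows (so X has shape Q × n, the transpose of the layout used in the text).

```python
import numpy as np


def hidden_output(X, omega, b):
    """Sigmoid hidden-layer output H for inputs X of shape (Q, n)."""
    return 1.0 / (1.0 + np.exp(-(X @ omega + b)))


def elm_fit(X, Y, n_hidden, rng):
    """Draw random input weights and biases, then solve for output weights."""
    omega = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_output(X, omega, b)
    beta = np.linalg.pinv(H) @ Y      # least-squares solution via the MP inverse
    return omega, b, beta


def elm_predict(X, omega, b, beta):
    """Propagate inputs through the fixed hidden layer and output weights."""
    return hidden_output(X, omega, b) @ beta


# Toy data: fit y = sin(x) on 100 points with 25 hidden neurons.
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 100).reshape(-1, 1)
Y = np.sin(X)
omega, b, beta = elm_fit(X, Y, n_hidden=25, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, omega, b, beta) - Y) ** 2)))
```

Because only `beta` is solved for, training reduces to one pseudo-inverse, which is where the speed advantage described above comes from.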

Ant Colony Clustering Algorithm
When processing a large number of samples, traditional clustering algorithms often suffer from slow clustering speed and easily fall into local optima, which makes it difficult to obtain the optimal clustering result. At the same time, such algorithms require the number of clusters K to be selected, which directly affects the clustering result. Using ant colony clustering to pre-process the load samples can reduce the number of input samples while still covering all sample features, and can also effectively simplify the network structure and reduce the calculation effort. The flowchart of the ant colony clustering algorithm is shown in Figure 2.

Introduction of Factor Analysis-Ant Colony Clustering-Bat Algorithm-Extreme Learning Machine (FA-ACC-BA-ELM) Model
Since the randomly assigned input weights and hidden layer biases limit how well the ELM responds to the samples of the training set, its generalization ability is insufficient. We therefore propose BA-ELM. The flowchart of the factor analysis-similar day-bat algorithm-extreme learning machine (FA-SD-BA-ELM) model is shown in Figure 3. In part 1, the auto correlation function and the partial auto correlation function (PACF) are executed to analyze the inner relationships between the historical loads. Based on the influencing factors of load, factor analysis (FA) is used to extract the input variables. According to the result of the factor analysis and the previous load, the ant colony clustering algorithm (ACC) is used to find historical days whose factors are similar to those of the forecast day. Part 2 is the bat optimization algorithm (BA) and part 3 is the forecasting by the extreme learning machine (ELM).
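The coupling between parts 2 and 3 can be sketched as a fitness function: each bat position is a flat vector holding the ELM input weights and hidden biases, and its fitness is the resulting training error. The code below is a minimal illustration under assumptions, not the authors' implementation: the names `decode` and `elm_training_rmse` are hypothetical, the activation is assumed to be a sigmoid, RMSE is assumed as the training error, and the data are synthetic.

```python
import numpy as np


def decode(theta, n_inputs, n_hidden):
    """Split a flat parameter vector into input weights and hidden biases."""
    omega = theta[: n_inputs * n_hidden].reshape(n_inputs, n_hidden)
    b = theta[n_inputs * n_hidden:]
    return omega, b


def elm_training_rmse(theta, X, Y, n_hidden):
    """Fitness of one bat: training RMSE of the ELM defined by `theta`."""
    omega, b = decode(theta, X.shape[1], n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))
    beta = np.linalg.pinv(H) @ Y      # output weights still solved analytically
    return float(np.sqrt(np.mean((H @ beta - Y) ** 2)))


# Synthetic example: 40 samples, 4 inputs, 6 hidden neurons.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(40, 4))
Y = X.sum(axis=1, keepdims=True)
n_hidden = 6
theta = rng.uniform(-1.0, 1.0, size=4 * n_hidden + n_hidden)
fitness = elm_training_rmse(theta, X, Y, n_hidden)
```

Any population-based optimizer, including a bat algorithm, can then minimize `elm_training_rmse` over `theta`; the search space has n·L + L dimensions.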

Selection of Influenced Indexes
Human activities are always disturbed by many external factors, which in turn affect the power load, so a set of effective features is selected as factors. In this paper, the selection of factors is mainly based on four aspects:
(1) The historical load. Generally speaking, the historical load influences the current load in short-term load forecasting. In this paper, the daily maximum load, daily minimum load, average daily load, peak average load of the previous day, valley average load of the previous day, average load of the day before, and the average loads of 2, 3, 4, 5 and 6 days before are taken into consideration.
(2) The temperature. Since people use temperature-adjusting devices to adapt to the temperature, previous studies [23][24][25] considered temperature an essential input feature and obtained sufficiently accurate forecasting results. In this paper, the maximum temperature, the minimum temperature and the average temperature are selected as factors.
(3) The weather condition. We mainly take into account the seasonal patterns, humidity, visibility, weather patterns, air pressure and wind speed. The four seasons are represented as 1, 2, 3 and 4, respectively. For different weather patterns, we set different weights: {sunny, cloudy, overcast, rainy} = {0, 1, 2, 3}.
(4) The day type. In this aspect, the type of day and the date are taken into consideration. The type of day divides the days into workdays (Monday-Friday), weekends (Saturday-Sunday) and holidays, whose weights are 0, 1 and 2, respectively. For the date, we set different weights: {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday} = {1, 2, 3, 4, 5, 6, 7}.
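The categorical weights listed in aspects (3) and (4) can be written down as a small encoder. This is only a sketch: the function name `encode_day` and the dictionary layout are hypothetical, the weather labels passed in below are made-up inputs rather than observed data, and the season code is left out because the text does not say which season maps to which number.

```python
from datetime import date

# Weights taken from the text.
WEATHER_CODE = {"sunny": 0, "cloudy": 1, "overcast": 2, "rainy": 3}
DAY_TYPE_CODE = {"workday": 0, "weekend": 1, "holiday": 2}


def encode_day(day, weather, is_holiday=False):
    """Return the numeric weather, weekday and day-type codes for one day."""
    weekday = day.weekday() + 1          # Monday = 1, ..., Sunday = 7
    if is_holiday:
        day_type = "holiday"
    elif weekday >= 6:
        day_type = "weekend"
    else:
        day_type = "workday"
    return {
        "weather": WEATHER_CODE[weather],
        "weekday": weekday,
        "day_type": DAY_TYPE_CODE[day_type],
    }


# Hypothetical weather labels, used only to exercise the encoder.
friday = encode_day(date(2013, 6, 28), "sunny")
sunday = encode_day(date(2013, 6, 30), "rainy")
```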

Factor Analysis
Originally proposed by the British psychologist C.E. Spearman, factor analysis is a statistical technique that gathers highly interrelated variables into groups, so that each group becomes a factor and a small number of factors reflects most of the original information. Factor analysis not only reduces the dimension of the indicators and improves the generalization of the model; the common factors it extracts to replace the primitive variables also reflect and explain the complicated relationships between variables while losing essentially no information. In this paper, factor analysis is used to extract the factors that reflect most of the information in the original 22 influencing variables, and the result is shown in Table 2.
First of all, Table 1 gives the results of the Kaiser-Meyer-Olkin (KMO) test and the Bartlett test of sphericity, which serve as a criterion to judge whether the data are suitable for factor analysis. A KMO value above 0.7 indicates that the data are suitable, and the value of 0.74 obtained from the power load data confirms that factor analysis is appropriate.
Table 2 shows the six factors that are extracted from the 22 original variables. The cumulative contribution rate of 84.434%, which exceeds 80%, indicates that the six new factors carry most of the information in the original indicators. It can be seen from Table 2 that factor 1, which mainly represents the historical load, accounts for the largest proportion at 35.128%. In addition, considering that the variables in factor 1 may not be sufficient to represent the historical load, the paper carries out a further analysis of the previous data by means of the correlation analysis described in part 3.2. Factor 2, which mainly represents the meteorological elements, accounts for 19.646%, and the remaining four factors account for 10.514%, 7.746%, 6.087% and 5.313%, respectively.
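As a quick arithmetic check of the reported contribution rates, the short Python snippet below accumulates the six percentages quoted in the text and finds how many factors are needed to pass the 80% threshold. The variable names are illustrative only.

```python
from itertools import accumulate

# Contribution rates (percent) of factors 1-6 as quoted in the text.
contribution = [35.128, 19.646, 10.514, 7.746, 6.087, 5.313]

# Running total after each factor.
cumulative = list(accumulate(contribution))

# Smallest number of factors whose cumulative contribution exceeds 80%.
n_needed = next(k for k, c in enumerate(cumulative, start=1) if c > 80.0)
```

The running total reaches 79.121% after five factors and 84.434% after six, so all six factors are needed to exceed 80%.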

The Analysis of Correlation
Additionally, this paper conducts a further analysis of the correlation between the historical load and the target load from two different viewpoints so as to eliminate the internal correlation. On the one hand, the partial auto correlation function (PACF) is applied to the overall power load series to reveal the correlation between the target load and the previous loads. On the other hand, the PACF is applied separately to the load series at each time of day to reveal the relationship among the loads at the same time on different days. The results of the partial auto correlation analysis can be seen in Figures 4 and 5, respectively.
For instance, under the confidence level of 90%, it can be seen from Figure 4 that the first two lags (in hours) are significant for the current data. That is to say, the loads of the previous two hours influence the current load. As for Figure 5, only lag 1 is significant for the current load, except for the load at 00:00, for which lag 2 is also significant. Consequently, for the forecast at 00:00, four inputs are selected: the loads of the two hours before 00:00 and the loads at the same time on the previous day and on the day before that.
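The lag selection described above can be sketched in plain Python. The snippet computes partial auto correlations with the Durbin-Levinson recursion, flags lags outside the large-sample bound ±1.645/√n (a standard approximation for a 90% level, assumed here because the paper does not state how its bounds were computed), and builds the lagged inputs for one hour. All function names are hypothetical, the demonstration series is a synthetic AR(1) process rather than the Yangquan load, and using one daily lag for hours other than 00:00 is an interpretation of the text.

```python
import math
import random


def autocorr(x, max_lag):
    """Sample autocorrelations r[0..max_lag]."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = autocorr(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(x, max_lag, z=1.645):
    """Lags whose PACF lies outside +/- z / sqrt(n)."""
    bound = z / math.sqrt(len(x))
    return [k for k, p in enumerate(pacf(x, max_lag), start=1) if abs(p) > bound]


def lag_inputs(load, t):
    """Inputs for hour index t: previous two hours plus same hour on earlier days."""
    daily_lags = (1, 2) if t % 24 == 0 else (1,)
    if t < 24 * max(daily_lags):
        raise ValueError("not enough history for the requested lags")
    return [load[t - 1], load[t - 2]] + [load[t - 24 * d] for d in daily_lags]


# Synthetic AR(1) series, used only to exercise the functions.
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))
partial = pacf(series, 3)
```

For an hourly series indexed from midnight, `lag_inputs(load, 48)` returns the four inputs used at 00:00 of the third day.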

Clustering with Ant Colony Algorithm
Using the exogenous features directly as input may lead to slow convergence and poor accuracy of the prediction model. Thus, the paper uses the similar-day load obtained by the ant colony clustering algorithm for the prediction so as to improve the forecasting accuracy. According to the daily load and the six factors extracted from the 22 variables, the 60 days from 1 May 2013 to 30 June 2013 are numbered from 1 to 60 and are divided into four clusters by the ant colony algorithm. The parameters of the ACC algorithm can be seen in Table 3, and the clustering result is given in Table 4. As a consequence, the three test days, numbered 58, 59 and 60, belong to class 4, class 1 and class 3, respectively.
Figure 4. The partial auto correlation result of the overall power load.

Table 3. The parameters of the ACC algorithm.
Parameter    m     Alpha    Beta    Rho    N    NC_max
Value        30    0.5      0.5     0.1    4    100

Figure 5. The partial auto correlation result of the load with the same interval.

Application of BA-ELM
To verify the rationality of the data processing, the BA-ELM model was applied to load forecasting for Yangquan City. In this paper, the relative error (RE), mean absolute percentage error (MAPE), mean absolute error (MAE) and root-mean-square error (RMSE) are employed to validate the performance of the model. They are defined as follows, respectively:

$$RE_i = \frac{\hat{y}_i - y_i}{y_i} \times 100\% \tag{12}$$

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\% \tag{13}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{14}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2} \tag{15}$$

where n stands for the number of test samples, $y_i$ is the real load and $\hat{y}_i$ is the corresponding predicted output. Moreover, the paper compares the BA-ELM with the benchmark models ELM, LSSVM and BPNN to demonstrate the superiority of the proposed model. The parameters of the models are shown in Table 5. Figure 6 shows the iteration process of BA. From the figure we can see that BA converges after about 350 iterations. The optimal values of the parameters are shown in Table 6.

Table 6. The optimal parameters.
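The four criteria can be computed with a few lines of Python. This is a sketch using the standard textbook definitions; the sign convention for RE (predicted minus actual, divided by actual) is an assumption, and the two-point input is a made-up example rather than data from the paper.

```python
import math


def relative_errors(actual, predicted):
    """Signed relative error of each point, in percent."""
    return [(p - a) / a * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = relative_errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)


def mae(actual, predicted):
    """Mean absolute error, in the unit of the load."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error, in the unit of the load."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up loads, used only to exercise the functions.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
re_values = relative_errors(actual, predicted)
```

Here the relative errors are +10% and −5%, so the MAPE is 7.5%, while the MAE and RMSE both equal 10 load units.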

Case Study
In order to test the feasibility of the proposed model, the 24-h power load data of Yangquan City over two months are selected. The actual load curves of the four classes, shown in Figure 7, exhibit almost no apparent regularity. As mentioned above, the three testing days belong to classes 4, 1 and 3, respectively, and the prediction model is built for the power load forecasting at the same time. The program runs in MATLAB R2015b under Windows 7. The short-term electric load forecasting results for the three days from the BA-ELM, ELM, BP and LSSVM models are shown in Tables 7-9, respectively. To present the results more clearly, the forecast curves of the proposed model and the comparison models are shown in Figures 8-10. In addition, Figures 11-13 compare the relative errors of the proposed model with those of the others. Figures 8-10 show the deviation between the actual values and the forecasting results. The forecast curve of the BA-ELM method is close to the actual data on all testing days, which indicates its higher fitting accuracy. The RE ranges [−3%, 3%] and [−1%, 1%] are commonly used as standards to assess the performance of a model. Based on these tables and figures, we can determine that:
(1) On 28 June, the relative errors of the proposed model and the others were all in the range of [−3%, 3%]; only one point (3.52%) of BPNN on 29 June and one point (−3.50%) of LSSVM on 30 June are beyond the range of [−3%, 3%], which indicates that the accuracy is increased after the process of reducing dimensions and clustering.
(2) Most relative error points of the BA-ELM lie in the range of [−1%, 1%] on all three days. By contrast, most points of the ELM are beyond the range of [−1%, 1%], which demonstrates that applying the BA to the ELM increases its accuracy and stability.
(3) On 28 June, called Day 1 in this paper, the ELM has 14 predicted points outside the range of [−1%, 1%], and only one point (2.12% at 21:00) beyond the range of [−2%, 2%]; the BP has 12 predicted points outside the range of [−1%, 1%], and one predicted point (−2.05% at 11:00) beyond the range of [−2%, 2%]; the LSSVM has 14 predicted points beyond the range of [−1%, 1%], and six predicted points beyond the range of [−2%, 2%], which are −2.38% at 11:00, −2.76% at 12:00, −2.07% at 16:00, −2.85% at 17:00, −2.17% at 18:00 and −2.7% at 19:00.
(4) On 29 June, called Day 2 in this paper, the ELM has 10 predicted points outside the range of [−1%, 1%], and only one point beyond the range of [−2%, 2%], which is 2.52% at 21:00; the BP has 16 predicted points outside the range of [−1%, 1%], and three predicted points beyond the range of [−2%, 2%], which are 3.52% at 7:00, −2.03% at 12:00 and −2.03% at 14:00; the LSSVM has 13 predicted points beyond the range of [−1%, 1%], and four predicted points outside the range of [−2%, 2%], which are −2.25% at 12:00, −2.27% at 16:00, −2.77% at 15:00 and −2.17% at 19:00.
(5) On 30 June, called Day 3 in this paper, the ELM has 15 predicted points outside the range of [−1%, 1%], and three points beyond the range of [−2%, 2%], which are −2.48% at 8:00, −2.19% at 17:00 and −2.61% at 19:00; the BP has 19 predicted points outside the range of [−1%, 1%], and six predicted points beyond the range of [−2%, 2%], which are 2.91% at 7:00, −2.43% at 10:00, −2.85% at 12:00, −2.73% at 14:00, −2.3% at 15:00 and −2.05% at 22:00; the LSSVM has 18 predicted points beyond the range of [−1%, 1%], and nine predicted points outside the range of [−2%, 2%], which are −2.17% at 12:00, −2.03% at 13:00, −2.59% at 14:00, −2.41% at 15:00, −3.5% at 16:00, −2.19% at 17:00 and −2.78% at 18:00.
From the global view of relative errors, the forecasting accuracy of BA-ELM is better than that of the other models, since it has the most predicted points in the ranges [−1%, 1%], [−2%, 2%] and [−3%, 3%]. Compared with BPNN and LSSVM, the relative errors of ELM are low. The reason is that the BPNN has advantages when dealing with large samples, but its forecasting results are not very good for a small-sample problem such as short-term load forecasting. The kernel parameter and penalty factor of LSSVM, which are set manually, are difficult to determine and have a significant influence on the forecasting accuracy.
The numbers of predicted points whose absolute error (AE) is less than 1%, between 1% and 2%, between 2% and 3%, and more than 3% were counted, together with their percentages of all predicted points. The statistical results are shown in Table 10. For the BA-ELM model, 61 predicted points have an AE of less than 1%, accounting for 84.72% of the total; 10 predicted points fall in the range of [1%, 2%], accounting for 13.89% of the total; and only 1 predicted point falls in the range of [2%, 3%], accounting for 1.39% of the total. Moreover, no predicted point has an AE of more than 3%. It can be concluded that the proposed model has superior forecasting performance and higher accuracy, which means the BA-ELM model is suitable for short-term load forecasting.
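The band statistics of Table 10 can be reproduced with a few lines of code. The following is a minimal sketch, assuming that AE denotes the absolute relative error in percent and that each band is half-open (lower edge included, upper edge excluded); the function name `error_band_shares` and the demo values are illustrative and do not come from the paper.

```python
import numpy as np

def error_band_shares(actual, predicted, edges=(1.0, 2.0, 3.0)):
    """Count forecasts per absolute-relative-error band (in percent).

    Bands are [0, 1), [1, 2), [2, 3) and [3, inf) for the default edges.
    Returns a list of (lower, upper, count, share_in_percent) tuples.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Absolute relative error of every predicted point, in percent.
    ae = np.abs((predicted - actual) / actual) * 100.0
    bounds = [0.0, *edges, np.inf]
    rows = []
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        count = int(np.sum((ae >= lower) & (ae < upper)))
        rows.append((lower, upper, count, 100.0 * count / ae.size))
    return rows

# Made-up demo values (not the paper's data): one point per band.
demo_actual = [100.0, 100.0, 100.0, 100.0]
demo_predicted = [100.5, 101.5, 102.5, 104.0]
for lower, upper, count, share in error_band_shares(demo_actual, demo_predicted):
    print(f"[{lower}%, {upper}%): {count} points, {share:.2f}%")
```

The shares reported for BA-ELM are consistent with 72 test points in total (61 + 10 + 1 + 0): 61/72, 10/72 and 1/72 round to 84.72%, 13.89% and 1.39%, respectively.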
The average RMSE and MAPE of the BA-ELM, ELM, BPNN and LSSVM models are listed in Table 11. To show the comparison clearly, the RMSE, MAE and MAPE of the four forecasting models on the three testing days are shown in Figures 14-16. The RMSE, MAE and MAPE of BA-ELM are all lower than those of the other models on the three testing days. On 28 June, the RMSE, MAE and MAPE of ELM are slightly larger than those of BP, but smaller than those of LSSVM. On 29 June, the RMSE, MAE and MAPE of ELM are smaller than those of BP and LSSVM, while those of BP are close to those of LSSVM. On 30 June, the RMSE, MAE and MAPE of ELM are smaller than those of BP and LSSVM, and those of BP are smaller than those of LSSVM. To sum up, and taking Table 11 into account, the average errors of the four models rank, from low to high, as BA-ELM, ELM, BPNN and LSSVM.
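For reference, the three criteria compared above can be computed as in the following sketch. It assumes the standard textbook definitions of RMSE, MAE and MAPE (with MAPE expressed in percent); the normalization used in the paper's own formulas may differ, and the function name `forecast_errors` and the sample values are illustrative only.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return RMSE, MAE and MAPE (in percent) for one forecast series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - actual
    return {
        # Root-mean-square error, in the unit of the load.
        "RMSE": float(np.sqrt(np.mean(residual ** 2))),
        # Mean absolute error, in the unit of the load.
        "MAE": float(np.mean(np.abs(residual))),
        # Mean absolute percentage error, relative to the actual load.
        "MAPE": float(np.mean(np.abs(residual / actual)) * 100.0),
    }

# Made-up values (not the paper's data).
scores = forecast_errors([100.0, 200.0], [110.0, 190.0])
print(scores)
```

Applying the function to each model's hourly forecasts for a testing day, and then averaging over the three days, would give values comparable to those summarized in Table 11 and Figures 14-16.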

Conclusions
With the development of society and technology, research to improve the precision of load forecasting has become necessary, because short-term power load forecasting is a vital component of smart grids that can not only reduce electric power costs but also ensure a continuous electricity supply. This paper selected 22 original indexes as the influential factors of power load, and factor analysis was employed to examine their correlation and economic connotations; the results show that historical load data have the largest contribution rate, followed by the meteorological factors. Consequently, the auto correlation and partial auto correlation functions were introduced to further explore the relationship between historical load and current load. Considering the influence of similar days, ant colony clustering was adopted to cluster the samples in order to find days with analogous features. Finally, the extreme learning machine optimized by a bat algorithm was used to predict the load on the selected test days. The simulation experiment using data from Yangquan City in China verified the effectiveness and applicability of the proposed model, and a comparison with benchmark models demonstrated the superiority of the hybrid model.


Figure 1. The framework of the extreme learning machine.

Figure 2. The flowchart of the ant colony clustering algorithm.

Energies 2018, 11
day-bat algorithm-extreme learning machine (FA-SD-BA-ELM) model is shown in Figure 3. As discussed in part 1, the auto correlation function and the partial auto correlation function (PACF) are executed to analyze the inner relationships between the history loads. Based on the influencing factors of load, factor analysis (FA) is used for extracting input variables. According to the results of factor analysis and the previous load, the ant colony clustering algorithm (ACC) is used to find historical days that have common factors similar to the forecast day. Part 2 is the bat optimization algorithm (BA) and part 3 is the forecasting of the extreme learning machine (ELM).

Figure 3. The flowchart of the factor analysis-ant colony clustering-bat algorithm-extreme learning machine (FA-ACC-BA-ELM) model.

Figure 4. The partial auto correlation result of the overall power load.

Figure 5. The partial auto correlation result of the load with the same interval.

Figure 6. The iterations process of the bat algorithm (BA).

Figure 7. The four types of power load curve.

Figure 11. Compared relative errors of four models on 28 June.

Figure 12. Compared relative errors of four models on 29 June.

Figure 13. Compared relative errors of four models on 30 June.

Figure 14. Root-mean-square error (RMSE) of different models in testing period.

Figure 15. Mean absolute percentage error (MAPE) of different models in testing period.

Figure 16. Mean absolute error (MAE) of different models in testing period.

Table 1. KMO and Bartlett test of sphericity.

Table 2. Results of factor analysis.

Table 3. Parameters of the ant colony clustering algorithm.

Table 4. Results of ant colony clustering algorithm.

Table 7. Actual load and forecasting results on Day 1 (Unit: MV).

Table 8. Actual load and forecasting results on Day 2 (Unit: MV).

Table 9. Actual load and forecasting results on Day 3 (Unit: MV).

Table 10. Accuracy estimation of the prediction point for the test set.

Table 11. Average forecasting results of four models.