A Short-Term Load Forecasting Model with a Modiﬁed Particle Swarm Optimization Algorithm and Least Squares Support Vector Machine Based on the Denoising Method of Empirical Mode Decomposition and Grey Relational Analysis

Abstract: As an important part of power system planning and the basis of economic operation of power systems, the main work of power load forecasting is to predict the time distribution and spatial distribution of future power loads. The accuracy of load forecasting directly influences the reliability of the power system. In this paper, a novel short-term Empirical Mode Decomposition-Grey Relational Analysis-Modified Particle Swarm Optimization-Least Squares Support Vector Machine (EMD-GRA-MPSO-LSSVM) load forecasting model is proposed. The model uses a de-noising method combining empirical mode decomposition and grey relational analysis to process the original load series, and forecasts the processed subsequences with a least squares support vector machine whose parameters are optimized by a modified particle swarm optimization algorithm. The final forecasting results are then obtained by reconstructing the forecast subsequences. This paper takes the Jibei area as an example and presents an empirical analysis of load forecasting. The model input includes the hourly load one week before the forecasting day and the daily maximum temperature, daily minimum temperature, daily average temperature, relative humidity, wind force and date type of the forecasting day. The model output is the hourly load of the forecasting day. The BP neural network, SVM (Support vector machine), LSSVM (Least squares support vector machine), PSO-LSSVM (Particle swarm optimization-Least squares support vector machine), MPSO-LSSVM (Modified particle swarm optimization-Least squares support vector machine) and EMD-MPSO-LSSVM models are selected for comparison with the EMD-GRA-MPSO-LSSVM model using the same sample. The comparison results verify that the short-term load forecasting model of EMD-GRA-MPSO-LSSVM proposed in this paper is superior to the other models and has strong generalization ability and robustness.
It can achieve good forecasting effect with high forecasting accuracy, providing a new idea and reference for accurate short-term load forecasting.


Introduction
Energies 2017, 10, 408

With the continuous development of the national economy, the demand for electricity is increasing day by day, and the power industry plays an increasingly important role in social development. As an important part of power system planning and the basis of economic operation of power systems, the main task of power load forecasting is to predict the time distribution and spatial distribution of future power loads. The accuracy of load forecasting directly influences the reliability of the power system, and in turn the national economy. Short-term load forecasting predicts the load of the next 24 h to a few days from historical load, weather, holidays and other factors. Short-term loads are random, so their forecasting accuracy is sensitive to noise. Therefore, it is significant to select appropriate forecasting algorithms and de-noising methods and to construct a reasonable and effective forecasting model for accurate short-term load forecasting.
In recent years, with the development of artificial intelligence technology, artificial intelligence forecasting techniques have gradually been replacing classical forecasting techniques in the field of load forecasting. Compared with classical forecasting techniques, artificial intelligence forecasting techniques have clear accuracy advantages. The neural network algorithm is a typical and widely used artificial intelligence forecasting technique. The artificial neural network (ANN) is a mathematical model of the human brain's neural network which originated in the 1940s. It is connected by a large number of neurons with adjustable connection weights, and has the characteristics of large-scale parallel processing, distributed information storage, good self-learning ability and so on [1][2][3]. The BP (Backpropagation) neural network [4,5] and the RBF neural network [6][7][8] are common neural networks. Both are nonlinear multilayer feedforward networks, but they differ in network structure, training algorithm, network resource utilization and approximation performance. Li and Huang [4] proposed a short-term power load forecasting method based on a BP neural network. The method first reduces the factors which affect load forecasting by using rough set theory, then takes the reduced data as input variables of the BP neural network model, and obtains the forecast value by using the back-propagation algorithm. Li et al.
[6] proposed a dynamic RBF neural network model for load forecasting. By introducing an error discriminant function, the load forecasting error is controlled and the prediction model is modified dynamically. In general, the neural network algorithm has strong nonlinear fitting ability, performs well in nonlinear function approximation, has simple learning rules and is easy to implement on a computer. However, the neural network algorithm also has some shortcomings, such as slow convergence speed, long training time, easily becoming trapped in local optima, overfitting and so on [9]. The support vector machine (SVM) is another artificial intelligence forecasting technique; it is a machine learning method for small samples based on the VC dimension theory of statistical learning theory and structural risk minimization. Compared with the neural network algorithm, the SVM algorithm has the advantages of good robustness, strong generalization ability, fast computation speed and avoiding falling into local optima [10,11]. The least squares support vector machine (LSSVM) is an improved algorithm for the standard support vector machine using a least squares value function and equality constraints [12,13]. It has faster training speed and better convergence precision [14][15][16]. When we use the least squares support vector machine to forecast the load, the regularization parameter and the radial basis kernel function parameter need to be set, and they directly influence the forecasting accuracy. Therefore, researchers have used a variety of methods to optimize the parameters of the LSSVM, such as the genetic algorithm (GA) [17], the ant colony optimization algorithm (ACO) [18], the artificial bee colony algorithm (ABC) [19], the differential evolution algorithm (DE) [20], the bat algorithm (BA) [21] and principal component analysis [22]. Mahjoob et al.
[17] used a least squares support vector machine approach, in combination with a hybrid genetic algorithm based optimization, to forecast market clearing prices in two different power markets. The forecasting results were compared to a selection of previously proposed methods such as MLP, ARIMA, wavelet-ARIMA, neuro-fuzzy and time series based models. Liu and Jiang [18] proposed a novel PHM algorithm based on the ACO and LSSVM algorithms. They used a time series analysis prediction method to calculate seasonal factors, combined with an ACO-LSSVM algorithm, to model the failure rate of an aviation device and obtained good experimental results. Yusof et al. [19] proposed a forecasting model based on an optimized least squares support vector machine, where the hyper-parameters are determined by a nature-inspired algorithm, the Artificial Bee Colony; the proposed forecasting model was applied to gold price forecasting. Qiao et al. [20] built a forecasting model of natural gas daily load based on wavelet transform and Least Squares Support Vector Machine-Differential Evolution (LSSVM-DE), which provides a practical and feasible method for predicting the time series of natural gas daily loads. The algorithms above provide some help for the optimization of LSSVM parameters, but their obvious shortcomings of low search efficiency, slow convergence speed and easily becoming trapped in local optima cannot be ignored. In this paper, a modified particle swarm optimization algorithm with averaging particle distance and a mutation operator is proposed to optimize the parameters of the least squares support vector machine. Particle swarm optimization (PSO) is a computational technique that originated from the study of bird predatory behavior, proposed by Eberhart and Kennedy in 1995 [23][24][25][26][27]. Zhu and Wang [26] put forward a power load forecasting method combining LSSVM with the PSO algorithm. By taking historical load data, meteorological factors, etc.
as input, they built predictive models to forecast the future power load. Sun and Yao [27] proposed a least squares support vector machine (LSSVM) model optimized by particle swarm optimization (PSO), in which the phase space of the chaotic wind speed time series is reconstructed by calculating the embedding dimension and the delay time of the wind speed time series. The particle swarm optimization algorithm has the advantages of few adjustment parameters, high search efficiency and fast convergence speed, and its global search ability is greatly enhanced after being modified by averaging particle distance and a mutation operator, which can effectively avoid falling into a local optimum.
In order to forecast more accurately the short-term load, which is sensitive to external factors, some researchers use time series decomposition techniques to decompose the original load sequence into several subsequences. The subsequences are predicted separately and then reconstructed into the final forecasting series, which greatly improves the forecasting accuracy. Wavelet decomposition [28][29][30][31][32][33] and classical mode decomposition [34][35][36][37][38][39] are two commonly used time series decomposition methods. Xia et al. [32] gave a short-term load prediction method based on wavelet decomposition and a neural network: after the load sequence is decomposed into subsequences on different scales by wavelet transform, the subsequences are forecast and reconstructed to obtain the final forecasting results. Wang and Li [39] proposed a forecasting method based on empirical mode decomposition (EMD) and the least squares support vector machine (LSSVM); considering the power characteristics, unit efficiency and the operating condition of the generators, the one-month-ahead forecast output power of a wind power plant can be obtained. Wavelet decomposition has low resolution and is sensitive to the selection of wavelet bases, while empirical mode decomposition has high resolution and is self-adaptive, with no need to select complex wavelet bases. Therefore, EMD is selected to decompose the load series in this paper. Considering that the short-term load contains some noise, we combine empirical mode decomposition with grey relational analysis [40][41][42] to process the original load series for de-noising, which further improves the accuracy of load forecasting.
In this paper, a hybrid EMD-GRA-MPSO-LSSVM model with high accuracy for short-term load forecasting is proposed. The main content and structure of this paper are as follows: the second section introduces the forecasting algorithm, MPSO-LSSVM, which uses a particle swarm optimization algorithm modified by averaging particle distance and a mutation operator to optimize the parameters of the least squares support vector machine. The third section describes the de-noising method of EMD-GRA, using EMD to decompose the original load series and then removing the noise subsequence by GRA. The fourth section introduces the forecasting process based on the hybrid EMD-GRA-MPSO-LSSVM model. The fifth section gives an empirical analysis to verify the practicability, reliability and validity of the EMD-GRA-MPSO-LSSVM model. The sixth section summarizes the work. The LSSVM converts quadratic programming problems into linear problems through least squares value functions and equality constraints; it has faster training speed and better convergence precision [14,16].

Forecasting Algorithm
When we use the LSSVM to solve the regression optimization problem, the loss function in the optimization objective function is the quadratic term of the error and the constraint condition is an equality constraint, as shown in Equation (1):

$$\min_{\omega,b,\xi}\; J(\omega,\xi)=\frac{1}{2}\|\omega\|^{2}+\frac{c}{2}\sum_{i=1}^{n}\xi_i^{2}\quad \text{s.t.}\;\; y_i=\omega^{T}\varphi(x_i)+b+\xi_i,\;\; i=1,\dots,n \qquad (1)$$

where ω is the weight vector of the high dimensional feature space, ‖ω‖² controls the complexity of the model, c is the regularization parameter, ξ_i is the error and b is the bias constant. Introducing the Lagrange multipliers α_i, Equation (1) can be converted into:

$$L(\omega,b,\xi,\alpha)=J(\omega,\xi)-\sum_{i=1}^{n}\alpha_i\left(\omega^{T}\varphi(x_i)+b+\xi_i-y_i\right) \qquad (2)$$

According to the Karush-Kuhn-Tucker conditions, we obtain:

$$\frac{\partial L}{\partial \omega}=0 \Rightarrow \omega=\sum_{i=1}^{n}\alpha_i\varphi(x_i),\qquad \frac{\partial L}{\partial b}=0 \Rightarrow \sum_{i=1}^{n}\alpha_i=0,\qquad \frac{\partial L}{\partial \xi_i}=0 \Rightarrow \alpha_i=c\,\xi_i,\qquad \frac{\partial L}{\partial \alpha_i}=0 \Rightarrow \omega^{T}\varphi(x_i)+b+\xi_i-y_i=0 \qquad (3)$$

After eliminating ω and ξ, the linear system of equations is described as follows:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega+I/c \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix}=\begin{bmatrix} 0 \\ y \end{bmatrix},\qquad \Omega_{ij}=K(x_i,x_j) \qquad (4)$$

where K(x_i, x_j) = φ(x_i)ᵀφ(x_j) is the kernel function that meets the Mercer condition. α_i and b are obtained by the least squares method, and the nonlinear forecasting model is shown in Equation (5):

$$y(x)=\sum_{i=1}^{n}\alpha_i K(x,x_i)+b \qquad (5)$$

Particle swarm optimization (PSO) is a swarm intelligence optimization algorithm based on iterative computation, which originated from the study of bird predatory behavior and was first proposed by Kennedy and Eberhart [23]. The PSO algorithm initializes a group of random particles in a solvable space, and each particle represents a potential optimal solution to the optimization problem. The characteristics of a particle comprise its location, velocity and fitness value. The fitness value, calculated by a fitness function, is the index for evaluating the quality of a particle. In each iteration, the particles update their velocity and next position by tracking the individual extremum position and the global extremum position.
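The LSSVM dual system described above can be made concrete in a few lines of numpy. The sketch below is illustrative, not the authors' code; the RBF parameterization with parameter g is one common convention, and the function names are our own.

```python
import numpy as np

def rbf_kernel(X1, X2, g):
    # K(x, x') = exp(-||x - x'||^2 / (2 g^2)); one common RBF parameterization
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * g ** 2))

def lssvm_fit(X, y, c, g):
    """Solve the LSSVM dual linear system of Eq. (4) for (b, alpha)."""
    n = len(y)
    K = rbf_kernel(X, X, g)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                  # first row: sum of alphas = 0
    A[1:, 0] = 1.0                  # first column: bias term
    A[1:, 1:] = K + np.eye(n) / c   # Omega + I/c
    sol = np.linalg.solve(A, np.r_[0.0, y])
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, alpha, b, g, X_new):
    """Nonlinear forecasting model of Eq. (5)."""
    return rbf_kernel(X_new, X_train, g) @ alpha + b
```

Because the dual is a single linear solve rather than a quadratic program, training is fast, which is one reason the paper prefers LSSVM over standard SVM.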
Suppose that n particles form a population in a d-dimensional search space. The i-th particle is represented as a d-dimensional vector $X_i=(x_{i1},x_{i2},\dots,x_{id})$, which is the position of the i-th particle in the d-dimensional search space. The velocity of the i-th particle is $V_i=(v_{i1},v_{i2},\dots,v_{id})$. The current best location of the i-th particle is $P_i=(p_{i1},p_{i2},\dots,p_{id})$ and the current best location of the whole population is $P_g=(p_{g1},p_{g2},\dots,p_{gd})$. In each iteration, the particles update their velocity and position by Equations (6) and (7):

$$v_{id}^{k+1}=\omega v_{id}^{k}+c_1 r_1\left(p_{id}^{k}-x_{id}^{k}\right)+c_2 r_2\left(p_{gd}^{k}-x_{id}^{k}\right) \qquad (6)$$

$$x_{id}^{k+1}=x_{id}^{k}+v_{id}^{k+1} \qquad (7)$$

where k is the current iteration number, ω is the inertia weight, c₁ and c₂ are acceleration factors, and r₁ and r₂ are random numbers within [0, 1]; the same random numbers are used in every dimension.
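The update rules of Equations (6) and (7) can be sketched as a standard PSO loop. This is a minimal illustration with generic parameter values (w = 0.7, c₁ = c₂ = 1.5), not the paper's tuned settings; as in the text, the same random numbers r₁, r₂ are reused across all dimensions of a particle.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(X, V, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: velocity update (Eq. (6)) and position update (Eq. (7))."""
    m, d = X.shape
    r1 = rng.random((m, 1))   # one random number per particle, shared by all dimensions
    r2 = rng.random((m, 1))
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    return X + V, V

def pso_minimize(f, lo, hi, m=30, d=2, iters=200):
    """Minimize f over [lo, hi]^d with m particles; returns (gbest, best fitness)."""
    X = rng.uniform(lo, hi, (m, d))
    V = np.zeros((m, d))
    pbest, pfit = X.copy(), np.array([f(x) for x in X])
    gbest = pbest[pfit.argmin()].copy()
    for _ in range(iters):
        X, V = pso_step(X, V, pbest, gbest)
        fit = np.array([f(x) for x in X])
        better = fit < pfit                    # update individual extrema
        pbest[better], pfit[better] = X[better], fit[better]
        gbest = pbest[pfit.argmin()].copy()    # update global extremum
    return gbest, pfit.min()
```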

Modified PSO
For complex optimization problems, the standard particle swarm optimization algorithm is prone to premature convergence. In order to solve this problem, averaging particle distance and a mutation operator are introduced to guide the selection of the initial population and to judge particle premature convergence, so as to avoid falling into a local optimum [24,43].
The initial particle swarm selection is random. Ideally, the finite particles should be uniformly distributed over the entire solution space, but this ideal state is difficult to achieve. So averaging particle distance is applied to increase the probability of finding the global optimal solution and avoid becoming trapped in a local optimum. Averaging particle distance is defined as follows:

$$D(t)=\frac{1}{mL}\sum_{i=1}^{m}\sqrt{\sum_{d=1}^{n}\left(x_{id}-\overline{x}_{d}\right)^{2}} \qquad (8)$$

where D(t) is the discrete degree of particle distribution in the population, L is the maximum diagonal length of the search space, m is the number of particles, n is the dimension of the solution space, x_id represents the d-th coordinate value of the i-th particle, and x̄_d represents the mean value of the d-th coordinate values of all particles.
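Equation (8) can be computed directly: it is the mean distance of each particle to the swarm centroid, normalized by the search-space diagonal L. A minimal numpy sketch:

```python
import numpy as np

def average_particle_distance(X, L):
    """Averaging particle distance D(t) of Eq. (8).

    X is an (m, n) array of particle positions; L is the maximum
    diagonal length of the search space.
    """
    centroid = X.mean(axis=0)                          # mean coordinate x_bar_d
    dists = np.sqrt(((X - centroid) ** 2).sum(axis=1)) # distance of each particle to centroid
    return dists.sum() / (len(X) * L)
```

A swarm collapsed onto a single point gives D(t) = 0; the more spread out the swarm, the larger D(t).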
The standard particle swarm optimization algorithm converges fast in the early stage of operation and slows down later. It is easily trapped in a local optimum and loses the ability of population evolution. The particle location determines its fitness, so the current state of the population and the premature convergence of the particles can be judged from the overall variation of the fitness of all particles in the population. Suppose the current fitness of the i-th particle is f_i and the current average fitness of the population is f̄. The fitness variance of the population is defined as follows:

$$\sigma^{2}=\sum_{i=1}^{m}\left(\frac{f_i-\overline{f}}{f}\right)^{2} \qquad (9)$$

where m is the number of particles in the population and f is the normalized scaling factor for limiting σ², which can be calculated by Equation (10):

$$f=\max\left\{1,\;\max_{i}\left|f_i-\overline{f}\right|\right\} \qquad (10)$$

The fitness variance reflects the discrete degree of particles in the population: the smaller σ² is, the smaller the discrete degree of particles is; otherwise, the larger the discrete degree is.
With an increasing number of iterations, D(t) and σ² become smaller. When D(t) < α and σ² < β (where α and β are given thresholds), the algorithm is considered to have entered the late search stage and the population is judged to converge prematurely. We then retain the current optimal location of the particle swarm and redistribute the particles in the solution space to guide them out of the local optimum.
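The premature-convergence test of Equations (8)-(10) combines the two indicators. The sketch below is illustrative; the threshold values α and β are placeholders, since the paper leaves them as tunable parameters.

```python
import numpy as np

def fitness_variance(fits):
    """Fitness variance sigma^2 of Eq. (9) with the scaling factor f of Eq. (10)."""
    dev = fits - fits.mean()
    f = max(1.0, np.abs(dev).max())   # normalized scaling factor limiting sigma^2
    return ((dev / f) ** 2).sum()

def premature(X, fits, L, alpha=0.01, beta=0.05):
    """True when both D(t) < alpha and sigma^2 < beta, i.e. the swarm has
    clustered and its fitness values have flattened; the MPSO then retains
    the current optimum and redistributes the particles."""
    D = np.sqrt(((X - X.mean(0)) ** 2).sum(1)).sum() / (len(X) * L)
    return D < alpha and fitness_variance(fits) < beta
```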

MPSO-LSSVM
When we use the least squares support vector machine with the RBF kernel function to forecast the load, the regularization parameter c and the radial basis kernel function parameter g need to be set, and they directly influence the forecasting accuracy [26,27]. In this paper, the modified particle swarm optimization algorithm is used to optimize the parameters of the least squares support vector machine. The optimization steps are as follows: (1) Set the parameters of the PSO and initialize the particle swarm, taking (c, g) as the particle position. (2) Train the LSSVM with each particle's parameters and calculate its fitness from the forecasting error. (3) Update the individual extremum and global extremum positions, then update the velocity and position of each particle by Equations (6) and (7). (4) Calculate the averaging particle distance and the fitness variance; if both fall below their thresholds, retain the current optimum and redistribute the particles by the mutation operator. (5) Repeat steps (2)-(4) until the maximum number of iterations is reached, then output the optimal parameters (c, g). The process of LSSVM parameter optimization based on MPSO is described in Figure 1.


EMD
Empirical Mode Decomposition is the core algorithm of the Hilbert-Huang transform (HHT), proposed by Huang et al. in 1998 [34]. EMD is an adaptive signal time-frequency processing method. It can decompose complex signals into a finite number of intrinsic mode functions (IMFs) according to the time scale characteristics of the data. The decomposed IMF components contain local characteristic signals at different time scales of the original signal. An IMF must meet two conditions: the difference between the number of extrema and the number of zero crossings of the signal is 0 or 1, and the average value of the upper envelope and the lower envelope of the signal is zero [37][38][39]. For a given signal X(t), the EMD steps are as follows: (1) Find all the local extrema of X(t) and fit the upper envelope f_a(t) and the lower envelope f_b(t) of X(t) by a cubic spline function. (2) Calculate the average value f_m(t) of the upper envelope and lower envelope.
(3) Calculate the difference between X(t) and f_m(t): E(t) = X(t) − f_m(t). (4) Repeat steps (1)-(3) regarding E(t) as the original series. When the mean envelope f_m(t) tends to zero, the first IMF component imf₁(t) is obtained, and the remainder is X₁(t) = X(t) − imf₁(t). Repeat steps (1)-(4) regarding X₁(t) as the original series; the second IMF component imf₂(t) is then obtained. Repeat this process until the difference function is a constant function or a monotone function. Then, the original series can be described as Equation (11):

$$X(t)=\sum_{i=1}^{n}imf_i(t)+r_n(t) \qquad (11)$$

where r_n(t) is the residual.
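A single sifting step, the core of steps (1)-(3), can be sketched with numpy and scipy's cubic spline. This is a simplified illustration (extrema detection and endpoint handling are cruder than a production EMD implementation) under the assumption of a uniformly sampled signal:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One EMD sifting step: fit upper/lower envelopes through the local
    extrema by cubic splines and subtract their mean from the signal."""
    t = np.arange(len(x))
    # interior local maxima and minima (strict comparisons)
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    # include the endpoints so the splines cover the whole range
    maxima = np.r_[0, maxima, len(x) - 1]
    minima = np.r_[0, minima, len(x) - 1]
    upper = CubicSpline(t[maxima], x[maxima])(t)   # f_a(t)
    lower = CubicSpline(t[minima], x[minima])(t)   # f_b(t)
    mean_env = (upper + lower) / 2.0               # f_m(t)
    return x - mean_env, mean_env                  # E(t), f_m(t)
```

Iterating this step until f_m(t) is negligible yields imf₁(t); repeating on the remainder yields the subsequent IMFs and the residual of Equation (11).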

GRA
Grey relational analysis (GRA) is a multivariate statistical analysis method which compares the geometric relations of the statistical data in the time series of a system through quantitative analysis of the dynamic development trend, so as to obtain the grey relational degrees between the reference series and the comparison series [40,41]. The larger the grey relational degree is, the closer the development direction and speed of the series are, and the closer the relationship between the series is. The core of GRA is to calculate the grey relational degree, and the specific calculation steps are as follows: (1) Determine the analysis series.
Set the reference series $X_0=\{y_0(k),\;k=1,2,\dots,n\}$ and the comparison series $X_j=\{y_j(k),\;k=1,2,\dots,n\},\;j=1,2,\dots,m$. (2) Standardize the data to eliminate the influence of different dimensions and magnitudes.
(3) Calculate the correlation coefficient. The correlation coefficient between y_j(k) and y_0(k) is shown in Equation (15):

$$\xi_j(k)=\frac{\min\limits_{j}\min\limits_{k}\left|y_0(k)-y_j(k)\right|+\rho\,\max\limits_{j}\max\limits_{k}\left|y_0(k)-y_j(k)\right|}{\left|y_0(k)-y_j(k)\right|+\rho\,\max\limits_{j}\max\limits_{k}\left|y_0(k)-y_j(k)\right|} \qquad (15)$$

where ρ ∈ (0, 1) is the distinguishing coefficient, usually taken as 0.5. (4) Calculate the relational degree. The grey relational degree between X_j and X_0 is shown in Equation (16):

$$r_j=\frac{1}{n}\sum_{k=1}^{n}\xi_j(k) \qquad (16)$$

(5) Rank the relational degree.
Rank the comparison series according to the grey relational degree.The larger the grey relational degree is, the more consistent the variation trends between the comparison series and the reference series are.
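The five steps above can be condensed into a short numpy function. This is an illustrative sketch, not the authors' code: it uses the initial-value method for standardization (each series divided by its first element, which must be nonzero) and the usual distinguishing coefficient ρ = 0.5.

```python
import numpy as np

def grey_relational_degree(ref, cmps, rho=0.5):
    """Grey relational degree of each comparison series against the
    reference series (Eqs. (15)-(16)); rho is the distinguishing coefficient."""
    norm = lambda s: s / s[0]          # initial-value standardization (s[0] != 0 assumed)
    y0 = norm(np.asarray(ref, float))
    diffs = np.abs(np.stack([norm(np.asarray(c, float)) for c in cmps]) - y0)
    dmin, dmax = diffs.min(), diffs.max()
    xi = (dmin + rho * dmax) / (diffs + rho * dmax)  # correlation coefficients, Eq. (15)
    return xi.mean(axis=1)                           # relational degree per series, Eq. (16)
```

A comparison series with exactly the same trend as the reference gets degree 1; the weaker the trend agreement, the smaller the degree, which is the ranking criterion of step (5).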

EMD-GRA
In load forecasting, the original historical load data often contain noise pollution, which makes it difficult to accurately analyze the actual load variation rules, so the original load data need to be processed for de-noising. EMD-GRA is a data processing model combining empirical mode decomposition and grey relational analysis for de-noising. Its core idea is to decompose the non-stationary signal into a set of stationary intrinsic mode functions and eliminate the useless components in the signal [37]. The basic steps of the EMD-GRA de-noising method are as follows: (1) Decompose the original time series by EMD and obtain a finite number of intrinsic mode functions. (2) Calculate the grey relational degree between each IMF and the original series and rank the relational degrees. (3) Eliminate the IMF with the lowest relational degree and reconstruct the remaining IMFs to obtain the de-noised time series.
The process of EMD-GRA de-noising is described in Figure 2.
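Given IMFs already produced by an EMD step, the GRA-based elimination of steps (2)-(3) can be sketched as follows. This is a simplified illustration: the series are shifted positive before initial-value standardization so that IMFs oscillating around zero can be normalized, a detail the paper does not specify.

```python
import numpy as np

def emd_gra_denoise(imfs, x, rho=0.5):
    """Drop the IMF least related to the original series x and reconstruct.

    imfs: (k, n) array whose rows are the IMF components of x.
    Returns (denoised series, grey relational degree of each IMF).
    """
    shift = lambda s: s - s.min() + 1.0        # keep series positive before scaling
    norm = lambda s: shift(s) / shift(s)[0]    # initial-value standardization
    y0 = norm(np.asarray(x, float))
    diffs = np.abs(np.stack([norm(f) for f in imfs]) - y0)
    degrees = ((diffs.min() + rho * diffs.max())
               / (diffs + rho * diffs.max())).mean(axis=1)
    keep = np.ones(len(imfs), bool)
    keep[degrees.argmin()] = False             # eliminate the least-related IMF
    return imfs[keep].sum(axis=0), degrees
```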

The Forecasting Model of EMD-GRA-MPSO-LSSVM
The accuracy of short-term load forecasting is influenced by many factors, such as temperature, humidity, wind force and holidays. In order to forecast the next 24 h load more accurately, the EMD-GRA-MPSO-LSSVM load forecasting model considering weather and date type is proposed in this paper. The forecasting steps are as follows: (1) Data collecting and processing.
Collect sample data, including historical load, daily maximum temperature, daily minimum temperature, daily average temperature, relative humidity, wind force, date type, etc. Then, process the data: normalize the weather data and use 1 and 0 to represent holidays and working days respectively for the day type.
(2) De-noising processing. Decompose the original time series by EMD. Calculate the grey relational degree between each IMF and the original series and rank the relational degrees. Then, eliminate the IMF with the lowest relational degree.
(3) Load forecasting. Use the MPSO-LSSVM model to forecast the IMFs which have been processed by the EMD-GRA de-noising method and reconstruct the forecast series. Then, obtain the final forecasting results.
The model of EMD-GRA-MPSO-LSSVM is shown in Figure 3.
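The data processing of step (1) amounts to min-max normalization of the weather columns and a binary day-type flag. A minimal sketch (function names are ours, not from the paper):

```python
import numpy as np

def min_max_normalize(col):
    """Scale a weather column into [0, 1]; assumes max(col) > min(col)."""
    return (col - col.min()) / (col.max() - col.min())

def encode_day_type(is_holiday):
    # 1 for holidays, 0 for working days, as specified in the text
    return np.asarray(is_holiday, dtype=float)
```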

Sample Selection
The Jibei area is located in the north of Hebei Province, China, and contains Tangshan City, Langfang City, Chengde City, Zhangjiakou City and Qinhuangdao City. In this paper, we select the hourly load, daily maximum temperature, daily minimum temperature, daily average temperature, relative humidity, wind force and date type of the Jibei area from 1 January to 14 March 2016 as the training sample to forecast the hourly load on 15 March. The model input includes the hourly load one week before the forecasting day and the daily maximum temperature, daily minimum temperature, daily average temperature, relative humidity, wind force and date type of the forecasting day, where the weather data are averaged over the five cities in the Jibei area. The model output is the hourly load of the forecasting day.



Load Forecasting Based on EMD-GRA-MPSO-LSSVM
Decompose the original historical load series by EMD. Input the signal series containing the 1800 hourly load values from 1 January to 15 March 2016 in the Jibei area into the EMD model to obtain 9 IMFs and 1 residual. The decomposition results are shown in Figure 4.

Calculate the grey relational degree between each IMF and the original series and rank the relational degrees. The ranking results are shown in Figure 5.
As shown in Figure 5, the grey relational degree between the fourth IMF and the original series is the smallest. Eliminate the fourth IMF and use the MPSO-LSSVM model to forecast the remaining IMFs and the residual. The main parameters of MPSO-LSSVM are shown in Table 1.
The forecasting results of each series are shown in Figure 6.

Reconstruct the forecast series to obtain the final forecasting results for the hourly load on 15 March; the relative errors are shown in Figure 7.
As shown in Figure 7, when the EMD-GRA-MPSO-LSSVM model is used to forecast the hourly load on 15 March in the Jibei area, the forecasting curve fits very well and the forecasting precision is high; the relative error of each forecasting point is not more than 3%.

Model Comparison
In order to verify the superiority and effectiveness of the EMD-GRA-MPSO-LSSVM model, the BP neural network, SVM, LSSVM, PSO-LSSVM, MPSO-LSSVM and EMD-MPSO-LSSVM models are selected for comparison with the EMD-GRA-MPSO-LSSVM model using the same sample. The comparison of forecasting results, the relative errors and the boxplot of relative errors for the different models are shown in Figures 8-10, respectively.


The fitting results of the forecasting curves for each model are presented in Figure 8; the forecasting curve of the EMD-GRA-MPSO-LSSVM model fits best. The relative errors of the seven models for the hourly load forecasting on 15 March 2016 in the Jibei area are shown in Figure 9. The boxplot in Figure 10 shows the five statistics of the relative error of each forecasting model: the minimum, the first quartile, the median, the third quartile and the maximum. As we can see from Figures 9 and 10, the relative error of the EMD-GRA-MPSO-LSSVM model is the smallest, followed by the EMD-MPSO-LSSVM model, and the BP neural network model has the largest relative error. The relative errors of all models are not more than 8%.
In order to evaluate the forecasting performance of each model more accurately, the mean absolute percentage error MAPE , root mean square error RMSE , mean absolute error MAE and nonlinear function goodness of fit 2  R are used to compare the forecasting accuracy of each model in this paper.The calculating equations of these indexes are shown as follows: ) The calculated results of the indexes for each model are shown in Table 2.The fitting results of the forecasting curves for each model are presented in Figure 8.The forecasting curve of the EMD-GRA-MPSO-LSSVM model fits best.The relative errors of the seven models for the hourly load forecasting in 15 March 2016 in Jibei area are shown in Figure 9.The boxplot in Figure 10 shows the five statistics of the relative error of each forecasting model that is the minimum, first quartile, the median, third quartile and the maximum.As we can see from Figure 9 to Figure 10, the relative error of the EMD-GRA-MPSO-LSSVM model for load forecasting is minimum, followed by the EMD-MPSO-LSSVM model and the BP neural network model has the maximum relative error.The relative errors of all models are not more than 8%.
In order to evaluate the forecasting performance of each model more accurately, the mean absolute percentage error (MAPE), the root mean square error (RMSE), the mean absolute error (MAE) and the nonlinear goodness of fit R^2 are used in this paper to compare the forecasting accuracy of each model. With y_i the actual load, \hat{y}_i the forecast load at point i and \bar{y} the mean of the actual loads over the n forecast points, these indexes are calculated as follows:

MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 }

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|

R^2 = 1 - \frac{ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }{ \sum_{i=1}^{n} (y_i - \bar{y})^2 }

The calculated results of the indexes for each model are shown in Table 2. As shown in Table 2, the MAPE, RMSE and MAE of the EMD-GRA-MPSO-LSSVM model are the smallest and its goodness of fit is the best of all models, reaching 98.62%. The MAPE, RMSE and MAE of the BP model are the largest and its goodness of fit is the worst of all models, at only 96.02%.
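The four indexes can be computed directly from their standard definitions; a minimal sketch, with illustrative numbers rather than the Jibei results of Table 2:

```python
import math

def evaluate(actual, forecast):
    """Return (MAPE %, RMSE, MAE, R^2) for one forecast series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mape = sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / n * 100
    rmse = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)
    mae = sum(abs(f - a) for a, f in zip(actual, forecast)) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mape, rmse, mae, r2

# Toy example: three forecast points.
mape, rmse, mae, r2 = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```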
The LSSVM model optimized by the modified particle swarm optimization algorithm outperforms the LSSVM model optimized by the standard particle swarm optimization algorithm, the plain LSSVM model and the SVM model in load forecasting. The EMD-MPSO-LSSVM model, which augments MPSO-LSSVM with empirical mode decomposition, is significantly better than the plain MPSO-LSSVM model but slightly inferior to the EMD-GRA-MPSO-LSSVM model. In general, the evaluation results of the four indexes for the seven models are consistent. The forecasting accuracy of the models is ranked as follows: EMD-GRA-MPSO-LSSVM > EMD-MPSO-LSSVM > MPSO-LSSVM > PSO-LSSVM > LSSVM > SVM > BP.

Supplementary Experiment
In order to further confirm the reliability of the above experimental results, the BP neural network, SVM, LSSVM, PSO-LSSVM, MPSO-LSSVM, EMD-MPSO-LSSVM and EMD-GRA-MPSO-LSSVM models are used to forecast the hourly load for 16 consecutive days, from 13 to 28 September 2016. The mean absolute percentage errors of the hourly load on each forecasting day are calculated for the different models, and the results are shown in Table 3. The boxplot of the MAPE for the different models is shown in Figure 11. As shown in Table 3 and Figure 11, the error analysis over the 384 prediction points of the 16 newly added experiments shows that the mean absolute percentage errors of the EMD-GRA-MPSO-LSSVM model are the smallest on every forecasting day. The ranking of the prediction accuracy of the seven models in the 16 experiments is consistent with the result of the 15 March case (EMD-GRA-MPSO-LSSVM > EMD-MPSO-LSSVM > MPSO-LSSVM > PSO-LSSVM > LSSVM > SVM > BP). This proves that the EMD-GRA-MPSO-LSSVM model proposed in this paper is reliable and achieves good prediction performance in short-term load forecasting.
From the comparison and analysis, we can find that: (1) the fitting effect of the combined model on short-term nonlinear time series is obviously better than that of a single model; (2) the modified optimization algorithm has an advantage over the original optimization algorithm in parameter optimization; (3) time series decomposition can effectively improve the forecasting accuracy of the model; (4) forecasting the denoised time series gives better results than forecasting the original time series.
As a complex multiple-combination forecasting model, EMD-GRA-MPSO-LSSVM realizes the complementary advantages of different algorithms. The proposed model eliminates the chaos of the original data by decomposing and denoising the nonstationary time series, which makes the time series more regular. It also realizes a reasonable choice of parameters through the modified optimization algorithm, which improves the forecasting accuracy. The EMD-GRA-MPSO-LSSVM model combines a variety of forecasting methods to make up for the shortcomings of the individual models, and shows strong generalization ability and robustness. The analysis above proves that the proposed EMD-GRA-MPSO-LSSVM short-term load forecasting model is practical and effective.
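The GRA screening step of the denoising stage can be sketched as follows. The `imfs` below are toy series standing in for the IMFs an EMD routine would return, and the 0.6 grade threshold is an assumed value, not one from the paper; a practical implementation would typically normalize the series before computing the grades:

```python
def grey_relational_grade(reference, comparison, rho=0.5):
    """Grey relational grade between the original series and one subsequence."""
    deltas = [abs(r - c) for r, c in zip(reference, comparison)]
    d_min, d_max = min(deltas), max(deltas)
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)

original = [1.0, 2.0, 3.0, 4.0, 5.0]
imfs = [
    [1.05, 2.0, 3.0, 4.0, 5.0],    # closely tracks the original series
    [0.9, -0.8, 1.1, -1.2, 0.8],   # oscillatory, weakly related (noise-like)
]

grades = [grey_relational_grade(original, imf) for imf in imfs]
# Subsequences whose grade falls below the (assumed) threshold are treated
# as noise and dropped before the MPSO-LSSVM forecasting step.
kept = [imf for imf, g in zip(imfs, grades) if g > 0.6]
```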


Conclusions
In this paper, a novel EMD-GRA-MPSO-LSSVM short-term load forecasting model is proposed. First, the original load data are denoised by EMD-GRA: empirical mode decomposition decomposes the original load series, and grey relational analysis removes the noise subsequences. Then, the subsequences are forecast by the modified particle swarm optimization and least squares support vector machine (MPSO-LSSVM) model. Finally, the forecasting series are reconstructed to obtain the final forecasting results. This paper takes the Jibei area of China as an example to perform an empirical analysis of load forecasting considering historical load, weather conditions, holidays and other factors. The models of BP neural network, SVM, LSSVM, PSO-LSSVM, MPSO-LSSVM and EMD-MPSO-LSSVM are selected for comparison with the EMD-GRA-MPSO-LSSVM model on the same sample. The comparison results verify that the proposed EMD-GRA-MPSO-LSSVM short-term load forecasting model has strong generalization ability and robustness and achieves a good forecasting effect with high forecasting accuracy.

2.1. LSSVM
The least squares support vector machine (LSSVM) is an improved version of the standard support vector machine algorithm, first proposed by Suykens et al. It transforms the quadratic programming problem of the standard SVM into the solution of a set of linear equations. (Energies 2017, 10, 408)

(2) Calculate the fitness value of each particle and find the current individual extreme position and the global extreme position.
(3) Calculate the averaging particle distance and the mutation operator of the population, and judge whether the particles have fallen into a premature convergence state. If the particles fall into a local optimum, redistribute the particle solution space to guide the particles out of the local optimum.
(4) Update the velocities and positions of the particles to generate a new population. Calculate the fitness values and compare them with the historical optimal values. Update the individual extreme positions and the global extreme position, and repeat this process until the maximum number of iterations is reached; the optimization results are then obtained.
(5) Forecast the load by LSSVM based on the optimal parameters.
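Steps (2)-(5) can be sketched as a simplified MPSO loop. The sphere function below is a stand-in objective (the paper instead minimizes the LSSVM validation error over its two parameters), and the bounds and acceleration coefficients are illustrative; only the averaging-particle-distance and fitness-variance thresholds and the inertia-weight range follow Table 1:

```python
import math
import random

random.seed(0)

def fitness(x):
    # Sphere function as a stand-in objective, to be minimized.
    return sum(v * v for v in x)

DIM, SWARM, ITERS = 2, 20, 50
LO, HI = -5.0, 5.0
W_MAX, W_MIN = 0.9, 0.4              # inertia-weight range (Table 1)
C1 = C2 = 2.0                        # illustrative acceleration coefficients
D_THRESH, VAR_THRESH = 0.001, 0.01   # premature-convergence thresholds (Table 1)

# Initialization (step (1), described earlier in the paper).
pos = [[random.uniform(LO, HI) for _ in range(DIM)] for _ in range(SWARM)]
vel = [[0.0] * DIM for _ in range(SWARM)]
pbest = [p[:] for p in pos]
pbest_f = [fitness(p) for p in pos]
g = min(range(SWARM), key=lambda i: pbest_f[i])
gbest, gbest_f = pbest[g][:], pbest_f[g]

for t in range(ITERS):
    # Step (3): averaging particle distance and fitness variance of the swarm.
    centroid = [sum(p[d] for p in pos) / SWARM for d in range(DIM)]
    avg_dist = sum(math.dist(p, centroid) for p in pos) / SWARM
    fits = [fitness(p) for p in pos]
    mean_f = sum(fits) / SWARM
    fit_var = sum((f - mean_f) ** 2 for f in fits) / SWARM
    if avg_dist < D_THRESH and fit_var < VAR_THRESH:
        # Premature convergence detected: redistribute half of the swarm
        # so the particles can jump out of the local optimum.
        for i in random.sample(range(SWARM), SWARM // 2):
            pos[i] = [random.uniform(LO, HI) for _ in range(DIM)]
    # Step (4): update velocities and positions with a decreasing inertia weight.
    w = W_MAX - (W_MAX - W_MIN) * t / ITERS
    for i in range(SWARM):
        for d in range(DIM):
            vel[i][d] = (w * vel[i][d]
                         + C1 * random.random() * (pbest[i][d] - pos[i][d])
                         + C2 * random.random() * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        f = fitness(pos[i])
        if f < pbest_f[i]:
            pbest[i], pbest_f[i] = pos[i][:], f
            if f < gbest_f:
                gbest, gbest_f = pos[i][:], f

# Step (5) would now train LSSVM with the optimized parameters in `gbest`.
```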


Figure 1. The flow chart of LSSVM parameter optimization based on MPSO.

Table 1. The main parameters of MPSO-LSSVM.
(unlabeled parameter): 20
The threshold of the averaging particle distance: 0.001
The range of the inertia weight: [0.4, 0.9]
The threshold of the fitness variance: 0.01

Figure 7. The final forecasting results and the relative error. (a) The final forecasting results of the hourly load; (b) The relative error.

Figure 8. Comparison of forecasting results for each model.


Figure 10. The boxplot of relative error for different models.


Figure 11. The boxplot of MAPE for different models.


Table 2. The calculation results of the indexes.

Table 3. The mean absolute percentage errors for different models.