Short-Term Load Forecasting Based on Wavelet Transform and Least Squares Support Vector Machine Optimized by Improved Cuckoo Search

Abstract: Due to electricity market deregulation and the integration of renewable resources, electrical load forecasting has become increasingly important for the Chinese government in recent years. The short-term electric load cannot be predicted accurately by a single model, because it is disturbed by several external factors, which give it volatile and unstable characteristics. To this end, this paper proposes a hybrid model based on the wavelet transform (WT) and the least squares support vector machine (LSSVM), optimized by an improved cuckoo search (CS). To improve the prediction accuracy, the WT is used to eliminate the high-frequency components of the previous day's load data. Additionally, a Gauss disturbance is applied to the process of establishing new solutions in CS to improve its convergence speed and search ability. Finally, the parameters of the LSSVM model are optimized by the improved cuckoo search. The results of the implementation demonstrate that the hybrid model can be used for short-term forecasting of the power system.


Introduction
As an important part of the management modernization of electric power systems, power load forecasting has attracted increasing attention from academics and practitioners. Power load forecasting with high precision can ease the contradiction between power supply and demand and provide a solid foundation for the stable and reliable operation of the power grid. However, the electric load is a random, non-stationary series influenced by a number of factors, including economic factors, time, day, season, weather and random effects, which makes load forecasting a challenging subject of inquiry [1].
At present, the methods for load forecasting can be divided into two groups: classical mathematical statistical methods and approaches based on artificial intelligence. Most load forecasting theories are based on time series analysis and auto-regression models, including the vector auto-regression (VAR) model [2,3], the autoregressive moving average (ARMA) model [4-6], and so on. Time series smoothness prediction methods are criticized by researchers for their weak non-linear fitting capability. With the development of the electricity market, the requirement for high-accuracy load forecasting has become more and more strict. Therefore, artificial intelligence, which includes neural networks and support vector machines, has gained increasing attention from scholars. Nahi Kandil, Rene Wamkeue et al. [7] applied the artificial neural network (ANN) to short-term load forecasting. Examples with real data showed the effectiveness of the proposed techniques by demonstrating that ANNs can reduce load forecasting errors compared to various existing techniques. Feng Yu and Xiaozhong Xu [8] proposed a combinational approach for short-term gas load forecasting based on an improved back-propagation neural network optimized by a real-coded genetic algorithm. D.K. Chaturvedi et al. [9] applied an algorithm that integrated the wavelet transform, an adaptive genetic algorithm and a fuzzy system with a generalized neural network (GNN) to solve the short-term weekday electrical load problem. Luis Hernandez [10] presented an electric load forecasting architectural model based on an ANN that performs short-term load forecasting. Nima Amjady and Farshid Keynia [11] proposed a neural network optimized by a new modified harmony search technique. Pan Duan et al. [12] presented a new combined method for the short-term load forecasting of electric power systems based on fuzzy c-means (FCM) clustering, particle swarm optimization (PSO) and support vector regression (SVR) techniques. Abdollah Kavousi-Fard, Haidar Samet and Fatemeh Marzbani [13] proposed a hybrid prediction algorithm comprising SVR and a modified firefly algorithm, and the experimental results affirmed that the proposed algorithm outperforms other techniques.
The support vector machine (SVM) [14] uses the structural risk minimization principle to convert the solution process into a convex quadratic programming problem. This overcomes some shortcomings of neural networks and has achieved good performance in practical load forecasting [15]. However, the problem of hyperplane parameter selection in SVM leads to a large solving scale. To address this, J.A.K. Suykens and J. Vandewalle proposed the least squares support vector machine (LSSVM) as a classifier in 1999. Unlike the inequality constraints of the standard SVM, LSSVM uses equality constraints in its formulation [16]. As a result, the solution is obtained by solving a set of linear equations, known as the linear Karush-Kuhn-Tucker (KKT) system, instead of a quadratic program [17]. Sun Wei and Liang Yi have applied LSSVM to several engineering problems, including power load forecasting [18], wind speed forecasting [19], project evaluation [20] and carbon emission prediction [21]. For example, in [18], a differential evolution algorithm-based least squares support vector regression method is proposed, and the average forecasting error is less than 1.6%, showing better accuracy and stability than traditional LSSVR and support vector regression. The kernel parameter and penalty factor strongly affect the learning and generalization ability of LSSVM, and inappropriate parameter selection may limit its performance. However, it is possible to employ an optimization algorithm to obtain an appropriate parameter combination. A particle swarm optimization model [22] and a genetic algorithm model [23] have been proposed for LSSVM parameter optimization. In order to improve the forecasting accuracy of LSSVM, this paper applies a cuckoo search algorithm based on Gauss disturbance to optimize the parameters of LSSVM. Cuckoo search (CS) was proposed by Xin-She Yang and Suash Deb in 2009. CS is a population-based algorithm inspired by the brood parasitism of cuckoo species. It has a more efficient randomization property (through the use of Levy flights) and requires fewer parameters (only population size and discovery probability) than other optimization methods [24]. The advantage of CS is that it does not have many parameters to tune, and evidence has shown that the generated results are largely independent of the values of the tuning parameters. At present, CS has been applied in many fields, such as system reliability optimization [25], optimization of biodiesel engine performance [26], load frequency control [27], solar radiation forecasting [28], and so on. In order to improve the convergence speed and the global search ability, this paper proposes a CS algorithm based on Gauss disturbance (GCS), in which a Gauss perturbation is added to the nest positions during the iterative process. This increases the vitality of the changes in nest position, thus effectively improving the convergence speed and search ability.
The wavelet transform (WT) is a recently-developed mathematical tool for signal analysis [29,30]. It has been successfully applied in astronomy, data compression, signal and image processing, earthquake prediction and other fields [31]. The combination of WT and LSSVM is widely used in forecasting [32,33]. For example, H. Shayeghi and A. Ghasemi [33] introduced WT and an improved LSSVM to predict electricity prices; their simulation results show that this technique increases electricity price forecasting accuracy compared to other classical and heuristic methods. Thus, this paper proposes a hybrid model based on WT and LSSVM, optimized by GCS and denoted W-GCS-LSSVM, and the examples demonstrate the effectiveness of the model.
The rest of the paper is organized as follows: Section 2 provides some basic theoretical aspects of WT, LSSVM and CS and gives a brief description of the W-GCS-LSSVM model; in Section 3, an experimental study is put forward to prove the efficiency of the proposed model; Section 4 concludes the paper.

Wavelet Transform
As an effective method for signal processing, the wavelet transform can be divided into two classes: the discrete wavelet transform (DWT) and the continuous wavelet transform (CWT). The CWT of a signal x(t) is defined as follows:

$$W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt$$

where a and b are the scale and the translation parameters, respectively, and ψ is the mother wavelet. The equation applied for the DWT of a signal is as follows:

$$W(m,n) = 2^{-m/2} \sum_{t=0}^{N-1} x(t)\, \psi\!\left(\frac{t - n \cdot 2^{m}}{2^{m}}\right)$$

in which m is the scale factor, n = 1, 2, ..., N is the sampling time and N is the number of samples.
Like other WTs, the DWT is a transform for which the wavelets are discretely sampled, and it captures both frequency and location information with temporal resolution; thus, the DWT has a key advantage over Fourier transforms. In this paper, the DWT is used in the data filtering stage.
In WT, a signal is broken up into wavelets, namely an approximation component and detail components: the approximation component contains the low-frequency information (the most important part, which gives the signal its identity), while the detail components reveal the finer structure of the signal. Figure 1 shows a wavelet decomposition process. Firstly, the signal S is decomposed into an approximation component A1 and a detail component D1; then, A1 is further decomposed into another approximation component A2 and a detail component D2 to reach a higher resolution level; and so on, until a suitable number of levels is reached.
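As an illustration of the one-level decomposition described above, the sketch below implements a Haar DWT in plain NumPy. The paper does not state which mother wavelet it uses, so Haar is an assumption chosen for simplicity; the split into A1/D1 and the perfect reconstruction are the points being demonstrated.

```python
import numpy as np

def haar_dwt(signal):
    """One-level discrete Haar wavelet transform: splits the signal into an
    approximation component (low-frequency, the A1 of the text) and a detail
    component (high-frequency, D1). Assumes an even-length input."""
    s = np.asarray(signal, dtype=float)
    a1 = (s[0::2] + s[1::2]) / np.sqrt(2.0)  # approximation coefficients
    d1 = (s[0::2] - s[1::2]) / np.sqrt(2.0)  # detail coefficients
    return a1, d1

def haar_idwt(a1, d1):
    """Inverse of haar_dwt: reconstruct the original signal."""
    s = np.empty(2 * len(a1))
    s[0::2] = (a1 + d1) / np.sqrt(2.0)
    s[1::2] = (a1 - d1) / np.sqrt(2.0)
    return s

# A smooth load-like curve plus high-frequency noise: A1 keeps the trend,
# D1 absorbs most of the noise (values are illustrative, not real load data).
t = np.linspace(0, 2 * np.pi, 64)
load = 850 + 60 * np.sin(t) + np.random.default_rng(0).normal(0, 5, 64)
a1, d1 = haar_dwt(load)
recon = haar_idwt(a1, d1)
print(np.allclose(recon, load))
```

Repeating `haar_dwt` on `a1` gives the A2/D2 level of Figure 1.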
The original short-term load data are decomposed into one approximation component and multiple detail components: the approximation presents the main fluctuation of the short-term load data, while the detail components contain the spikes and stochastic volatilities at the different levels. A suitable number of decomposition levels can be decided by comparing the similarity between the approximation and the original signal.

Least Squares Support Vector Machine
As an extension of the standard support vector machine (SVM), the least squares support vector machine (LSSVM) was proposed by Suykens and Vandewalle [34]. By transforming the inequality constraints of the traditional SVM into equality constraints, LSSVM takes a sum-of-squared-errors loss function as the empirical loss of the training set, which transforms the quadratic programming problem into a problem of solving linear equations [35]. The training set is {(x_k, y_k) | k = 1, 2, ..., n}, in which x_k ∈ R^n and y_k ∈ R represent the input data and the output data, respectively. φ(·) is the nonlinear mapping function, which transfers the samples into a much higher-dimensional feature space φ(x_k). The optimal decision function in the high-dimensional feature space is established as:

$$y(x) = \omega^{T}\varphi(x) + b$$

where φ(x) is the mapping function, ω is the weight vector and b is a constant.
Using the principle of structural risk minimization, the objective optimization function is:

$$\min_{\omega,b,e} J(\omega, e) = \frac{1}{2}\|\omega\|^{2} + \frac{\gamma}{2}\sum_{k=1}^{n} e_{k}^{2}$$

subject to the constraint condition:

$$y_{k} = \omega^{T}\varphi(x_{k}) + b + e_{k}, \quad k = 1, 2, \ldots, n$$

in which γ is the penalty coefficient and e_k represents the regression error. The Lagrange method is used to solve the optimization problem: the constrained optimization problem is transformed into an unconstrained one, and the function in the dual space is obtained as:

$$L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{k=1}^{n} \alpha_{k}\left[\omega^{T}\varphi(x_{k}) + b + e_{k} - y_{k}\right]$$

where the Lagrange multipliers α_k ∈ R. According to the Karush-Kuhn-Tucker (KKT) conditions, the partial derivatives of L with respect to ω, b, e_k and α_k are set to zero:

$$\frac{\partial L}{\partial \omega}=0 \Rightarrow \omega=\sum_{k=1}^{n}\alpha_{k}\varphi(x_{k}); \qquad \frac{\partial L}{\partial b}=0 \Rightarrow \sum_{k=1}^{n}\alpha_{k}=0;$$

$$\frac{\partial L}{\partial e_{k}}=0 \Rightarrow \alpha_{k}=\gamma e_{k}; \qquad \frac{\partial L}{\partial \alpha_{k}}=0 \Rightarrow \omega^{T}\varphi(x_{k})+b+e_{k}-y_{k}=0$$
According to Equation (7), after eliminating ω and e_k, the optimization problem is transformed into solving the linear system:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$

where Ω_{kl} = φ(x_k)^T φ(x_l), 1 = [1, ..., 1]^T and I is the identity matrix. Solving Equation (8) gives α and b, and the LSSVM optimal regression function is:

$$y(x) = \sum_{k=1}^{n} \alpha_{k} K(x, x_{k}) + b$$

According to the Mercer condition, K(x, x_l) = φ(x)^T φ(x_l) is the kernel function. In this paper, the radial basis function (RBF) is set as the kernel function, which is shown in Equation (10):

$$K(x, x_{l}) = \exp\!\left(-\frac{\|x - x_{l}\|^{2}}{2\sigma^{2}}\right)$$

in which σ² is the width of the kernel function.
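The KKT linear system of Equation (8) and the regression function of Equation (9) can be solved directly with a dense linear solver, assuming the RBF kernel of Equation (10). The sketch below uses illustrative names and toy data; it is not the paper's implementation.

```python
import numpy as np

def rbf_kernel(X, Z, sigma2):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)), Equation (10)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def lssvm_fit(X, y, sigma2, gamma):
    """Solve the (n+1)x(n+1) KKT system of Equation (8) for (b, alpha)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(X_train, alpha, b, sigma2, X_new):
    """Regression function of Equation (9)."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

# toy 1-D regression: with a large penalty gamma, the fit is nearly exact
X = np.linspace(0, 1, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
b, alpha = lssvm_fit(X, y, sigma2=0.05, gamma=1000.0)
y_hat = lssvm_predict(X, alpha, b, 0.05, X)
print(np.max(np.abs(y_hat - y)) < 0.1)
```

Note how γ enters only through the ridge term I/γ on the kernel matrix, which is exactly the equality-constraint simplification that distinguishes LSSVM from the standard SVM.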
In training the LSSVM, the kernel parameter σ² and the penalty parameter γ are generally set based on experience, which introduces randomness and inaccuracy into the application of the LSSVM algorithm. To solve this problem, this paper applies GCS to optimize these two parameters and thereby improve the prediction accuracy of LSSVM.

Cuckoo Search
The cuckoo search (CS) algorithm is a new metaheuristic optimization algorithm [24] based on stochastic global search and the obligate brood-parasitic behavior of cuckoos, which lay their eggs in the nests of host birds. In this optimization algorithm, each nest represents a potential solution. The cuckoo birds choose recently-spawned nests so that their eggs hatch first, because a cuckoo egg usually hatches earlier than the host bird's eggs. In addition, by mimicking the host chicks, a cuckoo chick can deceive the host bird to grab more food resources. If the host birds discover that an alien cuckoo egg has been laid (with probability p_a), they either eject the egg or abandon the nest and build a completely new nest in a new location. New eggs (solutions) laid by the cuckoo choose nests by Levy flights around the current best solutions. With the Levy flight behavior, the cuckoo speeds up the local search efficiency.
Yang and Deb simplified the cuckoo parasitic breeding process by the following three idealized rules [24]: (i) Each cuckoo lays only one egg at a time and randomly searches for a nest in which to lay it.
(ii) Eggs of high quality will be carried over to the next generation.
(iii) The number of available host nests is fixed, and a host can discover an alien egg with a probability p a ∈ [0, 1].In this case, the host bird can either throw the egg away or abandon the nest so as to build a completely new nest in a new location.The last strategy is approximated by a fraction p a of the n nests being replaced by new nests (with new random solutions at new locations).
In sum, two search capabilities are used in cuckoo search: global search (diversification) and local search (intensification), controlled by a switching/discovery probability p_a. The local search can be described as follows:

$$x_{i}^{t+1} = x_{i}^{t} + \alpha s \otimes H(p_{a} - \varepsilon) \otimes (x_{j}^{t} - x_{k}^{t})$$

where x_j^t and x_k^t are two different solutions selected by random permutation; H(u) is the Heaviside function; ε represents a uniform random number; and s is the step size. The global search is based on Levy flights, which is shown as follows:

$$x_{i}^{t+1} = x_{i}^{t} + \alpha \oplus L(s, \lambda), \qquad L(s, \lambda) = \frac{\lambda \Gamma(\lambda) \sin(\pi\lambda/2)}{\pi} \frac{1}{s^{1+\lambda}}, \quad s \gg s_{0} > 0, \; 1 < \lambda \leq 3$$

where α > 0 is the Levy flight step-size scaling factor. The product ⊕ means entry-wise multiplication, similar to that used in PSO, but the random walk via Levy flight is more efficient in exploring the search space because its step length is much longer in the long run. It is worth pointing out that, in the real world, if a cuckoo's egg is very similar to a host's eggs, then this cuckoo's egg is less likely to be discovered; thus, the fitness should be related to the difference in solutions. Therefore, it is a good idea to perform a random walk in a biased way with some random step sizes [36].
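A minimal sketch of the two search moves described above is given below, with Levy-distributed steps drawn via Mantegna's algorithm (a common way to sample such steps). The scaling factor 0.01, β = 1.5 and the sphere test function are conventional illustrative choices, not values from the paper.

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(42)

def levy_step(shape, beta=1.5):
    """Levy-distributed steps via Mantegna's algorithm."""
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2)
               / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim=2, n_nests=25, pa=0.25, n_iter=200, lb=-5.0, ub=5.0):
    nests = rng.uniform(lb, ub, (n_nests, dim))
    fit = np.array([f(x) for x in nests])
    for _ in range(n_iter):
        best = nests[np.argmin(fit)]
        # global search: Levy flights biased toward the current best nest
        cand = np.clip(nests + 0.01 * levy_step((n_nests, dim)) * (nests - best), lb, ub)
        cand_fit = np.array([f(x) for x in cand])
        better = cand_fit < fit
        nests[better], fit[better] = cand[better], cand_fit[better]
        # local search: a fraction pa of nests is abandoned and rebuilt;
        # (r < pa) plays the role of the Heaviside switch H(pa - eps)
        r = rng.random((n_nests, 1))
        j, k = rng.permutation(n_nests), rng.permutation(n_nests)
        cand = np.clip(nests + rng.random((n_nests, dim)) * (r < pa) * (nests[j] - nests[k]), lb, ub)
        cand_fit = np.array([f(x) for x in cand])
        better = cand_fit < fit
        nests[better], fit[better] = cand[better], cand_fit[better]
    return nests[np.argmin(fit)], float(fit.min())

sphere = lambda x: float((x ** 2).sum())
x_best, f_best = cuckoo_search(sphere)
print(f_best < 0.1)
```

The greedy per-nest replacement makes the best fitness monotonically non-increasing across iterations.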
The pseudo-code for CS is shown in Figure 2.


CS Algorithm Based on Gauss Disturbance
On the basis of the cuckoo search algorithm (CS), the CS algorithm based on Gauss disturbance (GCS) is proposed, in which a Gauss perturbation is added to the nest positions during the iterative process. This increases the vitality of the changes in nest position, thus effectively improving the convergence speed and search ability.
The basic idea of GCS is as follows: when a better set of nest locations x_i^t, i = 1, 2, ..., n, has been obtained after t iterations of CS, a Gauss perturbation of x_i^t is carried out to make a further search, instead of going directly into the next iteration. Suppose each x_i^t is a d-dimensional vector and denote p^t = (x_1^t, x_2^t, ..., x_n^t); then p^t is a d × n matrix. The specific operation of the GCS algorithm is to add a Gauss perturbation to p^t, namely:

$$p'^{t} = p^{t} \oplus (1 + a\varepsilon)$$

where ε is a random matrix of the same order as p^t with ε_ij ~ N(0, 1); a is a constant; and ⊕ represents the point-to-point multiplication. A large range of values of ε easily leads to a large deviation of the nest locations; therefore, a = 1/3 is selected to control the search scope of ε, thus moderately increasing the vitality of the changes in nest position and keeping p'^t reasonable. Each nest in p'^t is then compared with the corresponding nest in p^t, and p^t is updated with the better set of nest positions, which is used as p^t = (x_1^t, x_2^t, ..., x_n^t) for the next iteration.
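The Gauss-disturbance step can be sketched as follows, assuming the multiplicative form p′ = p ⊕ (1 + aε) suggested by the description (point-to-point multiplication, a = 1/3) and greedy per-nest selection; the sphere objective and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def gauss_disturb(nests, fit, f, a=1.0 / 3.0):
    """One Gauss-disturbance step of GCS (a sketch, not the paper's code).

    Each nest position is multiplied point-wise by (1 + a*eps) with
    eps ~ N(0, 1); a = 1/3 keeps the perturbation moderate. Perturbed nests
    replace the originals only when their fitness improves (greedy)."""
    eps = rng.normal(0.0, 1.0, nests.shape)
    cand = nests * (1.0 + a * eps)  # point-to-point multiplication
    cand_fit = np.array([f(x) for x in cand])
    nests, fit = nests.copy(), fit.copy()
    better = cand_fit < fit
    nests[better], fit[better] = cand[better], cand_fit[better]
    return nests, fit

sphere = lambda x: float((x ** 2).sum())
nests = rng.uniform(-5, 5, (25, 2))
fit = np.array([sphere(x) for x in nests])
new_nests, new_fit = gauss_disturb(nests, fit, sphere)
print(new_fit.min() <= fit.min())  # greedy update never worsens the best nest
```

Inserted between CS iterations, this extra greedy step is what distinguishes GCS from plain CS.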

LSSVM Optimized by the CS Algorithm Based on Gauss Disturbance
The flowchart of the W-GCS-LSSVM model is shown in Figure 3, and the detailed processes are as follows: (1) Decompose the load signal into the approximation component A1 and the detail component D1, and select A1 as the training and testing data. Normalize the load data.
(2) Determine the value ranges of σ² and γ for LSSVM and the related parameters of GCS. In this paper, the number of host nests is 25, the maximum number of iterations is 400 and the search range is between 0.01 and 100.
(3) Suppose the initial discovery probability p_a is 0.25, and set p^0 = (x_1^0, x_2^0, ..., x_n^0) as the locations of n random nests. Each nest corresponds to a set of parameters (σ², γ). Then, calculate the fitness of each nest position to find the best nest location x_b^0 and the minimum fitness F_min. The root mean square error (RMSE) is applied as the fitness function:

$$F = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}$$

(4) Reserve the best nest position x_b^0 and update the other nest positions through Levy flights to obtain a new set of nest positions; then, calculate the fitness F.
(5) Compare the new nest positions with those of the preceding generation according to the fitness F, and update each nest position with the better one; thus, a new set of nest positions is obtained. (6) Compare p_a with a random number r. Reserve the nests with a lower probability of being discovered in p^t and replace the others. Then, calculate the fitness of the new nests and update the nest positions p^t by comparing them with the preceding fitness values.
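The fitness evaluation of step (3) is where GCS and LSSVM couple: each nest encodes a candidate (σ², γ), an LSSVM is trained with those values, and the validation RMSE is the fitness. A compact sketch follows; the optimizer loop itself is omitted, and the toy data and parameter values are illustrative assumptions.

```python
import numpy as np

def rbf(X, Z, sigma2):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def fitness(nest, X_tr, y_tr, X_val, y_val):
    """Validation RMSE of an LSSVM trained with the (sigma2, gamma) of one nest."""
    sigma2, gamma = nest
    n = len(y_tr)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X_tr, X_tr, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y_tr)))
    b, alpha = sol[0], sol[1:]
    y_hat = rbf(X_val, X_tr, sigma2) @ alpha + b
    return float(np.sqrt(np.mean((y_val - y_hat) ** 2)))

# toy data: a sensible nest should score a lower RMSE than a degenerate one
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X[:, 0])
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]
nests = [(0.05, 100.0), (1e-4, 1e-2)]  # candidate (sigma2, gamma) pairs
scores = [fitness(n, X_tr, y_tr, X_val, y_val) for n in nests]
print(scores[0] < scores[1])
```

Steps (4)-(6) then move the nests exactly as in the CS/GCS updates of Section 2, with this `fitness` as the objective.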

Data Preprocessing
This paper establishes a prediction model for short-term load forecasting and analyzes the prediction results of the examples. The 24-h short-term load forecasting was performed on the power system of Yangquan city in China from 1 April to 30 May 2013 (the load data of 23 May are missing). Figure 4 shows the power load of 1416 samples, ranging from around 730 MW to 950 MW. From Figure 4, no apparent regularity of the power load can be observed. In this paper, we select 708 load data from 1 to 30 April as the training set, 660 load data from 30 April to 28 May as the validation set and 72 load data from 29 to 31 May as the testing set. The original load data are decomposed by the WT to filter out the high-frequency components for further modeling. The original short-term load data S and their approximation A1, as well as the detail component D1 decomposed by the one-level DWT, are shown in Figure 5. From Figure 5, it can be clearly seen that A1, which presents the major fluctuation of the original short-term load data, shows a high similarity to S; meanwhile, the minor irregularities neglected by A1 appear in D1. Therefore, A1 is taken as the input data of the model for efficiency.


Selection of Input
Human activities are always disturbed by many external factors, which in turn affect the power load. Therefore, some effective features are considered as input features. In this paper, the input features are discussed as follows. (1) Temperature: Temperature is one of these effective features. In previous studies [37,38], temperature was considered an essential input feature, and the forecasting results were accurate enough. The curves of temperature and load data are shown in Figure 6. Therefore, the temperature is taken into consideration. (2) Weather conditions: The weather conditions are divided into four types: sunny, cloudy, overcast and rainy. For the different weather conditions, we set different weights: {sunny, cloudy, overcast, rainy} = {0.8, 0.6, 0.4, 0.2}. (3) Day type: For different day types, the electric power consumption is different. Figure 7 shows the load data from 28 April to 4 May 2013; from it, we can see that different day types have different curve features. Therefore, we assign values to the day types in Table 1.
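The feature encoding above might be sketched as follows. The exact feature layout and the day-type values of Table 1 are not reproduced in the text, so the vector composition and the `DAY_TYPE` coding below are illustrative assumptions; only the weather weights come from the paper.

```python
import numpy as np

# Weather weights as given in the text.
WEATHER_WEIGHT = {"sunny": 0.8, "cloudy": 0.6, "overcast": 0.4, "rainy": 0.2}
# Day-type codes: Table 1 is not reproduced here, so an ordinal
# weekday/weekend coding is assumed purely for illustration.
DAY_TYPE = {"weekday": 1.0, "saturday": 0.5, "sunday": 0.0}

def encode_features(prev_load, temperature, weather, day_type):
    """Build one input vector: previous-day load values (already
    wavelet-filtered and normalized), then temperature, weather weight
    and day-type code."""
    return np.concatenate([
        np.asarray(prev_load, dtype=float),
        [temperature, WEATHER_WEIGHT[weather], DAY_TYPE[day_type]],
    ])

x = encode_features([0.71, 0.69, 0.73], temperature=18.5,
                    weather="cloudy", day_type="weekday")
print(x.shape)  # (6,)
```

Each such vector would form one row x_k of the LSSVM training set.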

Model Performance Evaluation
The work in [39] discusses and compares measures of the accuracy of univariate time series forecasts. Following this reference, the relative error (RE), the mean absolute percentage error (MAPE), the root mean square error (RMSE) and the absolute error (AE) are used to measure the forecast accuracy. The equations are as follows:

$$RE_{i} = \frac{\hat{y}_{i} - y_{i}}{y_{i}} \times 100\%$$

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_{i} - \hat{y}_{i}}{y_{i}}\right| \times 100\%$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}$$

$$AE_{i} = \left|\hat{y}_{i} - y_{i}\right|$$

where y_i represents the actual value at period i, ŷ_i is the forecasting value at period i, and n is the number of forecasting periods.
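The four error measures translate directly to code. The sign convention of RE is not fixed by the text, so forecast-minus-actual is assumed here; the sample values are illustrative.

```python
import numpy as np

def relative_error(y, y_hat):
    """RE_i = (y_hat_i - y_i) / y_i * 100%, per point (sign convention assumed)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return (y_hat - y) / y * 100.0

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs(relative_error(y, y_hat))))

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def abs_error(y, y_hat):
    """Absolute error per point."""
    return np.abs(np.asarray(y_hat, float) - np.asarray(y, float))

y, y_hat = [800.0, 900.0], [808.0, 891.0]
print(relative_error(y, y_hat))   # [ 1. -1.]
print(round(mape(y, y_hat), 4))   # 1.0
print(round(rmse(y, y_hat), 4))   # 8.5147
```

These are the quantities reported per point (RE, AE) and per model (MAPE, RMSE) in the tables that follow.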

Analysis of Forecasting Results
At first, the GCS is used to optimize the kernel parameter σ² and the penalty parameter γ of LSSVM. The parameter settings of GCS are given in Section 2.4. Figure 8 shows the iteration process of GCS. From the figure, we can see that GCS converges after 263 iterations. The optimal values of σ² and γ are 6.41 and 16.24, respectively.

Analysis of Forecasting Results
The short-term electric load forecasting results for the three days from the W-GCS-LSSVM, GCS-LSSVM, CS-LSSVM, W-LSSVM (σ² = 5 and γ = 10) and LSSVM (σ² = 5 and γ = 10) models are shown in Tables 2-4, respectively. In order to explain the results more clearly, the proposed model and the comparison models are divided into two groups: the first group includes W-GCS-LSSVM, GCS-LSSVM and CS-LSSVM, and the second group consists of W-GCS-LSSVM, W-LSSVM and LSSVM; they are shown in Figures 9 and 10, respectively. Moreover, Figures 11 and 12 show the comparisons of the relative errors between the proposed model and the others. The RE ranges [−3%, 3%] and [−1%, 1%] are popularly regarded as standards to evaluate the performance of a prediction model [40]. Based on these tables and figures, we can observe that: (1) the REs of the W-GCS-LSSVM short-term load forecasting model are all within the range [−3%, 3%]; the maximum RE is 2.4380% at 15:00 (Day 1), and the minimum RE is −2.901% at 13:00 (Day 1); thirty points are within the scope of [−1%, 1%]; (2) the GCS-LSSVM has three predicted points that exceed the RE range [−3%, 3%], which are 4.5000% at 19:00 (Day 3), 3.6472% at 8:00 (Day 3) and 3.1826% at 16:00 (Day 1), and there are twenty-nine predicted points in the range [−1%, 1%]; (3) the CS-LSSVM has four predicted points that exceed the RE range [−3%, 3%], which are 5.0824% at 19:00 (Day 3), 3.1026% at 22:00 (Day 2), 3.0863% at 16:00 (Day 1) and −3.0154% at 17:00 (Day 2), and there are twenty-one predicted points in the range [−1%, 1%]; (4) the W-LSSVM has four predicted points that exceed the RE range [−3%, 3%], which are 3.4763% at 17:00 (Day 1), 3.2786% at 0:00 (Day 2), 3.0215% at 11:00 (Day 2) and −3.3465% at 13:00 (Day 1), and there are seventeen predicted points in the range [−1%, 1%]; (5) the single LSSVM has fourteen predicted points that exceed the RE range [−3%, 3%], which are 4.1518% at 8:00 (Day 1), 3.8082% at 23:00 (Day 1), 3.4807% at 12:00 (Day 3), 3.4028% at 23:00 (Day 2), 3.3572% at 16:00 (Day 3), 3.3091% at 17:00 (Day 1), 3.2287% at 17:00 (Day 3), 3.1997% at 7:00 (Day 1), 3.1958% at 15:00 (Day 3), 3.0350% at 13:00 (Day 3), −3.1991% at 5:00 (Day 1), −3.2325% at 3:00 (Day 2) and −3.5397% at 14:00 (Day 1), and there are fifteen predicted points in the range [−1%, 1%]. From a global view of the RE, the forecasting accuracy of W-GCS-LSSVM is better than that of the other models, since it has the most predicted points in the ranges [−1%, 1%] and [−3%, 3%]. Moreover, from Figure 9, the results of GCS-LSSVM are better than those of CS-LSSVM, which verifies that the Gauss disturbance strategy applied in CS increases the vitality of the changes in nest position, thus effectively improving the convergence speed and search ability. From Figure 10, the performance of W-LSSVM is better than that of the single LSSVM, which illustrates that the WT effectively filters the original data. However, the comparison models also predict more accurately than the proposed model at some points; for example, the RE of W-GCS-LSSVM is −2.901% at 13:00 (Day 1), which is higher in magnitude than that of GCS-LSSVM, CS-LSSVM and LSSVM at the same point. The MAPE and MSE of W-GCS-LSSVM, GCS-LSSVM, CS-LSSVM, W-LSSVM and LSSVM are listed in Table 5. From Table 5, we can conclude that the MAPE of the proposed model is 1.2083%, which is smaller than the MAPEs of GCS-LSSVM, CS-LSSVM, W-LSSVM and LSSVM (1.3682%, 1.4790%, 1.4213% and 1.9557%, respectively). In addition, the MSE of the proposed model is 131.6950, which is smaller than the MSEs of the comparison models (185.6538, 210.7736, 196.6906 and 336.5224, respectively). As the MAPE and MSE of the W-GCS-LSSVM are both smaller than those of the W-LSSVM, we can conclude that the parameter optimization of LSSVM is essential in the forecasting model. Besides, the MAPE and MSE of the W-GCS-LSSVM are both smaller than those of GCS-LSSVM, indicating the
pre-processing of load data is useful for a better performance and higher forecasting accuracy.At the same time, the MAPE and MSE of the GCS-LSSVM and CS-LSSVM are both smaller than those of LSSVM, and this presents that the optimization results of the GCS and CS are efficient.In addition, the AE of the load forecasting value divided into four parts that is calculated from Equation ( 18) is shown in Figure 13.The numbers on the x-axis represent the models appeared above: 1 represents the W-GCS-LSSVM model, 2 represents the GCS-LSSVM model, 3 represents the CS-LSSVM model, 4 represents the W-LSSVM model and 5 represents the single LSSVM model.From Figure 13, we can discover that the AE values of W-GCS-LSSVM are almost lower than those of the other models.The numbers of points that are less than 1%, 3% and more than 3% and the corresponding percentage of them in the predicted points are accounted, respectively.The statistical results are shown in Table 6.It can be seen that there are 30 predicted points whose the AE of the W-GCS-LSSVM model is less than 1%, which accounts for 41.67% of the total amount; and 42 predicted points less than 3%, accounting for 58.33% of the total amount.Besides, there are no number predicted points whose AE is more than 3%, accounting for 0% of the total amount.It can be indicated that the prediction performance of the proposed model is superior, and its accuracy is higher.Therefore, the W-GCS-LSSVM model is suitable for short-term load forecasting.The MAPE and MSE of WT-GCS-LSSVM, GCS-LSSVM, CS-LSSVM, W-LSSVM and LSSVM are listed in Table 5.From Table 5, we can conclude that the MAPE of the proposed model is 1.2083%, which is smaller than the MAPE of GCS-LSSVM, CS-LSSVM, W-LSSVM and LSSVM (which are 1.3682%, 1.4790%, 1.4213% and 1.9557%).In addition, the MSE of the proposed model is 131.6950, which is smaller than the MSE of the comparison models (which are 185.6538,210.7736, 196.6906 and 336.5224).As a result, the MAPE and MSE of the 
W-GCS-LSSVM are both smaller than those of the W-LSSVM, so we can conclude that the parameter optimization to LSSVM is essential in the forecasting model.Besides, the MAPE and MSE of the WT-GCS-LSSVM are both smaller than GCS-LSSVM, indicating the pre-processing of load data is useful for a better performance and higher forecasting accuracy.At the same time, the MAPE and MSE of the GCS-LSSVM and CS-LSSVM are both smaller than those of LSSVM, and this presents that the optimization results of the GCS and CS are efficient.In addition, the AE of the load forecasting value divided into four parts that is calculated from Equation ( 18) is shown in Figure 13.The numbers on the x-axis represent the models appeared above: 1 represents the W-GCS-LSSVM model, 2 represents the GCS-LSSVM model, 3 represents the CS-LSSVM model, 4 represents the W-LSSVM model and 5 represents the single LSSVM model.From Figure 13, we can discover that the AE values of W-GCS-LSSVM are almost lower than those of the other models.The numbers of points that are less than 1%, 3% and more than 3% and the corresponding percentage of them in the predicted points are accounted, respectively.The statistical results are shown in Table 6.It can be seen that there are 30 predicted points whose the AE of the W-GCS-LSSVM model is less than 1%, which accounts for 41.67% of the total amount; and 42 predicted points less than 3%, accounting for 58.33% of the total amount.Besides, there are no number predicted points whose AE is more than 3%, accounting for 0% of the total amount.It can be indicated that the prediction performance of the proposed model is superior, and its accuracy is higher.Therefore, the W-GCS-LSSVM model is suitable for short-term load forecasting.
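The error measures used in this comparison (RE, AE, MAPE and MSE) can be reproduced with a short sketch. The three-point input below is illustrative only and is not taken from the paper's data:

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Compute the error measures used to compare the models:
    signed relative error (RE), absolute percentage error (AE),
    MAPE and MSE."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    re = (predicted - actual) / actual * 100.0   # signed relative error, %
    ae = np.abs(re)                              # absolute percentage error, %
    mape = ae.mean()                             # mean absolute percentage error, %
    mse = np.mean((predicted - actual) ** 2)     # mean squared error, MW^2
    return re, ae, mape, mse

# Hypothetical three-point example (not the paper's data)
re, ae, mape, mse = forecast_metrics([800.0, 900.0, 850.0],
                                     [808.0, 891.0, 850.0])
within_1pct = int((ae < 1.0).sum())   # count of points with AE below 1%
```

Counting the points whose AE falls below 1% and 3%, as in Table 6, is then a simple thresholding of the `ae` array.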

Conclusions
To strengthen the stability and economy of the grid and to avoid waste in grid scheduling, it is essential to improve forecasting accuracy. Because the short-term power load is always disturbed by various external factors and exhibits high volatility and instability, achieving high forecasting accuracy must be taken into consideration. Based on the features of the load data and the randomness of the LSSVM parameter settings, we propose a model based on the wavelet transform and a least squares support vector machine optimized by an improved cuckoo search. To validate the proposed model, four comparison models (GCS-LSSVM, CS-LSSVM, W-LSSVM and LSSVM) are employed to compare the forecasting results. The example computations show that the relative errors of the W-GCS-LSSVM model are all within the range [−3%, 3%], and its MAPE and MSE are both smaller than those of the other models. In addition, an advantage of CS is that it has few parameters to tune, so it can be widely applied to parameter optimization. Above all, the hybrid model can be effectively used for short-term load forecasting in the power system.

Figure 2. The pseudo-code of the cuckoo search (CS).


(8) Find the best nest position x_b^t obtained in Step (7). If the fitting degree F meets the requirements, stop the algorithm and output the global minimum fitting degree F_min as well as the best nest x_b^t; if not, return to Step (4) to continue the optimization. (9) Set the optimal parameters σ^2 and γ of the LSSVM according to the best nest position x_b^t.

Figure 3. Flowchart of the CS algorithm based on Gauss disturbance (GCS)-least squares support vector machine (LSSVM) modeling.

3. Case Study

3.1. Data Preprocessing

This paper establishes a prediction model for short-term load forecasting and analyzes the prediction results of the examples. The 24-h short-term load forecasting has been made on the power

Figure 4 shows the power load of 1416 samples, ranging from around 730 MW to 950 MW. From Figure 4, no apparent regularity of the power load can be observed. In this paper, we select 708 load data points from 1 to 30 April as the training set, 660 load data points from 30 April to 28 May as the validation set and 72 load data points from 29 to 31 May as the testing set.

Figure 4. Load curve for each hour.

Figure 5. Original load signal and its approximation component and detail component decomposed by the DWT.
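The elimination of the high-frequency component illustrated in Figure 5 can be sketched with a one-level Haar DWT; this is a minimal illustration, as the wavelet basis and decomposition depth actually used in the paper are not stated in this excerpt:

```python
import numpy as np

def haar_dwt_denoise(signal):
    """One-level Haar DWT: split the signal into approximation
    (low-frequency) and detail (high-frequency) coefficients, then
    reconstruct using only the approximation, discarding the
    high-frequency detail as the paper's preprocessing does."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                                # pad to even length
        x = np.append(x, x[-1])
    cA = (x[0::2] + x[1::2]) / np.sqrt(2.0)       # approximation coefficients
    cD = (x[0::2] - x[1::2]) / np.sqrt(2.0)       # detail coefficients (dropped)
    smooth = np.empty_like(x)
    smooth[0::2] = cA / np.sqrt(2.0)              # inverse transform with cD = 0,
    smooth[1::2] = cA / np.sqrt(2.0)              # i.e. each pair becomes its mean
    return smooth[:len(signal)], cA, cD

# Hypothetical toy signal, not the paper's load data
smooth, cA, cD = haar_dwt_denoise([1.0, 3.0, 2.0, 2.0])
```

With the Haar basis, discarding the detail component reduces each adjacent pair of samples to its average, which is the smoothing effect visible in the approximation component of Figure 5.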


(1) Temperature: The curves of the load data and the temperature are shown in Figure 6; therefore, the temperature is taken into consideration.
(2) Weather conditions: The weather conditions are divided into four types: sunny, cloudy, overcast and rainy. For the different weather conditions, we set different weights: {sunny, cloudy, overcast, rainy} = {0.8, 0.6, 0.4, 0.2}.
(3) Day type: The electric power consumption differs among day types. Figure 7 shows the load data from 28 April to 4 May 2013; from Figure 7, we can see that different day types have different curve features. Therefore, we assign values to the day types in Table 1.
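The input encoding described above can be sketched as follows. The weather weights come from the paper's {0.8, 0.6, 0.4, 0.2} scheme, but the day-type value passed in is illustrative, since the actual assignments of Table 1 are not reproduced in this excerpt:

```python
# Weather weights as given in the paper
WEATHER_WEIGHT = {"sunny": 0.8, "cloudy": 0.6, "overcast": 0.4, "rainy": 0.2}

def encode_features(temperature_c, weather, day_type_value):
    """Assemble one input vector for the forecasting model:
    temperature, the weather weight, and the day-type value
    assigned in Table 1 (hypothetical here)."""
    return [temperature_c, WEATHER_WEIGHT[weather], day_type_value]

# Hypothetical sample: 25 C, overcast, day-type value 0.5
features = encode_features(25.0, "overcast", 0.5)
```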

Figure 6. The curves of the load data and temperature.


2  2 
and penalty parameter  in LSSVM.The parameter settings of GCS is given in Section 2.4.Figure8shows the iterations process of GCS.From the figure we can see that GCS achieves convergence at 263 times.The optimal values of and  are respectively 6.41 and 16.24.
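Given the optimized parameters, training the LSSVM amounts to solving one linear system. The sketch below uses the reported σ^2 = 6.41 and γ = 16.24 and the RBF form exp(-||x - x'||^2 / σ^2), which is one common LSSVM convention and may differ from the paper's exact kernel:

```python
import numpy as np

def lssvm_train(X, y, sigma2=6.41, gamma=16.24):
    """Train an LSSVM regressor by solving its KKT linear system:
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-d2 / sigma2)                              # RBF kernel matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma                     # ridge term from gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return X, alpha, b, sigma2

def lssvm_predict(model, Xnew):
    """Predict f(x) = sum_i alpha_i K(x, x_i) + b."""
    Xtr, alpha, b, sigma2 = model
    Xnew = np.asarray(Xnew, dtype=float)
    d2 = ((Xnew[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2) @ alpha + b

# Hypothetical toy data, not the paper's load series
model = lssvm_train([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 2.0, 3.0])
pred = lssvm_predict(model, [[0.0], [1.0], [2.0], [3.0]])
```

A useful sanity check follows directly from the linear system: on the training points, the prediction equals y − α/γ, so larger γ (weaker regularization) fits the training data more closely.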

Figure 8. The iteration process of the GCS.

Figure 13. Absolute error distribution curves for the different models: (a) the AE values for sample points 1 to 18; (b) the AE values for sample points 19 to 36; (c) the AE values for sample points 37 to 54; (d) the AE values for sample points 55 to 72.
Obtain a new set of nest positions p_t' = [x_1^t', x_2^t', ..., x_n^t']^T through Gaussian perturbation of p_t. Then, compare the fitness value of p_t' with that of p_t, and keep the nest positions with the better fitness values as the updated p_t = [x_1^t, x_2^t, ..., x_n^t]^T.
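The Gauss-disturbance update above can be sketched in a few lines. The perturbation scale `sigma` and the element-wise greedy selection are assumptions, since their exact values and form are not stated in this excerpt:

```python
import numpy as np

def gauss_perturb_step(nests, fitness, sigma=0.1, rng=None):
    """One Gauss-disturbance step of the improved CS (GCS): perturb
    every nest position with zero-mean Gaussian noise, evaluate the
    fitness, and keep whichever of the old/new positions is better
    (lower fitness = better)."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = nests + rng.normal(0.0, sigma, size=nests.shape)
    old_f = np.apply_along_axis(fitness, 1, nests)
    new_f = np.apply_along_axis(fitness, 1, candidates)
    keep_new = new_f < old_f                      # greedy per-nest selection
    return np.where(keep_new[:, None], candidates, nests)

# Hypothetical demo: minimize the sum of squares from 5 random nests
rng = np.random.default_rng(0)
nests = rng.uniform(-1.0, 1.0, size=(5, 2))
f = lambda x: float((x ** 2).sum())
updated = gauss_perturb_step(nests, f, sigma=0.1, rng=np.random.default_rng(1))
```

Because the selection is greedy, the fitness of each nest can only stay the same or improve after the perturbation, which is the mechanism credited above for the improved convergence speed.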

Table 1. The values of the day type.


Table 2. Actual load and forecasting results for Day 1 (unit: MW).

Table 3. Actual load and forecasting results for Day 2 (unit: MW).

Table 4. Actual load and forecasting results for Day 3 (unit: MW).

Table 6. Accuracy estimation of the predicted points for the test set.