A Multi Time Scale Wind Power Forecasting Model of a Chaotic Echo State Network Based on a Hybrid Algorithm of Particle Swarm Optimization and Tabu Search

The uncertainty and irregularity of wind power generation are caused by the intermittency and randomness of wind resources. Such volatility poses severe challenges to power grids that integrate wind power. Ultrashort-term and short-term wind power forecasting with high prediction accuracy is of great significance for reducing wind power curtailment, optimizing conventional generation schedules, adjusting maintenance schedules and developing real-time monitoring systems. Accurate forecasting of wind power generation is therefore important in electric load forecasting. The echo state network (ESN) is a new type of recurrent neural network composed of input, hidden and output layers. It approximates nonlinear systems well and achieves good results in nonlinear chaotic time series forecasting. In addition, ESN training is simpler and less computationally demanding than that of traditional neural networks, and it yields more accurate training results. To address the shortcomings of the standard ESN, this paper combines the complementary advantages of particle swarm optimization and tabu search to improve the generalization of the ESN. To verify the validity and applicability of the method, case studies of multi-time-scale wind power forecasting are carried out: the chaotic time series of actual wind power data from a certain region is reconstructed and used to predict wind power generation, and the influence of seasonal factors on wind power is taken into account. Compared with the classical ESN and the conventional Back Propagation (BP) neural network, the results verify the superiority of the proposed method.


Introduction
The energy crisis is becoming increasingly evident with the continuous growth in energy consumption, the overexploitation of traditional energy sources and the depletion of fossil fuels. To ease the crisis, countries around the world are increasing their use of new energy sources to achieve sustainable energy development. As an important category of renewable energy, wind energy is characterized by huge reserves, renewability, wide distribution and freedom from pollution [1]. In recent years, as an important form of renewable energy utilization and the most mature new energy technology, wind energy has rapidly increased its penetration in power grids [2].
China has abundant wind energy resources, and the government attaches great importance to the development of the wind power industry. China's wind power has been developing rapidly since 2003, and its installed wind power capacity doubled for four consecutive years from 2006 to 2009. In 2010, China's newly installed wind power capacity exceeded 16 GW and its total installed capacity reached 41.83 GW, surpassing the United States for the first time and ranking first in the world [3]. In 2012, China's total installed wind power capacity still ranked first, while wind energy growth in the United States was also relatively strong, with 13.1 GW of new grid-connected capacity; in Europe, Germany led wind power development, followed by Spain and the United Kingdom. Figure 1 shows the top ten countries by total installed wind power capacity at the end of 2012.
[Figure 1. Total installed capacity of wind power (GW).]
Against the background of global economic development, the global wind power industry is growing rapidly. However, the intermittency and randomness of wind determine the strong volatility of wind power. As the number of wind farms and their installed capacity increase, this volatility poses a huge challenge to economic and secure grid operation and, once wind power is connected, makes operation and dispatch more difficult for power network companies. Accurate advance prediction of wind power can relieve the peak-regulation pressure on power systems and effectively improve the grid's ability to accommodate wind power [4].
Many researchers in China and abroad have carried out extensive theoretical studies and practical simulations. In terms of time scale, wind power forecasting can be divided into long-term, short-term and ultrashort-term forecasting. Long-term forecasts use the week, month or year as the unit and are mainly used for wind farm design feasibility studies and for the maintenance and commissioning of wind turbines. Unlike the distributed connection of wind power in Europe, most of China's wind power is connected to the grid in a centralized manner, so any power fluctuation has a greater impact on grid scheduling. Ultrashort-term prediction has been widely used in the optimization of spare capacity [5,6] and in economic load dispatch and control [7]. Short-term forecasting can reduce wind curtailment and help adjust maintenance plans [3]. Short-term and ultrashort-term forecasting, which use the hour or minute as the unit, are therefore particularly important.
Depending on the research method, short-term wind power forecasting methods can be roughly divided into physical, statistical and learning methods [3], as shown in Figure 2.
Energies 2015, 8, 12388-12408
Physical methods are based on numerical weather prediction (NWP) models. Information such as the wind speed and wind direction at the hub height of the wind turbine is obtained and combined with physical information, and the power curve of the wind turbine is then used to simulate the actual output power [8]. This approach needs no historical data and is suitable for forecasting new wind farms, but it has high accuracy requirements, data collection is difficult, and its computational demands and implementation costs are high.
Statistical methods establish a forecasting model by finding the relationships between historical data and wind speed or power, without considering the physical process of wind speed change. This relationship can be expressed in functional form, for example by Markov chains [4], regression analysis [9,10], exponential smoothing [11], Kalman filtering [12] or ARMA models [13]. Among them, the ARMA(p, q) model, a common statistical model, analyzes and predicts stationary time series with high accuracy. However, because of uncertain natural climate conditions, wind energy exhibits obvious trends, diversity and periodicity, so wind power series are non-stationary, which greatly limits such statistical models. Compared with physical models, statistical models are relatively simple and widely applicable, but they need a large amount of data and are therefore not suitable for predicting the output of new wind farms.
Learning methods extract the relationship between input and output using artificial intelligence techniques. The resulting models are usually nonlinear and cannot be expressed analytically. Commonly used models include neural networks [14,15], wavelet analysis [16] and support vector machines [17,18]. Learning methods have strong adaptive ability, and model accuracy can be improved by error correction. Because wind power generation often behaves irregularly and uncertainly, Bayesian theory has also been applied to wind power forecasting. Probabilistic forecasting methods give the probability distribution of wind power generation at the next moment, providing more comprehensive forecast information, but they require a large quantity of data, their analysis and calculation are more complex, and some results depend on prior probabilities [19].
Many studies have identified chaotic characteristics before prediction [20][21][22]. Refs. [20,21] verified that wind power output and wind speed time series have chaotic properties. According to the theory of nonlinear dynamics, chaotic time series are predictable in the short term, which justifies applying chaos-based methods to wind power prediction. Traditional prediction methods for chaotic time series include global and local methods. The global method takes all state points as fitting objects to identify the rules of the prediction model. It needs more historical data and a large amount of computation, and it is difficult to apply to phase spaces with high embedding dimensions, which limits its use. The local method, by contrast, is more widely applied. Ref. [21] proposes an improved weighted zero-order local prediction method that uses the correlation degree of phase points to find nearby phase points, which improves forecasting precision. A multivariable local prediction method for wind speed is proposed in [22]: based on the principle of phase relation, multivariable time series data are screened to construct a multivariable phase space.
Neural networks have been widely used to predict chaotic sequences in recent years [23][24][25][26][27][28]. The Back Propagation (BP) neural network [23,24], radial basis function neural network [25], Elman neural network [26], GMDH-type neural network [27] and wavelet neural network [28] have all been used to forecast chaotic sequences, with varying prediction performance in different areas. With its good nonlinear forecasting ability, the recurrent neural network has become the main tool for predicting chaotic time series and is widely recognized by the academic community [29]. However, it is difficult to analyze mathematically in practical use, and it is time-consuming, computationally demanding and inefficient to train. As a new type of recurrent neural network, the echo state network (ESN) offers greatly improved stability and global optimality and a much less complex training process than traditional neural networks [30], and it has attracted wide attention from researchers in China and abroad. It has been widely used in speech recognition, traffic control [26] and communication forecasting [31], but it is rarely used in power system forecasting. In addition, the classical ESN has limited ability to express nonlinear dynamic systems and has difficulty meeting the precision requirements of many real forecasting problems, so its generalization ability is not strong.
In view of this, this paper presents an intelligent short-term wind power forecasting model that addresses the nonlinear characteristics of power generation. Considering the uncertainty of wind power generation, the chaotic characteristics of the time series are first identified, and the delay time and embedding dimension are computed to reconstruct the phase space. Then, to overcome the shortcomings of the ESN itself, an ESN is designed and trained to forecast short-term wind power, and two complementary optimization methods, particle swarm optimization and tabu search, are combined to improve the generalization ability of the model. The proposed intelligent forecasting model has the following innovations: (1) phase space reconstruction of chaotic time series is used to reduce the impact of the uncertainty of the original data; (2) the ESN is introduced for prediction. As a new dynamic recurrent neural network, the ESN has a uniquely simple training method and high training precision, and its reservoir avoids the problems of slow convergence and local minima, which yields higher accuracy; (3) considering the characteristics of tabu search and particle swarm optimization, the diversity principle of tabu search is introduced into the "group sharing" and "random search" mechanisms of the particle swarm so that the search process has memory, and combining the two algorithms exploits their complementary advantages to improve prediction accuracy; (4) wind power predictions are carried out simultaneously on multiple time scales of minutes, hours and days, and seasonal factors are integrated into the model to analyze the characteristics of wind power, verifying the robustness and generalization ability of the proposed method.
In the empirical analysis, the proposed method achieves higher prediction accuracy than the classical ESN, the traditional BP network and an ESN optimized by a single method. The paper is organized as follows: Section 1 reviews wind power forecasting methods and presents the research methods and contents of this paper; Section 2 outlines the basic theory of the chaotic ESN and describes the structure and principle of the ESN; Section 3 introduces particle swarm optimization and tabu search and explains the principle and process of the combined optimization method in detail; Section 4 establishes the prediction process of the chaotic ESN based on particle swarm optimization and tabu search; Section 5 reports the experiments and relevant discussion; and Section 6 presents concluding remarks.

Chaos Identification and Phase Space Reconstruction
A chaotic time series is the output of a complex process that is nonlinear, dynamic and open, and it is difficult for traditional methods to predict it accurately. However, its inherent nonlinear dynamic structure makes it predictable in the short term. The inherent regularity of such a nonlinear mapping sequence can be revealed by reconstructing the phase space, which improves prediction accuracy.
According to Takens' embedding theorem, for a time series, as long as $m \ge 2d + 1$ (where m is the embedding dimension and d is the correlation dimension of the system), the attractor can be recovered in the m-dimensional reconstructed space, and the reconstructed phase trajectory is diffeomorphic to that of the original system. The reconstructed space is thus topologically equivalent to the original system [26,31].
For the wind power generation time series $\{x_1, x_2, x_3, \ldots, x_n\}$, if the embedding dimension m and the delay time τ are selected properly, the reconstructed phase space is obtained as
$$X_i = [x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}], \quad i = 1, 2, \ldots, N$$
where $N = n - (m-1)\tau$ is the length of the vector sequence.
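The reconstruction can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the paper's code: the function name `delay_embed` is hypothetical, and rows are 0-indexed, so row i of the result holds the vector $X_{i+1}$.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Return the N x m delay matrix whose rows are [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}].

    N = n - (m - 1) * tau, as in the text; rows are 0-indexed here.
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Column k holds the series shifted by k * tau samples.
    return np.column_stack([x[k * tau : k * tau + n_vectors] for k in range(m)])


# A 10-sample series with m = 3 and tau = 2 gives N = 10 - 2 * 2 = 6 vectors.
X = delay_embed(np.arange(1, 11), m=3, tau=2)
print(X.shape)  # (6, 3)
print(X[0])     # [1. 3. 5.]
```

Building the matrix column by column (one shifted copy of the series per coordinate) avoids an explicit loop over the N vectors.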
The key to chaotic time series forecasting of wind power generation is phase space reconstruction, and the key to phase space reconstruction is the choice of the embedding dimension m and the delay time τ. Although Takens proposed and proved the embedding theorem, he did not give a specific method for choosing these parameters. If m is too small, the reconstruction cannot display the real structure of a complex system; if m is too large, the density of points decreases and the true structure becomes unclear. The amount of data required for phase space reconstruction also increases significantly, which greatly complicates the calculation, so an appropriate embedding dimension m must be chosen that unfolds the attractor completely while introducing less noise. In principle the choice of τ is not important, but for actual finite data it is critical to select an appropriate value. If τ is too small, the coordinates are too strongly correlated and the information is not easily revealed; if τ is too large, the dynamics described by the time series are distorted [32]. If the maximum Lyapunov exponent of the attractor in the reconstructed phase space is greater than 0, the time series is considered chaotic [33].

Mutual Information Method to Determine the Delay Time
The autocorrelation method and the mutual information method are generally used to calculate the delay time. The autocorrelation method extracts the linear correlation of a time series, whereas the mutual information method measures both linear and nonlinear statistical dependence between two random variables and is therefore superior for selecting the delay time. We thus select the mutual information method to determine the delay time. Its basic principle is as follows. For discrete random variables X and Y, the mutual information I(X, Y) is expressed as
$$I(X, Y) = H(X) + H(Y) - H(X, Y)$$
$$H(X) = -\sum_{i=1}^{q} P(x_i)\log_2 P(x_i), \qquad H(X, Y) = -\sum_{i}\sum_{j} P(x_i, y_j)\log_2 P(x_i, y_j)$$
where $P(x_i)$ is the probability of event $x_i$, q is the number of events or states that may occur, H(Y) is defined in the same way as H(X), and the logarithm is usually taken to base 2.
For a time series $\{x_i\}$ of length n and a delay time τ, let $P(x_i)$ be the probability of $x_i$ appearing in $\{x_i\}$ and $P(x_{i+\tau})$ the probability of $x_{i+\tau}$ appearing in the series; both are estimated from the frequency of occurrence in the corresponding time series. The joint probability $P(x_i, x_{i+\tau})$ of $x_i$ and $x_{i+\tau}$ appearing in the two sequences can be obtained from the corresponding cells of a grid on the $(x_i, x_{i+\tau})$ plane. The mutual information $I(X_i, X_{i+\tau})$ is thus a function I(τ) of the delay time τ, and the delay time for phase space reconstruction is taken as the value of τ at which I(τ) reaches its first minimum.
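The procedure above can be sketched with a histogram estimate of the probabilities. This is an illustrative sketch only: the function names, the choice of 16 equal-width bins and the "first local minimum, else global minimum" rule are assumptions, and the noisy sine is synthetic stand-in data, not wind power measurements.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy -sum p log2 p, ignoring empty histogram cells."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(x_i, x_{i+tau}) = H(X) + H(Y) - H(X, Y), in bits."""
    x = np.asarray(x, dtype=float)
    # Grid on the (x_i, x_{i+tau}) plane; cell counts give the joint frequencies.
    joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    p_joint = joint / joint.sum()
    p_a = p_joint.sum(axis=1)  # marginal P(x_i)
    p_b = p_joint.sum(axis=0)  # marginal P(x_{i+tau})
    return entropy_bits(p_a) + entropy_bits(p_b) - entropy_bits(p_joint.ravel())


def first_minimum_delay(x, max_tau=50, bins=16):
    """Return the first tau (>= 1) at which I(tau) reaches a local minimum."""
    mi = [mutual_information(x, tau, bins) for tau in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1  # mi[k] belongs to tau = k + 1
    return int(np.argmin(mi)) + 1  # no local minimum found: fall back to the global one


# Usage on a synthetic noisy sine (a stand-in for measured wind power data).
t = np.arange(2000)
series = np.sin(2 * np.pi * t / 40) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
tau = first_minimum_delay(series, max_tau=30)
```

The estimate depends on the bin count; with too many bins for a short series, the empirical mutual information is biased upward, so in practice the bin count is usually tied to the series length.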

False Nearest Neighbors Method (FNN) to Determine the Embedding Dimension
From a geometric point of view, a chaotic time series is the projection onto one dimension of the trajectory of chaotic motion in a high-dimensional phase space. In the projection process, the trajectory is distorted: points that are not adjacent in the high-dimensional phase space may become adjacent when projected onto the one-dimensional axis, forming false neighbors. This is why the chaotic time series appears irregular. Reconstructing the phase space is actually the recovery of the chaotic motion trajectory from the chaotic time series. As the embedding dimension m increases, the orbit of the chaotic motion gradually unfolds, false neighbors are gradually eliminated and the trajectory of the chaotic motion is recovered; this is the starting point of the false nearest neighbors (FNN) method [34].
In the d-dimensional phase space, each phase point vector $\mathbf{x}(i) = \{x(i), x(i+\tau), \ldots, x(i+(d-1)\tau)\}$ has a nearest neighbor $\mathbf{x}^{NN}(i)$ at a certain distance, which can be described as
$$R_d^2(i) = \sum_{k=0}^{d-1}\left[x(i+k\tau) - x^{NN}(i+k\tau)\right]^2$$
When the dimension of the phase space increases from d to d + 1, the distance changes to $R_{d+1}(i)$, with
$$R_{d+1}^2(i) = R_d^2(i) + \left[x(i+d\tau) - x^{NN}(i+d\tau)\right]^2$$
If $R_{d+1}(i)$ is much larger than $R_d(i)$, the two points can be considered neighbors only because two non-adjacent points of the high-dimensional chaotic attractor were projected onto a lower-dimensional orbit and became adjacent. Such neighbors are false.
Let
$$a_1(i, d) = \frac{\left|x(i+d\tau) - x^{NN}(i+d\tau)\right|}{R_d(i)}$$
If $a_1(i, d) > R_\tau$, then $\mathbf{x}^{NN}(i)$ is a false nearest neighbor of $\mathbf{x}(i)$. The threshold $R_\tau$ can be selected in the range [10, 50].
For the measured time series, the proportion of false nearest neighbors is calculated starting from the minimum embedding dimension, and d is increased until the proportion is less than 5% or the number of false nearest neighbors no longer decreases as d increases. At this point the chaotic attractor can be considered completely unfolded, and d is taken as the embedding dimension. FNN is regarded as an effective method for determining the embedding dimension of the phase space.
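A brute-force sketch of the FNN procedure follows. It is an illustration under assumptions, not the paper's implementation: the function names are hypothetical, the threshold $R_\tau$ = 15 is one value picked from the [10, 50] range given above, and the Hénon map stands in for measured data because its x-series is known to unfold in two dimensions. The nearest-neighbor search is O(n²), which is acceptable only for short series.

```python
import numpy as np


def fnn_fraction(x, d, tau, r_tol=15.0):
    """Fraction of nearest neighbours in dimension d that are false in dimension d + 1.

    Criterion from the text: a1(i, d) = |x(i + d*tau) - x_NN(i + d*tau)| / R_d(i) > R_tol.
    """
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - d * tau  # vectors that also have a (d + 1)-th coordinate
    emb = np.column_stack([x[k * tau : k * tau + n_vec] for k in range(d)])
    extra = x[d * tau : d * tau + n_vec]  # the added coordinate x(i + d*tau)
    n_false = 0
    for i in range(n_vec):
        dist = np.linalg.norm(emb - emb[i], axis=1)  # R_d to every other point
        dist[i] = np.inf  # a point is not its own neighbour
        j = int(np.argmin(dist))
        if dist[j] > 0 and abs(extra[i] - extra[j]) / dist[j] > r_tol:
            n_false += 1
    return n_false / n_vec


def embedding_dimension(x, tau, max_d=10, r_tol=15.0, threshold=0.05):
    """Increase d until the FNN fraction drops below 5 % or stops decreasing."""
    previous = None
    for d in range(1, max_d + 1):
        fraction = fnn_fraction(x, d, tau, r_tol)
        if fraction < threshold or (previous is not None and fraction >= previous):
            return d
        previous = fraction
    return max_d


def henon_x(n, a=1.4, b=0.3):
    """x-coordinate of the Henon map, a standard chaotic test series (burn-in discarded)."""
    x, y, out = 0.1, 0.0, []
    for _ in range(n + 100):
        x, y = 1.0 - a * x * x + y, b * x
        out.append(x)
    return np.array(out[100:])


series = henon_x(500)
m = embedding_dimension(series, tau=1, max_d=6)
```

For the Hénon series the pair $(x_i, x_{i+1})$ already determines $x_{i+2}$, so the false-neighbor fraction collapses at d = 2 and the sketch returns m = 2.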

The Maximum Lyapunov Exponent for Chaos Identification
The Lyapunov exponent reflects the contraction and stretching of the system along each direction of the phase space. This paper calculates the maximum Lyapunov exponent with the small-data method to determine the chaotic characteristics of the original sequence.
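The small-data method is usually attributed to Rosenstein, Collins and De Luca. The following is a minimal sketch of that recipe, assuming a hypothetical function name and illustrative defaults (a temporal-separation window of 10 samples, 20 tracked steps and a linear fit over the first 5 steps). In real use the fitting range should be chosen from the linear part of the mean log-divergence curve. The logistic map with r = 4 is used only as a test series because its exponent is known to be ln 2 per iteration.

```python
import numpy as np


def max_lyapunov_small_data(x, m, tau, min_sep=10, n_steps=20, fit_steps=5):
    """Estimate the maximum Lyapunov exponent (per sample) with the small-data recipe.

    1. Reconstruct the phase space; 2. pair every point with its nearest neighbour that is
    at least `min_sep` samples away in time; 3. average ln(distance) of the pairs i steps
    later; 4. the slope of that curve over the first `fit_steps` steps estimates lambda_max.
    """
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    Y = np.column_stack([x[k * tau : k * tau + n_vec] for k in range(m)])
    usable = n_vec - n_steps  # points that can be followed for n_steps
    log_sum = np.zeros(n_steps + 1)
    counts = np.zeros(n_steps + 1)
    for j in range(usable):
        dist = np.linalg.norm(Y[:usable] - Y[j], axis=1)
        dist[max(0, j - min_sep) : j + min_sep + 1] = np.inf  # exclude temporally close points
        k = int(np.argmin(dist))
        if not np.isfinite(dist[k]):
            continue
        # Distances between the two trajectories after 0, 1, ..., n_steps steps.
        d = np.linalg.norm(Y[j : j + n_steps + 1] - Y[k : k + n_steps + 1], axis=1)
        ok = d > 0
        log_sum[ok] += np.log(d[ok])
        counts[ok] += 1
    mean_log = log_sum / np.maximum(counts, 1)
    slope = np.polyfit(np.arange(fit_steps + 1), mean_log[: fit_steps + 1], 1)[0]
    return float(slope)


# Logistic map with r = 4 is chaotic; its exact exponent is ln 2 (about 0.69) per iteration.
z = np.empty(2000)
z[0] = 0.3
for i in range(1, z.size):
    z[i] = 4.0 * z[i - 1] * (1.0 - z[i - 1])
lam = max_lyapunov_small_data(z, m=2, tau=1)
```

A positive slope is the chaos indicator used in the text. The estimate is per sample, so it must be divided by the sampling interval to obtain a rate per unit time.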

ESN
The ESN is a new type of recurrent neural network composed of input, hidden and output layers [35]. Its basic idea is to simplify network training by replacing the middle layer of a classic neural network with a large-scale recurrent network with random connections. The hidden layer, called the dynamic reservoir, is the core structure of the ESN; it consists of a relatively large number of neurons with random and sparse connections, as shown in Figure 3.
ESN reservoirs have a large number of neurons, approximately 20-500, which gives them good short-term memory [31]. Assuming that an ESN has M input units, N internal neurons and L output units, the input vector u(n), internal state x(n) and output vector y(n) at time n are
$$\mathbf{u}(n) = [u_1(n), u_2(n), \ldots, u_M(n)]^T, \quad \mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_N(n)]^T, \quad \mathbf{y}(n) = [y_1(n), y_2(n), \ldots, y_L(n)]^T$$
The reservoir state and output of the ESN are updated according to
$$\mathbf{x}(n+1) = f\left(W^{in}\mathbf{u}(n+1) + W\mathbf{x}(n) + W^{back}\mathbf{y}(n)\right) \tag{4}$$
$$\mathbf{y}(n+1) = f^{out}\left(W^{out}\left[\mathbf{u}(n+1); \mathbf{x}(n+1); \mathbf{y}(n)\right] + W^{out}_{bias}\right) \tag{5}$$
W is the connection weight matrix within the reservoir. Its connectivity is usually kept at 1%-5%, and its spectral radius ρ is usually less than 1 to ensure the echo state property as well as the dynamic memory capacity and stability of the reservoir. $W^{in}$ and $W^{back}$ are the connection weight matrices from the input and from the output to the state variables, respectively. $W^{out}$ is the connection weight matrix from the reservoir, inputs and outputs to the output, and $W^{out}_{bias}$ is an output bias term, which may also represent noise. $f = [f_1, f_2, \ldots, f_N]$ denotes the internal neuron activation functions, and $f_i$ (i = 1, 2, …, N) is normally the hyperbolic tangent function. The output function is $f^{out} = [f^{out}_1, f^{out}_2, \ldots, f^{out}_L]$.
Typically, $f^{out}_i$ (i = 1, 2, …, L) is the identity function. In network training, the weight matrices W, $W^{in}$ and $W^{back}$ connected to the reservoir are generated randomly and do not change once generated, whereas the output weight matrix $W^{out}$ is obtained through training. Therefore, the purpose of ESN training is to determine the output weights $W^{out}$ [28].
When predicting a chaotic time series, we use the basic model without output feedback, that is, $W^{back} = 0$. To simplify the calculation, we also assume that the input-to-output and output-to-output connection weights are 0. The output weight matrix $W^{out}$ is trained as follows [26]:
(1) Initialize the ESN. The weight matrices W and $W^{in}$ are generated randomly, with elements drawn from the uniform distribution on [0, 1], and do not change once produced. The state vector x(0) is set to zero.
(2) Update the reservoir state. According to Equation (4), calculate the new reservoir state vector x(n + 1) from the input vector u(n + 1) and the current state vector x(n).
(3) Calculate the output weight matrix $W^{out}$. Collect all reservoir state vectors and desired output vectors after time point $n = A_0$ as the rows of a state matrix S and a target matrix T, then train $W^{out}$ as
$$W^{out} = \left(S^{+}T\right)^{T}$$
where $S^{+}$ is the Moore-Penrose pseudo-inverse of S.
(4) After $W^{out}$ has been trained, take all input vectors u(n), n = A + 1, …, N, after time point n = A as test inputs and substitute them into Equation (5) to obtain the predicted output vectors ŷ(n).
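Steps (1)-(4) can be sketched as a small NumPy class. This is a minimal illustration, not the paper's implementation. The class name `SimpleESN`, the reservoir size of 100, the 5% connectivity, ρ = 0.8, the `input_scale` factor and the 50-step washout (playing the role of $A_0$) are all assumptions. The weights are drawn from U[0, 1] as the text specifies. With a single output, $(S^{+}T)^{T}$ reduces to the vector `pinv(S) @ y`.

```python
import numpy as np


class SimpleESN:
    """Minimal ESN following steps (1)-(4): no output feedback (W_back = 0) and no
    direct input-to-output or output-to-output weights, so y(n) = W_out x(n)."""

    def __init__(self, n_inputs, n_reservoir=100, density=0.05, rho=0.8,
                 input_scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Step (1): W_in and W drawn from U[0, 1] as in the text (input_scale is an added knob).
        self.W_in = input_scale * rng.uniform(0.0, 1.0, size=(n_reservoir, n_inputs))
        W = rng.uniform(0.0, 1.0, size=(n_reservoir, n_reservoir))
        W[rng.random(W.shape) > density] = 0.0  # keep about `density` of the connections
        radius = np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W * (rho / radius)  # rescale so the spectral radius equals rho < 1
        self.W_out = None

    def states(self, U):
        """Step (2): x(n+1) = tanh(W_in u(n+1) + W x(n)), starting from x(0) = 0."""
        x = np.zeros(self.W.shape[0])
        S = np.empty((len(U), x.size))
        for n, u in enumerate(U):
            x = np.tanh(self.W_in @ u + self.W @ x)
            S[n] = x
        return S

    def fit(self, U, y, washout=50):
        """Step (3): drop the first `washout` states (n <= A_0), then W_out = S^+ T."""
        S = self.states(U)[washout:]
        self.W_out = np.linalg.pinv(S) @ np.asarray(y)[washout:]
        return self

    def predict(self, U):
        """Step (4): identity output function, y_hat(n) = W_out x(n)."""
        return self.states(U) @ self.W_out


# Usage: one-step prediction of a sine series from delay vectors (m = 3, tau = 2).
s = np.sin(2 * np.pi * np.arange(1200) / 40)
m, tau = 3, 2
rows = len(s) - (m - 1) * tau - 1
U = np.column_stack([s[k * tau : k * tau + rows] for k in range(m)])
y = s[(m - 1) * tau + 1 :]
A = 1000  # hypothetical end of the training period
esn = SimpleESN(n_inputs=m).fit(U[:A], y[:A])
pred = esn.predict(U)[A:]  # states are recomputed from x(0) = 0 over the whole input
rmse = float(np.sqrt(np.mean((pred - y[A:]) ** 2)))
```

Only `W_out` is learned, by a single linear solve. This is the source of the ESN's low training cost compared with gradient-trained recurrent networks.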

The Prediction Steps of Chaotic ESN
The specific prediction steps of the chaotic ESN are as follows:
(1) Construct the network. The calculated embedding dimension m and delay time τ of the chaotic time series determine the number of inputs [36,37]. Build the input matrix X and output vector Y, where each row of X is a delay vector and the corresponding element of Y is the next value of the series:
$$X = \begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(m-1)\tau} \\ x_2 & x_{2+\tau} & \cdots & x_{2+(m-1)\tau} \\ \vdots & \vdots & & \vdots \end{bmatrix}, \qquad Y = \begin{bmatrix} x_{2+(m-1)\tau} \\ x_{3+(m-1)\tau} \\ \vdots \end{bmatrix}$$
(2) Learning period. Select a time series of a certain length and train the network with the matrix X as input and the vector Y as output. Compare the network output with the target; if there is an error, start the back-propagation process and correct the network weights to reduce the error.
(3) Forecasting period. The ESN input is set to the most recent delay vector $X^1 = [x_{n+1-(m-1)\tau}, \ldots, x_{n+1}]$, and the output is the actual prediction $Y^1 = x_{n+2}$ of the sequence. The predicted value is then used in iterative prediction as part of the network input of the next round.
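Steps (1) and (3) can be sketched as follows. The function names `build_xy` and `iterative_forecast` are hypothetical, and a least-squares linear predictor on a synthetic sine stands in for the trained ESN so that the example stays short. Any callable that maps one delay vector to the next value, such as a trained ESN, could be passed as `predict_one`.

```python
import numpy as np


def build_xy(x, m, tau):
    """Rows of X are [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}]; Y holds the next value x_{i+(m-1)tau+1}."""
    x = np.asarray(x, dtype=float)
    rows = len(x) - (m - 1) * tau - 1
    X = np.column_stack([x[k * tau : k * tau + rows] for k in range(m)])
    Y = x[(m - 1) * tau + 1 :]
    return X, Y


def iterative_forecast(predict_one, history, m, tau, steps):
    """Multi-step forecast: each prediction is appended to the series and reused as input."""
    buf = list(np.asarray(history, dtype=float))
    out = []
    for _ in range(steps):
        # Latest delay vector [x_{t-(m-1)tau}, ..., x_t], in the same order as the rows of X.
        v = np.array([buf[len(buf) - 1 - (m - 1 - k) * tau] for k in range(m)])
        nxt = float(predict_one(v))
        out.append(nxt)
        buf.append(nxt)
    return np.array(out)


# Usage with a least-squares linear predictor standing in for the trained ESN.
s = np.sin(2 * np.pi * np.arange(500) / 40)
X, Y = build_xy(s[:400], m=3, tau=1)
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], Y, rcond=None)
forecast = iterative_forecast(lambda v: np.r_[v, 1.0] @ coef, s[:400], m=3, tau=1, steps=20)
error = np.max(np.abs(forecast - s[400:420]))
```

Because each prediction is fed back as an input, errors accumulate with the horizon. This is one reason the forecast accuracy of chaotic series degrades beyond the short term.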

Theoretical Introduction of Optimization Algorithm
During ESN training, over-reliance on the training set easily leads to overfitting and reduces the generalization ability of the network. This paper therefore optimizes the output weights with a hybrid of the complementary particle swarm optimization and tabu search algorithms to improve the generalization of the network.
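The text does not state which objective the hybrid optimizer minimizes. The sketch below assumes one plausible choice: each candidate $W^{out}$ (a particle position or tabu-search solution) is scored by its RMSE on held-out validation data rather than on the training set, which targets generalization directly. The function name, the synthetic "reservoir states" and the train/validation split are all assumptions for illustration.

```python
import numpy as np


def output_weight_fitness(w_out, states_val, y_val):
    """Validation RMSE of a candidate W_out (one particle / tabu-search solution); lower is better."""
    pred = states_val @ w_out  # identity output function, as in the text
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))


# Usage with synthetic reservoir states: split into training and validation parts.
rng = np.random.default_rng(1)
S = rng.standard_normal((300, 20))  # stand-in for collected reservoir states
true_w = rng.standard_normal(20)
y = S @ true_w + 0.1 * rng.standard_normal(300)
S_train, S_val, y_train, y_val = S[:200], S[200:], y[:200], y[200:]
w_ls = np.linalg.pinv(S_train) @ y_train  # least-squares solution, e.g. as a starting point
print(output_weight_fitness(w_ls, S_val, y_val))
```

Scoring on data that the weights were not fitted to penalizes candidates that merely memorize the training set.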

Basic Theory of Particle Swarm Optimization
The particle swarm optimization (PSO) algorithm is an evolutionary algorithm that originates from observations of how birds search for food. The motion of the whole flock changes from disorderly to orderly because each bird shares information with the others, which allows the flock to find food [38].
PSO initializes a swarm of random particles and finds the optimal solution through multiple iterations. In each iteration, every particle updates its velocity and position according to two extreme values. The first is the best solution found by the particle itself, called the individual extreme $p_{best}$; the other is the best solution found so far by the entire swarm, called the global extreme $g_{best}$. At the beginning of the iterations, the position of each initialized particle is its individual extreme, and the best position in the swarm is the global extreme.
After all particles finish an iteration, each particle's new position is compared with its previous individual extreme, and the individual extreme is updated if the new position is better. Then the best of all individual extremes in the swarm is compared with the global extreme, and the global extreme is updated if the new one is better. The global extreme obtained when these iterations end is the optimal solution [39]. The movement of particles is illustrated in Figure 4.
After the individual and global extremes are obtained, each particle updates its velocity and position according to
$$v_{i,j+1} = w\,v_{i,j} + c_1\,\mathrm{random}()\,(p_{best\,i,j} - P_{i,j}) + c_2\,\mathrm{random}()\,(g_{best\,i,j} - P_{i,j})$$
$$P_{i,j+1} = P_{i,j} + v_{i,j+1}$$
where $v_{i,j}$ is the velocity of particle i after j iterations, $P_{i,j}$ is its position, $p_{best\,i,j}$ and $g_{best\,i,j}$ are the individual and global extremes of particle i after j iterations, w is the inertia weight applied to the pre-update velocity, random() is a random number in (0, 1), and $c_1$ and $c_2$ are learning factors in (0, 2]. The velocity of particles is limited to (0, $v_{max}$): if an updated value exceeds $v_{max}$ during iteration, it is replaced with $v_{max}$.
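The update rules above can be sketched as a compact PSO minimizer. This is an illustrative sketch, not the paper's tuned optimizer. The parameter values (w = 0.7, c1 = c2 = 1.5, 30 particles, 200 iterations) are assumptions. Unlike the text's (0, $v_{max}$) bound, the sketch clips each velocity component to [-$v_{max}$, $v_{max}$], a common convention that lets particles move in both directions. The sphere function is only a test objective; in the paper's setting the fitness would score a candidate vector of ESN output weights.

```python
import numpy as np


def pso(fitness, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Minimise `fitness` with the velocity/position update given in the text."""
    rng = np.random.default_rng(seed)
    P = rng.uniform(bounds[0], bounds[1], size=(n_particles, dim))  # positions P_i
    V = rng.uniform(-v_max, v_max, size=(n_particles, dim))         # velocities v_i
    p_best = P.copy()                                               # individual extremes
    p_val = np.array([fitness(p) for p in P])
    g_best = p_best[np.argmin(p_val)].copy()                        # global extreme
    g_val = float(p_val.min())
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))  # random() in [0, 1), drawn per component
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (p_best - P) + c2 * r2 * (g_best - P)
        V = np.clip(V, -v_max, v_max)  # velocity limit
        P = P + V
        vals = np.array([fitness(p) for p in P])
        better = vals < p_val          # update individual extremes
        p_best[better] = P[better]
        p_val[better] = vals[better]
        if p_val.min() < g_val:        # update the global extreme
            g_val = float(p_val.min())
            g_best = p_best[np.argmin(p_val)].copy()
    return g_best, g_val


best, best_val = pso(lambda z: float(np.sum(z ** 2)), dim=5)
```

Every particle is pulled toward the same `g_best`, which gives the fast early convergence described next, and also the risk of premature convergence.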
In PSO, each particle shares the global extreme with the other particles in the swarm. This one-way flow of shared information makes the whole swarm follow the current best solution during the search, so the standard PSO algorithm has a fast global convergence capability.

Tabu Search
The tabu search (TS) algorithm is a neighborhood search algorithm that imitates human memory to avoid local optima. It must be able to accept inferior solutions; that is, a solution obtained in an iteration is not necessarily better than the previous one. However, once inferior solutions are accepted, the iterations may fall into a cycle [40]. The basic idea is as follows: given an initial solution, a move operator is applied to the current solution to generate a set of neighboring solutions, and the best of these neighbors is selected as the new current solution. This operation is repeated until the convergence condition is satisfied. To avoid search loops, a flexible "memory" technique is used: a tabu list of designated length records recently performed moves and guides the next search direction. Once a move is put into the tabu list, it is forbidden, which prevents the algorithm from revisiting solutions visited in the last several iterations, avoids cycling and helps the algorithm escape local optima. The tabu list is updated during the iterations: after a certain number of iterations, a move that has stayed longer than the tabu list length is removed from the list and released. In addition, an aspiration criterion (also called the contempt criterion) allows a tabu move when it leads to a solution better than the best found so far [41]. In general, tabu search has the following key elements:
(1) Encoding and initial solution: the initial solution of TS can be generated randomly, or obtained by heuristics based on the properties of the problem at hand.
(2) Choice of candidate solutions: the number of candidate solutions is determined first, followed by the choice of the best candidate. In principle, all neighbors of the current solution should be evaluated; however, for large problems there are a great number of neighbors, so for efficiency it is better to select only a subset of the neighborhood. The best candidate is generally the best one in the candidate set that is either not tabu or satisfies the aspiration criterion.
(3) Tabu list and its length: the tabu list usually adopts a FIFO queue, and its length can be fixed or adapted.
(4) Aspiration (contempt) criterion: a simple rule is often used: if a solution is better than the best solution found so far, its tabu status is ignored and it is selected as the current solution.
(5) Termination condition: common conditions are a maximum number of iterations (or function evaluations) or a maximum number of consecutive iterations during which the best solution stays unchanged.
Tabu search benefits from accepting inferior solutions during the search and has strong hill-climbing ability, which helps it avoid getting trapped at local optima. Its drawback is a strong dependence on the initial solution. In addition, the iterative search moves from one solution to a single other solution at a time, which reduces the efficiency of finding the global optimal solution.
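The key elements listed above can be illustrated with the minimal sketch below. It assumes a continuous minimization problem in which a move perturbs one coordinate by a small multiple of a hypothetical `step`. The tabu list is a FIFO queue of visited grid points, the contempt (aspiration) rule follows element (4), and a fixed iteration budget serves as the termination condition. The function `tabu_search` and all of its parameters are illustrative and are not taken from the paper.

```python
import random
from collections import deque


def tabu_search(f, x0, step=0.1, n_candidates=20, tabu_len=10, max_iter=200, seed=0):
    """Minimise f with a basic tabu search (illustrative sketch)."""
    rng = random.Random(seed)

    def key(x):
        # Solutions are compared on a grid of spacing `step` when checking tabu status.
        return tuple(round(v / step) for v in x)

    current = list(x0)
    best, best_val = list(current), f(current)
    tabu = deque([key(current)], maxlen=tabu_len)   # FIFO: the oldest entry drops out first
    for _ in range(max_iter):                        # termination: fixed iteration budget
        # Candidate set: a random subset of the neighbourhood, one coordinate moved.
        candidates = []
        for _ in range(n_candidates):
            y = list(current)
            i = rng.randrange(len(y))
            y[i] += rng.choice((-1, 1)) * step * rng.randint(1, 3)
            candidates.append((f(y), y))
        candidates.sort(key=lambda c: c[0])          # best candidates first
        for val, y in candidates:
            # Contempt (aspiration) rule: a tabu candidate is allowed if it beats the best so far.
            if key(y) not in tabu or val < best_val:
                current = y                          # may be worse than before: inferior moves accepted
                tabu.append(key(current))
                if val < best_val:
                    best, best_val = list(current), val
                break
    return best, best_val
```

For example, `tabu_search(lambda x: sum(v * v for v in x), [2.0, -1.5])` searches for the minimum of a two-dimensional sphere function. Because the search keeps moving after it reaches a minimum, the best solution is stored separately from the current one.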

Hybrid Algorithm Based on Particle Swarm Algorithm and Tabu Search
12397 Energies 2015, 8, 12388-12408
Particle swarm optimization is essentially an iterative random search algorithm that can optimize a wide variety of functions effectively. It is simple to operate and has strong global search ability, and it has shown good results on both continuous and discrete optimization problems. However, as the iterations progress, the algorithm easily falls into a local extreme point. Its search precision is also limited and its convergence is slow, so it is prone to premature convergence.
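Equations (13) and (14), which the paper uses for the particle update, are not reproduced in this part of the text. The sketch below assumes they take the standard inertia-weight form, v ← w·v + c1·r1·(p_best − x) + c2·r2·(g_best − x) and x ← x + v. The values of w, c1 and c2, the swarm size and the search bounds are illustrative assumptions.

```python
import random


def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Standard inertia-weight PSO (assumed form of Eqs. (13)-(14)); minimises f."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in x]                      # personal best positions
    pbest_val = [f(p) for p in x]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = list(pbest[g]), pbest_val[g]   # global best ("group share")
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # stochastic ("random search") factors
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(x[i]), val
                if val < gbest_val:
                    gbest, gbest_val = list(x[i]), val
    return gbest, gbest_val
```

On multimodal functions, every particle drifts toward the same `gbest`. This is where the premature convergence mentioned above comes from, and it is the weakness the hybrid with tabu search targets.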
The tabu search algorithm has a fast convergence rate and strong local search ability, but its search performance relies heavily on the given initial solution. A good initial solution often lets tabu search converge quickly to the global optimal solution, whereas a poor one may greatly slow its convergence. Another heuristic algorithm is therefore often used to provide a better initial solution and so improve the performance of tabu search.
Considering the characteristics of the two algorithms, this paper introduces the diversification idea of tabu search into the "group sharing" and "random search" mechanisms of particle swarm optimization, which gives the search process memory. Particle swarm optimization has the better global search capability, so it is used in the first stage. Tabu search, with its stronger local search capability, is introduced in the later stage. In this way the comparative advantages of both algorithms are exploited and the difficulty of convergence is overcome.
To improve search efficiency, the key to combining the two algorithms in this paper is to use tabu search to escape the local optimal state when particle swarm optimization tends toward premature convergence. It is difficult to determine directly whether premature convergence has occurred, so following [42] we judge it by the sample variance of the fitness:

$$ m \le \frac{\sigma}{\sigma_1} \le n \qquad (15) $$

where σ is the sample variance of the fitness after the current particle swarm optimization step, σ1 is the sample variance after the previous step, and m and n are two numbers close to 1. When the variance changes only slightly between two adjacent steps, the particle swarm algorithm can be deemed to be converging prematurely. There is no theoretical method for determining m and n. If the interval [m, n] is too wide, the algorithm is likely to leave the global search too early. To achieve the best effect, this paper sets m = 0.9 and n = 1.1 based on repeated trials.
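A minimal sketch of the premature-convergence test follows. It assumes that σ and σ1 are population variances of the particles' fitness values after two consecutive PSO steps, and it flags premature convergence when their ratio lies in [m, n], with m = 0.9 and n = 1.1 as in the text. The function name and the handling of a zero previous variance are hypothetical choices.

```python
from statistics import pvariance


def is_premature(fitness_now, fitness_prev, m=0.9, n=1.1):
    """True when sigma / sigma_1 lies in [m, n], i.e. the spread of fitness
    values hardly changed between two consecutive PSO steps."""
    sigma = pvariance(fitness_now)      # variance after the current step
    sigma_1 = pvariance(fitness_prev)   # variance after the previous step
    if sigma_1 == 0.0:
        # Swarm already collapsed: treat as premature only if it is still collapsed.
        return sigma == 0.0
    return m <= sigma / sigma_1 <= n
```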

Chaotic ESN Optimized by Particle Swarm and Tabu Search
Neural networks have powerful parallel processing and nonlinear mapping capabilities, which makes them effective tools for processing nonlinear signals. They can therefore be used to study time series and to predict and control unknown power systems. A chaotic time series has an internal regularity that arises from its nonlinearity. This regularity shows up as correlation within the series in the time-delay state space and makes the system appear to have memory, yet it is difficult to express with conventional analytical methods. Neural networks process information in exactly this way, so they are well suited to predicting chaotic time series.
The ESN is a new type of large-scale recurrent neural network proposed by Jaeger and Haas [35] in 2004, and it has since been widely used to study and predict chaotic time series [26][27][28][29]. However, the ESN also has shortcomings. In many instances the ESN state matrix is ill-conditioned, which makes the output weights excessively large and degrades the accuracy and generalization ability of the model. To solve this problem, this paper proposes a combined optimization model that optimizes the ESN output weights by particle swarm optimization and tabu search, improving the robustness of the model. The flow chart is shown in Figure 5, and the specific forecasting steps are as follows:
(1) Determine the delay time τ and the embedding dimension m by the mutual information method and the false nearest neighbor (FNN) method, and reconstruct the phase space of the wind power series;
(2) Use the reconstructed data as sample data, divide them into a training set and a test set, and set the model parameters, including the particle swarm size M, the number of iterations and the tabu list length;
(3) Initialize the particle swarm parameters and the initial position and velocity of each particle; the initial personal best position of each particle is its own position;
(4) Calculate the fitness of the particles and find p_best and g_best;
(5) Update the positions and velocities of all particles according to Equations (13) and (14);
(6) Judge from Equation (15) whether the swarm tends toward premature convergence. If so, go to the tabu search in the next step. Otherwise, check whether the particle swarm optimization meets the convergence condition; if it does, output the results, and if not, go to Step (3);
(7) Take the current solution found by particle swarm optimization as the initial solution of the tabu search, use the neighborhood function of the current global extremum to generate a number of neighbors, and select as candidates those with the best fitness;
(8) Judge each candidate by the contempt criterion. If a candidate satisfies it, replace the current solution p_best with that candidate, update the tabu list and the optimal solution, and go to Step (6); otherwise, proceed to the next step;
(9) Judge whether each candidate solution is in the tabu list, select the best non-tabu state to replace the current solution, and let its tabu object replace the object that entered the tabu list first;
(10) Judge whether the maximum number of iterations has been reached; if so, output the optimal solution, otherwise go to Step (3).
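Steps (3)-(10) can be sketched for the ESN readout as shown below. This is not the authors' implementation. It rests on several assumptions:

- The fitness is the training RMSE of y ≈ X·w_out for a fixed reservoir state matrix X.
- Equations (13)-(14) take the standard inertia-weight PSO form.
- Equation (15) is the variance-ratio test m ≤ σ/σ1 ≤ n.
- The tabu search generates neighbors as Gaussian perturbations of g_best, and a candidate counts as tabu when it lies within a hypothetical radius of a solution in the FIFO tabu list.

`psots_readout` and all of its parameter values are illustrative.

```python
from collections import deque

import numpy as np


def psots_readout(X, y, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
                  m=0.9, n=1.1, n_neighbors=20, tabu_len=15, ts_iters=20,
                  ts_sigma=0.05, seed=0):
    """Hypothetical PSO + tabu search for readout weights minimising RMSE(X @ w_out, y)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]

    def rmse(W):
        # Fitness of one weight vector (shape (dim,)) or of many (shape (k, dim)).
        return np.sqrt(np.mean((W @ X.T - y) ** 2, axis=-1))

    def tabu_refine(start, start_f):
        # Steps (7)-(9): neighbourhood search around g_best with a FIFO tabu list.
        radius = 0.5 * ts_sigma * np.sqrt(dim)      # hypothetical "same solution" radius
        current, best, best_f = start.copy(), start.copy(), start_f
        tabu = deque([start.copy()], maxlen=tabu_len)
        for _ in range(ts_iters):
            cands = current + rng.normal(0.0, ts_sigma, (n_neighbors, dim))
            cand_f = rmse(cands)
            T = np.array(tabu)
            dist = np.linalg.norm(cands[:, None, :] - T[None, :, :], axis=2).min(axis=1)
            for k in np.argsort(cand_f):            # best candidates first
                # Non-tabu, or tabu but better than the best so far (contempt criterion).
                if dist[k] >= radius or cand_f[k] < best_f:
                    current = cands[k].copy()
                    tabu.append(current.copy())     # the oldest entry leaves when full
                    if cand_f[k] < best_f:
                        best, best_f = current.copy(), cand_f[k]
                    break
        return best, best_f

    # Steps (3)-(4): initial swarm, personal and global bests.
    pos = rng.normal(0.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), rmse(pos)
    g = int(np.argmin(pbest_f))
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    prev_var = np.var(pbest_f)
    for _ in range(iters):                          # Step (10): fixed iteration budget
        # Step (5): assumed standard inertia-weight update for Eqs. (13)-(14).
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f_now = rmse(pos)
        better = f_now < pbest_f
        pbest[better], pbest_f[better] = pos[better], f_now[better]
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
        # Step (6): variance-ratio test; if premature, hand g_best to tabu search.
        var = np.var(f_now)
        if prev_var > 0 and m <= var / prev_var <= n:
            gbest, gbest_f = tabu_refine(gbest, gbest_f)
        prev_var = var
    return gbest, float(gbest_f)
```

In the full model, X would be the reservoir states collected from the training inputs after a washout, and y the normalized targets. The returned vector would then take the place of the least-squares output weights of the standard ESN. Because `tabu_refine` starts from g_best and only keeps improvements, the tabu stage can never make the global best worse.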

Multi Time Scale Forecasting of Short-term Wind Power Generation without Considering Seasonal Factors
Accurate ultrashort-term and short-term wind power forecasting is of great significance for reducing abandoned wind power, optimizing the conventional power generation plan, adjusting maintenance schedules and developing real-time monitoring systems. In this paper, wind power output is predicted on different time scales. In the literature, minutes, hours and days are the time scales usually chosen for wind power forecasting [5][6][7][16][19][20][21]. Thus, without considering seasonal factors, we select a data set from the Jilin wind farm in the second quarter of 2013 for 5 min ultrashort-term and hour-ahead short-term wind power forecasting. Day-ahead wind power prediction is discussed in the next section, together with the seasonal forecasting of wind power.

Ultrashort-Term Wind Power Generation Based on the Time Scale of 5 Min
Wind power output of a wind farm from 1 to 5 June 2013 was selected as the sample for analysis. The output was sampled every 5 min (288 points a day), forming a data set of 1430 sample points for empirical analysis. The first 1152 points (the first four days) are used as training samples, and the remaining 278 points are used as test samples.
The original data are normalized before network training:

$$ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} $$

After normalization, the value of each variable lies in [0, 1], which eliminates the effect of dimension. The delay time used to reconstruct the chaotic time series is selected by the mutual information method. The delay at which the mutual information function first reaches a minimum is taken as the optimal value, so, as shown in Figure 6, the optimal delay time is τ = 4.
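The normalization and the first-minimum rule can be sketched as below. The mutual information is estimated here with a two-dimensional histogram of 16 bins over lags 1 to 20. Both settings are assumptions, since the estimator the paper used is not stated in this section. The helper names are hypothetical.

```python
import numpy as np


def minmax(x):
    """Scale a series to [0, 1]; keep lo and hi to map forecasts back to MW."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi


def mutual_information(x, lag, bins=16):
    """Histogram estimate of I(x(t); x(t + lag)) in nats."""
    pxy, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                                  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))


def first_minimum_delay(x, max_lag=20, bins=16):
    """Delay at which the mutual information first reaches a local minimum."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for t in range(1, len(mi) - 1):               # mi[t] belongs to lag t + 1
        if mi[t] < mi[t - 1] and mi[t] <= mi[t + 1]:
            return t + 1, mi
    return int(np.argmin(mi)) + 1, mi             # no interior minimum: global minimum
```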

The embedding dimension is determined by the false nearest neighbor (FNN) method. As m increases, the smallest m at which the proportion of false nearest neighbors stops decreasing is the minimum embedding dimension. Figure 7 shows that the curves of discriminant 1, discriminant 2 and the joint criterion no longer decrease and become stable once m is greater than or equal to 5. Thus, the minimum embedding dimension is m = 5.
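The minimal false-nearest-neighbor sketch below rests on three assumptions. First, "discriminant 1" and "discriminant 2" are taken to be the two classic criteria of Kennel et al.: the relative distance growth exceeds `rtol`, or the enlarged distance exceeds `atol` times the standard deviation of the series. Second, the joint curve counts a neighbor as false when either criterion fires. Third, the tolerances 15 and 2 are common textbook defaults, not values reported in the paper. The brute-force neighbor search is O(n²).

```python
import numpy as np


def embed(x, m, tau):
    """Delay vectors [x(i), x(i + tau), ..., x(i + (m - 1) * tau)] as rows."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau: k * tau + n] for k in range(m)])


def fnn_fraction(x, m, tau, rtol=15.0, atol=2.0):
    """Fraction of false nearest neighbours when going from dimension m to m + 1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                      # points that still exist in dimension m + 1
    Y = embed(x, m, tau)[:n]
    extra = x[m * tau: m * tau + n]           # the coordinate added in dimension m + 1
    r_a = np.std(x)                           # attractor size used by criterion 2
    n_false = 0
    for i in range(n):
        d = np.linalg.norm(Y - Y[i], axis=1)
        d[i] = np.inf                         # a point is not its own neighbour
        j = int(np.argmin(d))
        gap = abs(extra[i] - extra[j])
        crit1 = d[j] > 0 and gap / d[j] > rtol                 # discriminant 1
        crit2 = np.sqrt(d[j] ** 2 + gap ** 2) / r_a > atol      # discriminant 2
        n_false += bool(crit1 or crit2)                         # joint criterion
    return n_false / n
```

Evaluating `fnn_fraction` for m = 1, 2, … and taking the first m at which the fraction levels off reproduces the reading of Figure 7 described above.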
The phase space is reconstructed with the delay time τ = 4 and the embedding dimension m = 5. The small data set method gives a maximum Lyapunov exponent of λ1 = 3.1. Since λ1 > 0, the wind power time series has chaotic properties and can be predicted by chaotic methods.
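The small data set method is usually attributed to Rosenstein et al. The minimal sketch below makes the following assumptions:

- Nearest neighbors must be at least `min_sep` samples apart in time.
- The mean log divergence is tracked for `k_max` steps.
- λ1 is the least-squares slope over a hypothetical fitting range of steps 1 to 6.

The result is per sampling step; the time unit behind λ1 = 3.1 is not stated here.

```python
import numpy as np


def largest_lyapunov(x, m, tau, min_sep, k_max=10, fit=(1, 6)):
    """Small-data-set (Rosenstein-style) estimate of the largest Lyapunov
    exponent, in units of 1 / sampling step."""
    x = np.asarray(x, dtype=float)
    n_pts = len(x) - (m - 1) * tau
    Y = np.column_stack([x[k * tau: k * tau + n_pts] for k in range(m)])  # phase points
    n = n_pts - k_max                                 # points that can be followed k_max steps
    d0 = np.linalg.norm(Y[:n, None, :] - Y[None, :n, :], axis=2)   # O(n^2) memory
    idx = np.arange(n)
    d0[np.abs(idx[:, None] - idx[None, :]) < min_sep] = np.inf     # exclude temporally close pairs
    nn = np.argmin(d0, axis=1)                        # nearest neighbour of every point
    mean_log_div = []
    for k in range(k_max + 1):
        d = np.linalg.norm(Y[idx + k] - Y[nn + k], axis=1)
        mean_log_div.append(np.mean(np.log(d[d > 0])))             # <ln d_j(k)>
    ks = np.arange(fit[0], fit[1] + 1)
    slope = np.polyfit(ks, np.asarray(mean_log_div)[ks], 1)[0]     # slope of the linear part
    return float(slope), mean_log_div
```

A positive slope means nearby trajectories separate exponentially, which is the λ1 > 0 test used above. For the logistic map at r = 4, the exact value is ln 2 per step.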
The reconstructed training samples are fed into an ESN with a reservoir (pool) of 500 neurons and 3% sparse connectivity. There is no uniform standard or algorithm for setting the spectral radius of the reservoir connection weight matrix, so its value was determined through repeated experiments. A spectral radius of 7.54 was selected.
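The text does not spell out how the reconstructed series becomes ESN training data. The sketch below assumes one-step-ahead targets and a chronological train/test split, with τ = 4 and m = 5 from above as defaults. `make_samples` and `chronological_split` are hypothetical helper names.

```python
import numpy as np


def make_samples(x, m=5, tau=4, horizon=1):
    """Turn a (normalised) series into input/target pairs.

    Row i of U is the reconstructed phase point
    [x(i), x(i + tau), ..., x(i + (m - 1) * tau)]; its target is the value
    `horizon` steps after the newest coordinate (one-step-ahead by default).
    """
    x = np.asarray(x, dtype=float)
    last = (m - 1) * tau                      # offset of the newest coordinate
    n = len(x) - last - horizon               # number of complete samples
    U = np.column_stack([x[k * tau: k * tau + n] for k in range(m)])
    y = x[last + horizon: last + horizon + n]
    return U, y


def chronological_split(U, y, n_train):
    """Keep time order: the first n_train samples train, the rest test."""
    return (U[:n_train], y[:n_train]), (U[n_train:], y[n_train:])
```

Reconstruction consumes (m − 1)·τ + horizon points at the start of the series, so the number of usable samples is slightly smaller than the number of raw data points.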
The activation function of the pool is tansig, and the output unit uses a linear activation function. The output weights are trained with the particle swarm and tabu search procedure described above. For comparison, a standard ESN and a BP neural network are also used. The standard ESN has the same network parameters as the ESN optimized by the PSO and TS algorithm (PSOTS-ESN), but its output weights are obtained by the least squares method.
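The minimal ESN below combines the ingredients listed above: a sparse random reservoir rescaled to a chosen spectral radius, tanh reservoir units (MATLAB's tansig is tanh), a linear output unit, and least-squares output weights as in the standard-ESN baseline. The input weights, input scaling, washout length and default spectral radius of 0.9 are assumptions. A spectral radius below 1 is the usual rule of thumb for the echo state property, whereas the paper tuned this value experimentally. The feedback matrix W_back is omitted for brevity, and `MinimalESN` is a hypothetical name.

```python
import numpy as np


class MinimalESN:
    """Standard ESN sketch: sparse tanh reservoir, linear least-squares readout."""

    def __init__(self, n_in, n_res=500, density=0.03, spectral_radius=0.9,
                 input_scale=0.5, washout=100, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W[rng.random((n_res, n_res)) >= density] = 0.0       # keep ~3 % of connections
        rho = np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W * (spectral_radius / rho)                  # rescale to target spectral radius
        self.W_in = rng.uniform(-input_scale, input_scale, (n_res, n_in))
        self.washout = washout                                # initial states discarded in training
        self.W_out = None

    def states(self, U):
        """Run the reservoir over U, a 2-D array of shape (time, n_in)."""
        s = np.zeros(self.W.shape[0])
        collected = []
        for u in np.asarray(U, dtype=float):
            s = np.tanh(self.W_in @ u + self.W @ s)           # tansig reservoir update
            collected.append(s)
        return np.array(collected)

    def fit(self, U, y):
        S = self.states(U)[self.washout:]
        # Least-squares output weights (the standard-ESN baseline).
        self.W_out = np.linalg.lstsq(S, np.asarray(y)[self.washout:], rcond=None)[0]
        return self

    def predict(self, U):
        return self.states(U) @ self.W_out                    # linear output unit
```

The least-squares line is the step that PSOTS-ESN replaces. When the state matrix `S` is ill-conditioned, this step is where the excessively large output weights discussed above arise.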
Several error metrics between the actual and forecast data are used to assess forecasting performance. In our experiments, the mean absolute percentage error (MAPE), root mean square error (RMSE) and mean absolute error (MAE) are used to evaluate and compare the simulation results:

$$ \mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} $$

$$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| $$

where y_i and ŷ_i are the actual and forecast values at point i, and N is the number of test points.

With these parameter settings, the wind power forecasts of the particle swarm and tabu search optimized ESN and of the comparison methods are shown in Figures 8-10. Overall, the trends predicted by the four models agree with the actual values. The PSOTS-ESN and PSO-ESN curves are closer to the actual curve, while the BP forecast deviates considerably, especially at the peaks. The error evaluations of the different algorithms are presented in Table 1. PSOTS-ESN has the smallest MAPE, MAE and RMSE, which improves short-term wind power forecasting. The MAE and RMSE of the PSOTS-ESN method are 2.97 MW and 8.82 MW, respectively, while the largest values, 4.8 MW and 15.8 MW, belong to BP. Compared with the other methods, the proposed method also requires less computation time, and its convergence speed is improved.
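The three metrics can be computed as below, together with the share of points whose relative error falls within a threshold (the quantity summarized by the relative-error histograms in the 1 h case). MAPE is undefined whenever an actual value is zero, which can happen with wind power output, so a real evaluation would need to exclude or otherwise handle such points. For any data set, RMSE is never smaller than MAE. The function names are illustrative.

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error in %; undefined if any y == 0."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)


def rmse(y, y_hat):
    """Root mean square error, in the units of y (e.g. MW)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    """Mean absolute error, in the units of y (e.g. MW)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))


def share_within(y, y_hat, threshold):
    """Fraction of points whose relative error |y - y_hat| / |y| is <= threshold."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat) / np.abs(y) <= threshold))
```

For example, with actual values [10, 20, 40] and forecasts [11, 18, 40], MAE is 1.0 and MAPE is about 6.67%.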
From the above tables and figures, the following conclusions can be drawn:
(a) Figures 8 and 9 and Table 1 show that the proposed PSOTS-ESN model has clear advantages over the PSO-ESN model: all of its error metrics and its computation time are lower. The main reason is that the hybrid optimization algorithm makes up for the defects of each single algorithm. In addition, the chaotic time series reconstruction extracts the regularity hidden in the seemingly irregular original data, which improves the accuracy and generalization ability of the model and yields a better prediction.
(b) Comparing Figures 8-10, the forecasting precision of the combined models is higher than that of the single model. A combined forecasting model realizes the complementary advantages of different algorithms and so improves accuracy, whereas the prediction of a single model is considerably limited.
(c) Figures 9 and 10 and Table 1 show that the forecasting precision of the standard ESN is higher than that of the BP neural network. For forecasting non-stationary time series, the ESN therefore handles the data better than BP and achieves higher prediction accuracy.

Short-Term Wind Power Generation Based on the Time Scale of 1 h
Hourly prediction is the most common form of wind power prediction. Short-term wind power forecasting requires high precision and is of great practical significance for grid dispatching. This paper therefore selects 352 consecutive hours of wind power output from a wind farm to validate the generalization ability of the proposed method. The first 282 data points are used as the training set and the remaining 70 as the testing set. The data are analyzed in the same way as in Section 5.1.1.
Figure 11 displays the forecast curves of the proposed PSOTS-ESN method and of the two comparison models, BP and ESN. Figures 12 and 13 present histograms of the relative errors of the three methods. As Figures 12 and 13 and Table 2 show, the method presented in this paper has higher accuracy. The vast majority of its relative errors are within 20%, and 70% of them are within 5%. Its values of all three error metrics are also lower than those of the comparison methods, so the prediction results are satisfactory. For ESN and BP, the relative errors are both within 50%, and the overall error of ESN is less than that of BP, which shows that ESN predicts short-term non-stationary sequences better than the BP model. Overall, the proposed method performs well in wind power forecasting on the 1 h time scale, with strong generalization ability and robustness.
Wind power output depends mainly on the local wind resources, whose characteristics are chiefly the variation of wind speed. In China, wind speed and wind direction differ markedly between seasons, so the seasonal factor has an important influence on wind power forecasting. Following the seasonal cycle, wind speed in the same season and the same month shows a certain periodicity, and wind power output is therefore also periodic to some extent. Forecasting wind power according to the pattern of each season can thus improve prediction accuracy.
The empirical analysis uses daily wind power output data of a wind farm in Jilin for the four quarters from 2014 to 2015. For each quarter, the first 70 data points form the training set and the last 20 data points the testing set. The forecasting results are shown in Figure 14, where the blue dotted line represents the original values and the red solid line the predicted values.
Table 3 shows the error values of the different methods in each season, together with the errors of the forecasts made without considering seasonal factors. The quarterly results show that the forecasting errors in the … quarter and the fourth quarter are relatively small, while the errors in summer and autumn are larger. This is because wind speed and wind direction are unstable in summer and autumn, so the wind power output shows weaker regularity. In the seasonal forecasting on the time scale of one day, the proposed method has the smallest prediction error and the highest prediction accuracy, which verifies its generalization ability and robustness.
Forecasts made without considering seasonal factors are used as a comparison. In this case, too, the PSOTS-ESN method has a smaller prediction error than the other methods, but the overall prediction errors are far higher than those of the quarterly forecasts. Wind power generation is therefore influenced by the seasons. The above analysis shows that it has a certain seasonality and periodicity, and taking the seasonal behavior of wind power into account helps to improve overall forecast accuracy. The unit of MAPE is %, and the unit of RMSE and MAE is MW.

Conclusions
In this paper, a novel composite PSOTS-ESN technique for ultrashort-term and short-term wind power forecasting is proposed. Wind power generation is affected by wind speed, wind direction, temperature and other natural climate factors, and therefore shows both disorder and regularity. The chaotic time series method is well suited to time series that combine uncertainty with regularity. It reconstructs the chaotic time series into a new sequence using the delay time and the embedding dimension, meeting the needs of short-term forecasting. The combination of particle swarm optimization and tabu search exploits the advantages of both algorithms and makes up for the deficiencies of each single algorithm, so the optimization results are more accurate. Wind power output data from a wind farm were used for multi time scale forecasting, and the influence of seasonal factors on wind power was taken into consideration. The results show that wind power has a seasonal pattern and that the original wind power time series exhibits chaotic characteristics. When the proposed PSOTS-ESN is used to predict multi time scale wind power output, it outperforms the other models on the same tasks and shows strong generalization ability and robustness. With its short-term memory ability, the model is effective for predicting wind power time series and achieves satisfactory results.

Figure 1 .
Figure 1. National ranking of the top ten countries by total installed wind power capacity.

Figure 5 .
Figure 5. Flow chart of the chaotic ESN optimized by the particle swarm and tabu search algorithm.

Figure 6 .
Figure 6. Optimal delay time calculated by the mutual information method.

Figure 9 .
Figure 9. Curves of comparison model forecasting results.

Figure 11 .
Figure 11. Curves of forecasting results of the echo state network optimized by particle swarm optimization and tabu search algorithm (PSOTS-ESN), ESN and BP.

Figure 13 .
Figure 13. Error distribution histogram of ESN and BP.

… 1 to ensure the echo state property and the dynamic memory capacity and stability of the ESN reservoir; … takes the identity function. In network training, the connection weight matrices W, W_in and W_back connected to the reservoir are generated randomly and do not change once generated, while the connection weight matrix W_out connected to the output is obtained through training. Therefore, the purpose of ESN training is to determine the output weight W_out [28].
Figure 3. Echo state network (ESN) structure.

Table 1 .
Error and computational time comparison of the different algorithms. PSOTS-ESN: echo state network optimized by particle swarm optimization and tabu search algorithm; PSO-ESN: particle swarm optimization echo state network; ESN: echo state network; BP: back propagation; MAPE: mean absolute percentage error; MAE: mean absolute error; RMSE: root mean square error.

Table 2 .
Error comparison of different algorithms for 1 h forecasting.

Table 3 .
Error comparison of different algorithms for seasonal forecasting.