An Innovative Hybrid Model Based on Data Pre-Processing and Modified Optimization Algorithm and Its Application in Wind Speed Forecasting

Wind speed forecasting plays an irreplaceable role in the high-efficiency operation of wind farms, and is significant in wind-related engineering studies. Back-propagation (BP) algorithms have been widely employed to forecast time series that are nonlinear, irregular, and unstable. However, a single model usually overlooks the importance of data pre-processing and parameter optimization, which results in weak forecasting performance. In this paper, a more precise and robust model that combines data pre-processing, a BP neural network, and a modified artificial intelligence optimization algorithm is proposed, which avoids the limitations of the individual algorithms. The novel model not only improves the forecasting accuracy but also retains the advantages of the firefly algorithm (FA) and overcomes the disadvantage of the FA while optimizing in the later stage. To verify the forecasting performance of the presented hybrid model, 10-min wind speed data from Penglai city, Shandong province, China, were analyzed in this study. The simulations revealed that the proposed hybrid model significantly outperforms other single metaheuristics.


Introduction
Wind power is one of the most significant renewable energy resources presently being applied [1]. Recently, due to the pollution of the global environment, renewable energy [2] and non-polluting sources such as wind energy have been gaining extensive attention [3]. Wind energy, which is one of the most promising and active renewable sources, is providing an increasingly strong supplement to traditional energy sources [4]. Accurate forecasting of wind speed for use in wind power poses great challenges, because the wind is a periodical phenomenon [5] with a nonlinear, irregular, and stochastic nature. Wind speed forecasting is applied in several domains, for instance, target tracking, shipping, weather forecasting, agricultural production, and electric load forecasting. To dispatch wind energy before wind power grid integration, it is very important for a wind farm operator to accurately determine the wind speed. This is because the local wind speed is always the foremost factor affecting wind power generation, and can be used for wind turbine selection and for wind farm layout [6]. In addition, accurate wind speed information can improve power system scheduling and strengthen resource allocation, promoting the reliability of the power grid.
Predictions made with higher precision can allow power system operators to dispatch power efficiently in order to properly meet the demands of consumers [7].
Given a more precise wind speed value, the power operator is able to forecast power delivery. This is extremely helpful for power systems in terms of optimizing storage capacity, making sensible plans, and dispatching electric energy well. Because of the wind's irregularity and complex fluctuations, small variations in the wind speed forecast may result in quick changes in the predicted wind power. This indicates that accurate wind speed forecasting is highly important, and that it plays a vital role in utilizing wind power appropriately and efficiently. Various methods have been proposed to improve the accuracy of wind speed prediction. Three of the most extensively used are the physical forecasting method, the conventional statistical method, and the artificial intelligence method. Given a series of meteorological parameters, the physical forecasting method uses physical variables to derive a time series forecast. Therefore, higher prediction accuracy can be obtained using this method [8]. However, its extremely intricate computations make it very time-consuming. Numerical weather forecasting is one of the most widely used physical forecasting methods, consisting of a computer program that processes meteorological data and describes how the atmosphere changes over time [9]. In addition, the traditional statistical approaches include the regression analysis method, the auto-regressive integrated moving average (ARIMA) [10] model, the non-parametric estimation method, exponential smoothing [11], the state-space model [12], Box-Jenkins models [13], the spatial correlation model, and the difference method. Furthermore, non-neural methods such as support vector machines (SVMs) [14] are also frequently applied in wind speed forecasting.
Among the above methods, artificial neural networks (ANNs) have been frequently and widely applied. By imitating the way the human brain handles information through a sequence of neurons, ANNs obtain a distinguished capacity for mapping complex and highly nonlinear input-output relationships without assuming the form of the underlying model, and different networks can be composed from simple units depending on how they are connected. Therefore, ANNs demonstrate the following advantages: high adaptability, an excellent ability to learn from cases, and the ability to generalize. It is well known that the multi-layered perceptron (MLP) is one of the most broadly used ANN methods. The vast majority of available methods for training ANNs pay close attention only to the alteration of connection weights in a fixed topology, which usually leads to defective results.
MLPs are successfully applied in many fields, such as pattern classification [15], digit recognition [16], image processing, coal price prediction [17], function approximation, measurement of object shape [18], and adaptive control. The back-propagation (BP) algorithm [19] performs most effectively of all training algorithms for MLP methods. Training an MLP for a forecasting problem consists of two parts: the selection of a suitable structure and the adjustment of the connection weights of the network. Several studies have addressed these issues successfully.
A great deal of research has been conducted to precisely forecast wind energy and the local value of wind speed. Wind power and speed forecasting is a fundamental problem for wind farm operation, optimal power flow between the electric system and the wind power plant, market pricing, electric power system dispatching, wind power resource reserve, and storage planning and dispatching. Over the last few decades, the ANN [20,21] has been the superior model, and has frequently been applied to forecast time series.
The ANN is a pragmatic calculation method modeled on the biological neuron. Various types of neural networks exist, of which the following two are the most frequently employed: feed forward neural networks and feedback neural networks. Feed forward neural networks have no feedback connections, whereas feedback neural networks do. Back-propagation (BP) neural networks, perceptrons, and radial basis function (RBF) networks are important feed forward networks. Recurrent neural networks (RNNs) [22] and spiking (pulsed) neural networks [23] are two important models of feedback networks. In this paper, we pay more attention to feed forward neural networks [24].
The BP algorithm has various significant advantages; for example, it can approximate a great many functions, it is relatively simple to implement, and it can be used as a reference method. In addition, its most effective characteristic is that the momentum parameter and the learning rate factor can be altered, thereby enhancing the convergence speed of the traditional BP algorithm.
To gain good forecasting accuracy and low deviation, many studies [25][26][27][28][29][30][31][32] have been conducted to determine the optimal weights of neural networks. In addition, an original hybrid model system, a traditional hybrid method based on the rapid searching theory developed by Xiao et al. [33], has been put into use. An extensive study was conducted by Xiao et al. [33] using four test functions to evaluate the optimization algorithm's capacity for exploitation, searching, avoiding local optima, and convergence velocity, and the results of this experiment demonstrated that the modified method performs better than the original algorithm. In recent years, a number of evolutionary optimization algorithms have been applied to help determine the threshold values of a prediction method. Particle swarm optimization (PSO) was applied by Liu et al. [25] to optimize the parameters of the prediction technique for short-term electric load prediction in micro-grids. Wang et al. [26] employed a modified PSO to optimize the weight distribution of their proposed combined model developed for electric load prediction. The cuckoo search (CS) algorithm [27][28][29] was applied to determine the parameters of the proposed model for electric load forecasting. Wang et al. [30] modified the CS method to optimize the parameters of multi-step-ahead wind speed forecasting models. Xiao et al. [31] applied the genetic algorithm (GA) to optimize the parameters of the proposed model. In the present paper, a highly valid optimization method, the Broyden-Fletcher-Goldfarb-Shanno-Firefly Algorithm (BFGS-FA), is used to determine the parameters of the proposed hybrid model.
Recently, numerous improvements have been made to promote the effectiveness of the FA for optimizing neural networks, including the binary, Gaussian, firefly, high-dimensional firefly, Lévy flight, simultaneous firefly, and chaos-based FA [34,35]. Though most of these improvements to the FA enhance its performance successfully, few of them have been introduced to optimize the parameters of hybrid models. This paper intends not only to enhance the exploration and exploitation abilities of the FA, but also to reduce the tendency to become trapped in local optima, a drawback that appeared in the CS algorithm. On the basis of the BFGS quasi-Newton method, an original improvement of the FA is proposed to enhance the diversity of the firefly population. Tightening the convergence criterion may cause individual fireflies to be caught in local optima; however, this risk decreased when the modified algorithm was used. The decomposition of the original wind sequence is also a significant process for data filtering. It can effectively improve the prediction accuracy of the model and yield better forecast results [36]. Important techniques, such as empirical mode decomposition (EMD) [37], wavelet decomposition (WD) [38], and singular spectrum analysis (SSA) are often applied to remove the noise series. However, the wavelet de-noising algorithm is sensitive to the determination of the threshold, and the EMD may lead to mode confusion [39], which may result in a poor decomposition. In contrast, SSA has many advantages, and overcomes the disadvantages of EMD and WD in terms of decomposition. Moreover, we analyzed some articles in the literature [40][41][42][43][44] that deal with wind forecasting by applying neural networks, and that are in line with the theme of the present paper. From these studies, we found that some data preprocessing or optimization algorithms are insufficient, and the details are listed in Table 1.
Therefore, based on the discussed limitations, this manuscript proposes a hybrid model that unites the BP algorithm, SSA theory, and BFGS-FA. Ten-minute wind speed values collected from Penglai city, Shandong province, China, were applied to verify the hybrid model. The results of the tests in this study indicate that the hybrid model considerably outperformed the other three models. This demonstrates that the hybrid method could be applied to forecast wind speeds, which would help wind power systems to make optimal decisions, such as selecting better sites for wind power, taking early measures to reduce losses caused by bad weather, reducing production costs, and minimizing energy consumption (coal, etc.). This model is also useful for helping wind power companies to make correct decisions in practice. Thus, the hybrid forecasting method with high accuracy represents a model that will have potential application in the near future. Furthermore, the practical hybrid model can also be applied to other forecasting domains, such as target tracking, stock index forecasting, environment forecasting, shipping, weather forecasting, agricultural production, and electric load forecasting.
The primary contributions and novelties of this manuscript are listed as follows:
(a) The BFGS-FA method, the back propagation neural network (BPNN), and the concept of the de-noising algorithm were combined to form two new models: singular spectrum analysis-back propagation (SSA-BP), and singular spectrum analysis-Broyden-Fletcher-Goldfarb-Shanno-Firefly Algorithm-back propagation (SSA-BFGS-FA-BP).
(b) This paper evaluates the developed models on the basis of two aspects: forecasting accuracy and stability. The results indicate that BFGS-FA-BP is a better model when considering accuracy only, but the hybrid SSA-BFGS-FA-BP is a better model overall: even with the low cost of calculation, the accuracy remained high.
(c) The novel combined BFGS-FA algorithm successfully avoids the shortcomings of the FA in the later stage of optimization, namely its low velocity and poor convergence performance.
(d) The proposed hybrid approach integrates the advantages of other individual models.
(e) A time sequence pre-processing method was applied to de-noise the raw data successfully.
The remainder of this paper is organized as follows. Section 2 presents the single prediction method developed according to the BPNN and the theory of the hybrid forecasting method. This section also describes the optimization algorithms BFGS, FA, and their combination BFGS-FA, which are applied to determine the parameters of the hybrid forecasting model. SSA theory and the Diebold-Mariano (DM) test, which can help to determine the forecasting effectiveness of the developed hybrid method, are introduced at the end of Section 2. In Section 3, the wind speed time sequences collected from three separate sites are used to test the proposed hybrid model. Subsequently, the wind resources and the evaluation criteria of the forecasting model are described. In Section 4, we discuss this study. Finally, Section 5 concludes this paper.
Table 1 (excerpt). Reference [44]: has good adaptability, and can take advantage of other models; the process of building models is relatively complex.

Methodology
Since McCulloch and Pitts [45] proposed the neural network mathematical model in 1943, ANNs have been applied in numerous fields, including signal processing, market analysis and forecasting, pattern recognition, and automatic control. In this section, the separate theories behind the hybrid model are introduced in detail.

BP Algorithm
In mathematically simulating the human brain system, the BP algorithm benefits from its underlying processes, fuzzy information processing, and chaotic performance. Owing to the error back-propagation algorithm and the multilayer structure, the BP neural network performs excellently in training ANNs. An input layer, one or more hidden layers, and an output layer constitute a representative BP network. The BP algorithm is applied to adjust the thresholds: the errors at the output are propagated back through the network, modifying the thresholds as they go, in order to keep the error [46] from emerging again. The main procedures of the BP algorithm, which follow its topology and flow structure, can be summarized as follows:
Step 1. We obtain the wind speed time sequence and corresponding parameter values from the wind power plant. The inputs have exhaustive information on historical values. The input value is often affected by the site, surrounding temperature, air pressure, time, and even the collectors. Our primary task is to make full use of four different parameters collected from the wind power plant.
Step 2. We transform the original values into the required range (0 to 1) by normalization.
Step 3. We build the BP network and set its parameters, which include the number of neurons in the input layer, hidden layer, and output layer; the learning rate; the maximum number of training iterations; and the required training accuracy. The training can be summarized as learning from the historical values to discover the information implied in the previous time series data, which can be applied to forecast the future wind speed.
Step 4. We use the testing set to assess the effectiveness of the trained BP network.
Step 5. Finally, the future wind speed value (output) is forecast by the neural network.
The key parameters that emerged in this study are not sensitive in small intervals; therefore, the key parameters of these algorithms are determined by repeated trials. The corresponding experimental parameters of the method are summarized in Tables 2 and 3.

Experimental Set Point                       Default Value
The number of units in the input layer       6
The number of units in the hidden layer      7
The number of units in the output layer      1
The learning rate                            0.1
The maximum training time                    1000
Required accuracy of training                0.00001
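Steps 2 and 3 can be illustrated with a small data-preparation sketch. The min-max formula and the sliding-window construction below are assumptions made for illustration, since the normalization equation and the exact input layout are not reproduced in this text; the function names and the sample values are hypothetical. The window length of 6 mirrors the number of input units listed above.

```python
def min_max_normalize(values):
    """Scale a sequence linearly into [0, 1]; also return the (min, max) bounds."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def denormalize(scaled, bounds):
    """Map normalized values back to the original scale."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]


def make_windows(series, n_inputs=6):
    """Pair each run of n_inputs consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - n_inputs):
        inputs.append(series[start:start + n_inputs])
        targets.append(series[start + n_inputs])
    return inputs, targets


# Made-up wind speeds (m/s), used only to exercise the functions.
speeds = [5.2, 6.1, 4.8, 7.3, 6.6, 5.9, 6.4, 7.0]
scaled, bounds = min_max_normalize(speeds)
X, y = make_windows(scaled, n_inputs=6)
```

Each row of `X` would feed the six input units and the matching entry of `y` would be the single output target; forecasts produced on the normalized scale can be mapped back with `denormalize`.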

Broyden-Fletcher-Goldfarb-Shanno
The BFGS [47] algorithm is one of the most useful nonlinear quasi-Newton procedures. Definition 1. Let $x_t$ be the result at iteration t and let $x_{t+1} = x_t + \lambda_t d_t$ be the recursion, in which $\lambda_t$ is the step size. The search direction is $d_t = -D_t \nabla f(x_t)$, in which $D_t$ is an n × n positive definite symmetric matrix that approximates the inverse of the true Hessian matrix at $x_t$. Definition 2. The new direction of BFGS is obtained by updating $D_t$ at each iteration. The primary BFGS algorithm is summarized in Appendix A.
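For reference, the textbook BFGS update of the inverse-Hessian approximation can be written in the notation of Definition 1 as below. The symbols $s_t$ and $q_t$ are introduced here for illustration only, and the authors' own formula may be arranged differently.

```latex
\begin{aligned}
D_{t+1} &= D_t
  + \left(1 + \frac{q_t^{\top} D_t\, q_t}{s_t^{\top} q_t}\right)
    \frac{s_t s_t^{\top}}{s_t^{\top} q_t}
  - \frac{D_t\, q_t s_t^{\top} + s_t q_t^{\top} D_t}{s_t^{\top} q_t}, \\
s_t &= x_{t+1} - x_t, \qquad
q_t = \nabla f(x_{t+1}) - \nabla f(x_t).
\end{aligned}
```

The update keeps $D_{t+1}$ symmetric, and it stays positive definite as long as $s_t^{\top} q_t > 0$, which a suitable line search for $\lambda_t$ ensures.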

Firefly Algorithm
The FA was first proposed by Xin-She Yang in 2008 [48]. The FA was inspired by the flashing nature of fireflies [49,50]. Fireflies flash while flying, which can be regarded as a signal to attract other companions. The method has three rules:
(1) All fireflies are unisexual; in addition, any firefly can be attracted by others.
(2) Attraction is directly proportional to brightness; that is, for any two fireflies, the less bright one will be attracted by the brighter one, and will move towards it; the brightness decreases as the distance between them increases.
(3) If there are no brighter fireflies around a given firefly, it will fly at random.
The brightness of the firefly must be tightly related to the objective function.
The experimental set points of the FA are described in Table 3. The FA is a developed computational method that is also used to optimize controller parameters. Each firefly in the FA represents a solution to the problem, which is defined by its position. In a d-dimensional vector space, the present location of the ith firefly is $x_i = (x_{i1}, \ldots, x_{in}, \ldots, x_{id})$. The positions of m fireflies are initialized at random within the specified range. The position updating equation for the ith firefly, which is attracted to move towards a brighter firefly j, is given as follows:

$$x_i(t+1) = x_i(t) + \beta_0 e^{-\gamma r_{ij}^2}\left(x_j(t) - x_i(t)\right) + \alpha\left(\mathrm{rand} - \tfrac{1}{2}\right) \qquad (4)$$

In addition, the position updating equation for the brightest firefly is given as follows:

$$x^{best}_i(t+1) = x^{best}_i(t) + \alpha\left(\mathrm{rand} - \tfrac{1}{2}\right) \qquad (5)$$

where the first terms $x_i(t)$ and $x^{best}_i(t)$ of Equations (4) and (5) are the current positions of a less bright firefly and the brightest firefly, respectively, and rand is a random number drawn uniformly from [0, 1]. The second term in Equation (4) is the firefly's attraction to light intensity. $\beta_0$ is the original attraction at r = 0, $\gamma$ is the absorption parameter in the range [0, 1], and $r_{ij}$ is the distance between any two fireflies i and j, at positions $x_i$ and $x_j$, respectively, and can be formulated as a Cartesian or Euclidean distance as follows:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{n=1}^{d} \left(x_{in} - x_{jn}\right)^2}$$

where $x_i$ and $x_j$ are the position vectors for fireflies i and j, respectively, with $x_{in}$ representing the position value for the nth dimension. The third term in Equation (4) and the second term in Equation (5) are randomization terms whose magnitude is gradually reduced; that is, the random movement of the fireflies shrinks according to $\alpha = \alpha_0 \delta^{t}$, where $\alpha_0$ is in the range [0, 1], $\delta$ is the random reduction parameter with $0 < \delta < 1$, and t is the iteration number. Every new position must be evaluated by a fitness function, which is assumed to be the integral square error. The flow chart of the FA is presented in Figure 1, and the original FA algorithm is summarized in Appendix B.
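The position updates described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the algorithm of Appendix B: the population size, parameter defaults, clipping to the search range, the fixed random seed, and the `sphere` test function are all assumptions chosen for the example, and the objective is minimized, so a lower value is treated as brighter.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha0=0.5, delta=0.97, seed=0):
    """Minimal firefly search; a lower objective value means a brighter firefly."""
    rng = random.Random(seed)
    low, high = bounds

    def clip(value):
        # Keep every coordinate inside the search range (an added assumption).
        return min(max(value, low), high)

    def random_step(alpha):
        # Randomization term alpha * (rand - 1/2).
        return alpha * (rng.random() - 0.5)

    swarm = [[rng.uniform(low, high) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_cost = min(cost)
    best_x = list(swarm[cost.index(best_cost)])

    for t in range(n_iter):
        alpha = alpha0 * delta ** t  # randomness shrinks as alpha0 * delta^t
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves to j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = [clip(a + beta * (b - a) + random_step(alpha))
                                for a, b in zip(swarm[i], swarm[j])]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_cost, best_x = cost[i], list(swarm[i])
        # The brightest firefly has nothing to follow and moves at random.
        b = cost.index(min(cost))
        swarm[b] = [clip(a + random_step(alpha)) for a in swarm[b]]
        cost[b] = objective(swarm[b])
        if cost[b] < best_cost:
            best_cost, best_x = cost[b], list(swarm[b])
    return best_x, best_cost


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Because the random walk of the brightest firefly can make it worse, the sketch stores the best position seen so far separately from the swarm.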
Energies 2017, 10, 954

BFGS-FA
The FA possesses good global optimization and exploitation capacities; however, it usually manifests a low velocity and poor convergence performance while optimizing in the later stage. Therefore, as shown in Figure 1, the BFGS is applied after the FA updates the solutions in each generation to search for a sub-optimal solution, which promotes the local optimization capacity and the rate of local convergence of the overall method. The primary method of BFGS-FA is summarized in Appendix C.

Singular Spectrum Analysis (SSA)
SSA was developed independently in the United States and the United Kingdom under the name singular spectrum analysis, and in Russia under the name Caterpillar-SSA [51]. SSA draws on the strengths of statistics and probability theory; meanwhile, it assimilates the knowledge of power systems and signal processing ideas.
Suppose that $y = [y_1, y_2, \ldots, y_T]$ is a time sequence with T elements. The SSA method contains two parts: decomposition and reconstruction [52,53].

Decomposition
In decomposition, an observed unidimensional time series $y = [y_1, y_2, \ldots, y_T]$ is converted into its trajectory matrix X. Subsequently, $XX^{T}$ and its corresponding singular value decomposition (SVD) are computed. This can be divided into two steps: embedding and SVD.
Step 1. The primary aim of this step is to construct the trajectory matrix, that is, a lagged version of the initial time sequence y. The resulting matrix has a window width W (W ≤ T/2), which is usually determined by the operator. With P = T − W + 1, the trajectory matrix is denoted as follows:

$$X = \begin{pmatrix} y_1 & y_2 & \cdots & y_P \\ y_2 & y_3 & \cdots & y_{P+1} \\ \vdots & \vdots & \ddots & \vdots \\ y_W & y_{W+1} & \cdots & y_T \end{pmatrix}$$

In fact, this trajectory matrix is a Hankel matrix; that is, all the elements on each anti-diagonal i + j = const are equal [54].
Step 2. We obtain the covariance matrix $XX^{T}$ from X. Applying the SVD to $XX^{T}$ yields a group of L eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_L \ge 0$ and their corresponding eigenvectors $U_1, U_2, \ldots, U_L$, which are often called empirical orthogonal functions. Therefore, the SVD of the trajectory matrix can be denoted as $X = E_1 + E_2 + \ldots + E_d$, where $E_i = \sqrt{\lambda_i}\, U_i V_i^{T}$, d is the total number of non-zero eigenvalues, and $V_1, V_2, \ldots, V_d$ are the corresponding principal components, $V_i = X^{T} U_i / \sqrt{\lambda_i}$. The contribution of each $E_i$ is measured by its share of the variance of X, defined by $\lambda_i / \sum_{j=1}^{d} \lambda_j$; $E_1$ has the highest contribution [55], and $E_d$ has the minimum contribution. The SVD consumes more time if the time sequence is long (i.e., T > 1000).

Reconstruction
We compute $XX^{T}$ and its SVD to obtain its L eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_L \ge 0$ and the corresponding eigenvectors. Each signal, as represented by its eigenvalue, is analyzed and assembled to reconstruct the new time series. This process consists of two steps: grouping and averaging.
Step 1. Here, the designer chooses r out of d eigenvalues. Define $I = \{i_1, i_2, \ldots, i_r\}$ to be the set of r chosen eigenvalues and $X_I = X_{i_1} + X_{i_2} + \ldots + X_{i_r}$, where $X_{i_k}$ is the elementary matrix associated with the chosen eigenvalue $\lambda_{i_k}$. $X_I$ carries the "information" of y, whereas the remaining (d − r) eigenvalues, which are not selected, represent the error term ε.
Step 2. The set of r components chosen in the previous step is then used to rebuild the elements of the time sequence. The fundamental concept is to convert each of the terms $X_{i_1}, X_{i_2}, \ldots, X_{i_r}$ into the reconstructed time series $y_{i_1}, y_{i_2}, \ldots, y_{i_r}$ by using the Hankelization process H(Z), or diagonal averaging: if $Z_{ij}$ is an element of a generic matrix Z, then the kth term of the rebuilt time sequence is acquired by averaging $Z_{ij}$ over all i and j with i + j = k + 1. Thus, H(Z) is a time sequence with T elements rebuilt from the matrix Z.
After averaging, we obtain the approximation of y, which is the regrouped time series, and is given as follows:

$$y \approx \hat{y} = y_{i_1} + y_{i_2} + \ldots + y_{i_r}$$

Each single eigencomponent is reconstructed from the whole time series, as suggested by Alexandrov and Golyandina. This indicates that SSA is robust to abnormal values.
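The embedding step and the diagonal averaging step can be sketched without any linear algebra library. The sketch below deliberately leaves out the SVD and grouping stages, and the function names and sample series are hypothetical. It uses 0-based indices, so the anti-diagonal condition reads i + j = k rather than i + j = k + 1.

```python
def trajectory_matrix(y, window):
    """Build the W x P Hankel matrix with X[i][j] = y[i + j], P = T - W + 1."""
    n_cols = len(y) - window + 1
    return [[y[i + j] for j in range(n_cols)] for i in range(window)]


def hankelize(Z):
    """Diagonal averaging: average Z[i][j] over each anti-diagonal i + j = k."""
    n_rows, n_cols = len(Z), len(Z[0])
    sums = [0.0] * (n_rows + n_cols - 1)
    counts = [0] * (n_rows + n_cols - 1)
    for i in range(n_rows):
        for j in range(n_cols):
            sums[i + j] += Z[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


# Made-up series with T = 8 and window width W = 3, so P = 6.
y = [3.0, 4.5, 5.1, 4.2, 3.8, 4.9, 5.6, 5.0]
X = trajectory_matrix(y, window=3)
recovered = hankelize(X)
```

Applying `hankelize` to the full trajectory matrix returns the original series, which is a convenient sanity check; in SSA proper it would be applied to each selected elementary matrix instead.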
In addition, as shown in Figure 2, the original wind speed preprocessed by SSA is forecast by the BP algorithm, and its parameters are optimized by BFGS-FA.

Proposed Hybrid Model
The BP algorithm is selected as the forecasting method to forecast the wind speed time series in this paper. However, because of its unstable structure, it cannot by itself deliver accurate forecasting results with minor error; therefore, it is important to determine the optimal parameters and threshold values of the BP network to promote the predictive effectiveness. BFGS-FA is proposed to determine the weight and threshold. In addition, large amounts of noise present in the original wind sequence will lead to a poor forecasting performance. Therefore, we choose the SSA to remove the noise from the raw time sequence. The corresponding basic procedures are presented as follows, and are depicted in Figure 2.

Step 1. SSA is used to remove the noise from the raw data. It decomposes the original sequence, removes the high-frequency components, and then reconstructs the remaining components into new experimental data.
Step 2. BFGS-FA is used to determine the weight and threshold of the BP neural network. Thus, the global optimization ability of the BP algorithm is greatly promoted.
Step 3. The optimized BP neural network is applied to predict the wind speed time sequence.
Step 4. The proposed hybrid model is compared with the single models in forecasting time sequences based on historical values. Multi-step forecasting is also used to show that the proposed hybrid method has a higher effectiveness, and its forms can be described as follows:
(1) One-step prediction: The predictive value p(t + 1) is calculated on the basis of the past time sequence {p(1), p(2), . . . , p(t − 1), p(t)}, where t is the sample size of the wind speed time sequence.
(2) Two-step prediction: The predictive value p(t + 2) is calculated on the basis of the past time sequence {p(1), p(2), . . . , p(t − 1), p(t)} and the former predictive value p(t + 1).
(3) Three-step prediction: The predictive value p(t + 3) is calculated on the basis of the past time sequence {p(1), p(2), . . . , p(t − 1), p(t)} and the former predictive values p(t + 1) and p(t + 2).
(4) Higher-step forecasting values are obtained in the same way.
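The multi-step scheme above, in which each forecast is fed back as an input for the next step, can be sketched as follows. The fixed-length input window, the function names, and the stand-in `mean_model` are assumptions for illustration; in the hybrid model the one-step predictor would be the optimized BP network.

```python
def recursive_forecast(one_step_model, history, n_inputs, horizon):
    """Forecast `horizon` steps ahead, feeding each prediction back as an input."""
    window = list(history[-n_inputs:])
    predictions = []
    for _ in range(horizon):
        p_next = one_step_model(window)
        predictions.append(p_next)
        # Slide the window: drop the oldest value and append the new forecast.
        window = window[1:] + [p_next]
    return predictions


def mean_model(window):
    """Hypothetical stand-in for a trained one-step predictor."""
    return sum(window) / len(window)


history = [5.0, 6.0, 7.0, 6.0]
forecasts = recursive_forecast(mean_model, history, n_inputs=2, horizon=3)
# forecasts == [6.5, 6.25, 6.375]
```

Because later steps depend on earlier forecasts rather than on observations, errors can accumulate as the horizon grows, which is why multi-step results are usually reported separately for each step.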

Testing Method
In this paper, we also employed a testing method called the Diebold-Mariano (DM) test to evaluate the proposed model.
The DM test [56], which is focused on predictive accuracy, compares the predictive effectiveness of the proposed hybrid method with that of other simple models. In practical applications, two or more time sequence models are often available for predicting a specific variable of interest.
Real values: $\{p_t;\ t = 1, \ldots, T\}$

Two predictions: $\{\hat{p}^{(1)}_t;\ t = 1, \ldots, T\}$ and $\{\hat{p}^{(2)}_t;\ t = 1, \ldots, T\}$

The prediction errors according to the two models can be described as follows:

$$\varepsilon^{(1)}_t = p_t - \hat{p}^{(1)}_t$$

and:

$$\varepsilon^{(2)}_t = p_t - \hat{p}^{(2)}_t$$

The precision of each forecasting model is evaluated by an appropriate loss function $L(\varepsilon^{(i)}_t)$, i = 1, 2. The most widespread loss function is the square error loss, and its formulation is as follows:

Square error loss: $L(\varepsilon^{(i)}_t) = \left(\varepsilon^{(i)}_t\right)^2$

The DM test statistic assesses the predictions according to the chosen loss function L:

$$\mathrm{DM} = \frac{\frac{1}{T}\sum_{t=1}^{T} d_t}{\sqrt{S^2 / T}}, \qquad d_t = L(\varepsilon^{(1)}_t) - L(\varepsilon^{(2)}_t)$$

where $S^2$ is the estimated value of the variance of $d_t$, and the null hypothesis is:

$$H_0:\ E[d_t] = 0$$

in contrast, the alternative hypothesis is:

$$H_1:\ E[d_t] \neq 0$$

Under the null hypothesis, the two predictions possess equal precision. Under the alternative hypothesis, the two predictions differ in accuracy. If the null hypothesis holds, the DM statistic asymptotically follows the standard normal distribution N(0,1). The null hypothesis should not be rejected if the DM statistic falls inside the interval $[-Z_{\alpha/2}, Z_{\alpha/2}]$; otherwise we must reject it. That is, the rejection region is $(-\infty, -Z_{\alpha/2}) \cup (Z_{\alpha/2}, +\infty)$, which is defined as follows:

$$|\mathrm{DM}| > Z_{\alpha/2} \qquad (17)$$

where $Z_{\alpha/2}$ is the positive Z-value from the standard normal table corresponding to half of the significance level α of the experiment.
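A minimal sketch of the test with square error loss is given below. It assumes one-step forecasts and uses the plain sample variance of the loss differential as the variance estimate, so it ignores any autocorrelation in the differential; the function name and the sample data are hypothetical.

```python
import math
from statistics import NormalDist


def dm_test(actual, pred1, pred2, alpha=0.05):
    """Return (DM statistic, critical value, reject flag) under square error loss."""
    # Loss differential between the two forecasts at each time step.
    d = [(a - p1) ** 2 - (a - p2) ** 2
         for a, p1, p2 in zip(actual, pred1, pred2)]
    n = len(d)
    d_bar = sum(d) / n
    # Sample variance of the differential (assumes no autocorrelation).
    s2 = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    dm = d_bar / math.sqrt(s2 / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return dm, z_crit, abs(dm) > z_crit


# Made-up data: the first forecast is much closer to the actual values.
actual = [5.0, 6.0, 7.0, 6.5, 5.5, 6.0, 7.5, 8.0]
pred1 = [5.1, 6.1, 7.1, 6.6, 5.6, 6.1, 7.6, 8.1]
pred2 = [6.0, 7.2, 7.8, 7.6, 6.4, 7.3, 8.5, 8.7]
dm, z_crit, reject = dm_test(actual, pred1, pred2)
```

A negative statistic means the first forecast has the smaller average loss. For multi-step forecasts a variance estimate that accounts for autocorrelation would be more appropriate than the plain sample variance used here.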

Experimental Design and Results
In this section, the wind speed data gathered from three sites are forecast by the developed hybrid method.The data location and effectiveness of the prediction estimation standard are also presented.All the experiments in this paper were conducted in MATLAB R2014b on Windows 7 with 3.30 GHz Intel (R) Core (TM) i5 4590 CPU, 64 bit and 8 GB RAM.

Data Sets
The hybrid SSA-BFGS-FA-BP method was tested on wind speed time series from three sites. A data set gathered at 10-min intervals from Penglai city, Shandong province, China, was used. Figure 3a displays the geographical position of Penglai city in China.
In this study, wind speeds are taken from three different sites, and we chose 1728 observation values. Of these, 1440 values were used to train the network, and the remaining 288 values were used as the testing set for each station.
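The split described above can be sketched in a few lines of Python. At a 10-min resolution there are 144 points per day, so 1728 values correspond to 12 days, with 10 days for training and 2 days for testing. The chronological ordering of the split and the helper name `split_series` are assumptions made for illustration.

```python
POINTS_PER_DAY = 24 * 6  # 10-min sampling gives 144 observations per day


def split_series(series, n_train=1440, n_test=288):
    """Chronological split: the first n_train values train, the rest test."""
    if len(series) != n_train + n_test:
        raise ValueError("series length must equal n_train + n_test")
    return series[:n_train], series[n_train:]


# Placeholder data standing in for one station's 1728 wind speed values.
series = list(range(1728))
train, test = split_series(series)
```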
The original data from the three sites are shown in Figure 3b, which illustrates the irregularity, fluctuation, and volatility of the original time series.

Forecast Error Metrics
Forecasting errors are used to assess the ability of the applied forecasting approaches and to evaluate the effectiveness of the proposed method against on-site (true) measurements.
The metric equations in Table 4 are the general error indices applied to most forecasting models for renewables. The mean absolute error (MAE), the root mean square error (RMSE) [57], and the mean absolute percentage error (MAPE) are used to estimate the forecasting effectiveness of the proposed method. They are denoted as follows:

$$MAE = \frac{1}{T}\sum_{p=1}^{T}\left|t_p - \hat{t}_p\right|$$

$$RMSE = \sqrt{\frac{1}{T}\sum_{p=1}^{T}\left(t_p - \hat{t}_p\right)^2}$$

$$MAPE = \frac{1}{T}\sum_{p=1}^{T}\left|\frac{t_p - \hat{t}_p}{t_p}\right| \times 100\%$$

where $t_p$ and $\hat{t}_p$ are the true value and the predicted value, respectively, and $T$ is the total number of elements in the data array. The MAE and the RMSE both depend on the deviation between $t_p$ and $\hat{t}_p$, while the MAPE gives the relative error between $|t_p - \hat{t}_p|$ and $t_p$. Quantified by these three frequently used indices, the difference between the predicted and exact wind speed values can be seen clearly and concisely. A smaller value indicates that the forecasting method performs better. MAPE is a unit-free estimator that is more sensitive to small-scale variation, although it does not reveal some weak characteristics of the data, such as asymmetry, and offers less protection against abnormal values. Therefore, MAPE is chosen as the main criterion in this paper.
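For reference, the three indices can be computed as in the following Python sketch, which uses the standard definitions of MAE, RMSE, and MAPE. The function names are illustrative, and the MAPE helper assumes that no true value equals zero.

```python
import math


def mae(true, pred):
    """Mean absolute error over T paired values."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def rmse(true, pred):
    """Root mean square error over T paired values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mape(true, pred):
    """Mean absolute percentage error in percent; true values must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(true, pred)) / len(true)


# Small hand-checkable example.
true_vals = [2.0, 4.0, 5.0]
pred_vals = [1.0, 5.0, 5.0]
scores = (mae(true_vals, pred_vals), rmse(true_vals, pred_vals), mape(true_vals, pred_vals))
```

Because wind speed can approach zero, a practical implementation would need to decide how to treat near-zero true values in the MAPE; that choice is not specified here.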

MAE: the mean absolute error of the T forecasting results.

Comparison Method and Its Corresponding Results
Our main contribution is not only an optimization algorithm but also a novel hybrid wind speed forecasting model. Experimental results indicate that the proposed method is well suited to short-term wind speed forecasting, and that it has considerable practical value and strong operability in wind farms and grid management. In addition, we performed a comparative experiment to compare the proposed model with other forecasting approaches; the corresponding results, presented in Table 5, reveal that the proposed hybrid method achieves higher forecasting accuracy than the other methods.

Case Studies
This study consists of three experiments. Each is divided into two parts, one of which uses the original wind speed data while the other uses data preprocessed by the SSA approach. The original data are forecast by the single BP algorithm and by the BP algorithm optimized using the combined BFGS-FA (BFGS-FA-BP); the decomposed data are likewise predicted by the single BP and by BFGS-FA-BP.
The two models are intended to compare the single BP with the optimized BFGS-FA-BP in order to determine the performance of the hybrid method. The parameters of SSA are presented in Table 6.

Table 6. The parameters of SSA.
Embedding dimension: 50
Components: 10
Method of calculating the covariance matrix: Unbiased (N-K weighted); Biased (N-weighted or Yule-Walker); BK (Broomhead/King type estimate)
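To make the role of the two numeric SSA parameters concrete, the sketch below implements a basic SSA filter in Python with NumPy. It builds the trajectory matrix with an embedding dimension (window) of 50, keeps the 10 leading components of its singular value decomposition, and reconstructs the series by diagonal averaging. Working directly on the trajectory matrix corresponds to a Broomhead/King-type approach; which covariance estimate the authors actually selected is not stated here, so this choice, along with the function name `ssa_denoise`, is an assumption.

```python
import numpy as np


def ssa_denoise(series, window=50, n_components=10):
    """Basic SSA: embed, truncate the SVD, and average along anti-diagonals."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: entry (i, j) holds x[i + j].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = min(n_components, s.size)
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Diagonal averaging: every entry (i, j) contributes to time index i + j.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i, :]
        counts[i:i + k] += 1
    return recon / counts


# A pure sinusoid has a rank-2 trajectory matrix, so it is reproduced exactly.
t = np.arange(300)
clean = np.sin(2 * np.pi * t / 40)
filtered = ssa_denoise(clean, window=50, n_components=10)
```

In the hybrid model, the filtered series would then replace the raw series as the input to the forecasting network.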

Case Study 1
In this section, all results are presented in figures and tables to reveal the effectiveness of each model. First, the predicted values of wind speed at the three locations are presented in Tables 7-9. Because the forecasts are subject to random disturbances, each experiment must be repeated many times to ensure the reliability of the results. Therefore, we performed each experiment 20 times and used the average values as the final results.
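The repetition scheme can be sketched as follows in Python. The callable `run_once` is a hypothetical stand-in for one complete training and forecasting run that returns an error score; seeding each run separately is an assumption made so that the average is reproducible.

```python
import random
from statistics import mean, pvariance


def repeat_experiment(run_once, n_runs=20, base_seed=0):
    """Run an experiment n_runs times and return the mean and variance of its score."""
    scores = [run_once(random.Random(base_seed + k)) for k in range(n_runs)]
    return mean(scores), pvariance(scores)


def toy_run(rng):
    # Hypothetical stand-in: a MAPE-like score with run-to-run randomness.
    return 6.0 + rng.gauss(0.0, 0.5)


avg_score, var_score = repeat_experiment(toy_run)
```

Reporting the variance alongside the mean also gives a simple measure of the stability of each model across runs.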
Figure 4a shows the forecasting performance with and without the SSA approach at site 1. The results of the experiment that used the raw values, forecast by models without the SSA approach, are displayed at the top left of the figure. The results of the experiment that used processed data, forecast by models using the SSA approach, are shown at the bottom left of the figure. We can conclude from the figure that the SSA-based models, SSA-BP and especially SSA-BFGS-FA-BP, predict values close to the true values. In other words, the experiment that used processed data showed better performance than the other.
Figure 4b shows the difference between the forecast values and the exact values collected from site 1, and their corresponding errors. The error of the SSA-BFGS-FA-BP model is clearly much lower than that of the BP, BFGS-FA-BP, and SSA-BP models, which implies that the SSA-BFGS-FA-BP method performs much better than the other models.

Results of Analysis
In this section, another two samples are described in Figures 5a and 6a, in which the predictive performance with and without the SSA method is compared. Furthermore, Figures 5b and 6b illustrate the difference between the forecast values and the real values collected from the other two sites, and their corresponding errors.
Similar to what was described above, and as shown in Figures 5a and 6a, the BP and BFGS-FA-BP models closely approached the actual values, but the SSA-BP model and especially the SSA-BFGS-FA-BP model performed much better in forecasting. Therefore, we can conclude that the experiment using the SSA approach outperforms the other. As revealed in Figures 5b and 6b, the deviations of the SSA-BFGS-FA-BP model are much smaller than those of the BP, BFGS-FA-BP, and SSA-BP models. In particular, the SSA-BFGS-FA-BP model gets extremely close to the exact wind speed and performs better than the other three models.
To test the accuracy of the experiment and to verify the practicability and feasibility of the developed method, we performed another three experiments. As shown in Table 10, data were taken during four seasons (spring, summer, autumn, and winter) from a fixed location to verify the stability of the model. The results presented in this table indicate that (1) the variance of the proposed hybrid SSA-BFGS-FA-BP method is minimal; and (2) the predicted value of the hybrid SSA-BFGS-FA-BP model is closer to the true value, with higher stability than the other three models.
In another experiment, each newly predicted value was fed back as if it were a real value in order to forecast the next one. Using this scheme, we performed three-step, six-step, and twelve-step iterative experiments. The final experimental results are shown in Table 10. The results indicate that the variance of the hybrid SSA-BFGS-FA-BP model is smaller, and that within six steps its predicted value is closer to the true value, with higher stability than the other three models. However, in the 12-step iterative experiment, the proposed SSA-BFGS-FA-BP model did not show better accuracy than the other three models. This indicates that forecasting performance deteriorates once the number of iterative steps exceeds a certain extent, because of the increase in randomness. It also confirms that, as the number of iterative steps increases, the prediction accuracy falls and the deviation grows.
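The iterative scheme, in which each prediction is fed back as an input for the next step, can be sketched in Python as follows. The one-step predictor `predict_one` and the input window length are hypothetical placeholders; any trained one-step model could be substituted.

```python
def iterative_forecast(history, predict_one, steps, window):
    """Multi-step forecast that feeds each prediction back as a new input."""
    buf = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        nxt = predict_one(buf)
        forecasts.append(nxt)
        # Slide the input window: drop the oldest value, append the prediction.
        buf = buf[1:] + [nxt]
    return forecasts


# Toy one-step predictor (assumption): continue the last value plus one.
three_step = iterative_forecast([1.0, 2.0, 3.0], lambda w: w[-1] + 1.0, steps=3, window=2)
```

Because every step consumes earlier predictions rather than observations, errors accumulate with the number of steps, which is consistent with the weaker 12-step results reported above.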
Finally, we collected data at different time intervals (10, 30 and 60 min) from a fixed location to conduct an experiment; the results are shown in Table 11. We can conclude from the table that the benefit of the optimization method reaches a bottleneck when the time interval of the data becomes too long: the hybrid SSA-BFGS-FA-BP model no longer shows better performance, and the forecasting error becomes high.
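How the 30-min and 60-min series were obtained is not described here. One plausible construction, shown purely as an assumption, is to average consecutive 10-min values in blocks of 3 or 6, as in this Python sketch (the helper name `aggregate` is illustrative).

```python
def aggregate(series, factor):
    """Average consecutive blocks of `factor` values, dropping any incomplete tail."""
    usable = len(series) - len(series) % factor
    return [sum(series[i:i + factor]) / factor for i in range(0, usable, factor)]


ten_min = [float(v) for v in range(1728)]   # placeholder 10-min series
thirty_min = aggregate(ten_min, 3)           # 576 values
sixty_min = aggregate(ten_min, 6)            # 288 values
```

Subsampling every third or sixth value would be an equally simple alternative; the two choices smooth the series differently and could affect the reported errors.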
In conclusion, the SSA-BFGS-FA-BP model is much more effective than the single BP, BFGS-FA-BP, and SSA-BP models. We can confirm that the hybrid SSA-BFGS-FA-BP model makes more accurate predictions of the original time series.

The Results of the DM Test
The DM test was employed to compare the forecasting accuracy of the proposed hybrid method with that of the other three models. Table 12 shows that the values of the DM statistics between the proposed hybrid model and the BP, SSA-BP and BFGS-FA-BP models are 8.1064, 8.0468 and 8.1696, respectively. At the 1% significance level, the critical value is much smaller than these DM statistics; therefore, we reject the null hypothesis and accept the alternative hypothesis. Thus, we can conclude that the hybrid method outperforms the other methods.

Remark. The results of the DM test show that the novel hybrid method achieves a more precise and stable prediction capacity than the other three models, and that the forecasting effectiveness of the hybrid method differs significantly from that of the BP, SSA-BP, and BFGS-FA-BP models.
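The decision can be checked numerically. The short Python sketch below compares the three DM statistics reported in Table 12 with the two-sided critical value of the standard normal distribution at the 1% level; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.01
# Two-sided critical value Z_{alpha/2} of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# DM statistics of the hybrid model against each comparison model (Table 12).
dm_stats = {"BP": 8.1064, "SSA-BP": 8.0468, "BFGS-FA-BP": 8.1696}
rejected = {model: abs(value) > z_crit for model, value in dm_stats.items()}
```

All three statistics exceed the critical value of about 2.576, so the null hypothesis of equal accuracy is rejected in each comparison.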

Discussion
In this section, we first discuss the application of SSA to preprocessing the original data, which influences the forecasting performance. We also examine the decreased relative percentage (DRP) of the MAPE between the proposed model and other forecasting approaches. Furthermore, we present and discuss variations of the data selection.

Data Pre-Processing
In general, the raw wind speed time series contains considerable noise and high-frequency components. Therefore, decomposing the original data sequence is an important data-filtering step, which can effectively enhance the prediction accuracy of the model. By comparing BP with SSA-BP, we can assess the effectiveness of the data pre-processing using a metric called DRP (%), the relative percentage by which the MAPE decreases. The experimental results show that the method significantly enhances the forecasting effectiveness: it decreases the MAPE by 32.8%, 29.3% and 30.7% for site 1, site 2 and site 3, respectively.
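The defining equation of DRP is not reproduced in this text, so the following Python sketch assumes the natural reading of a relative decrease, DRP = (MAPE_base - MAPE_new) / MAPE_base × 100%. Both the formula and the function name `drp` are assumptions, and the numbers in the example are made up for illustration rather than taken from the paper's tables.

```python
def drp(mape_base, mape_new):
    """Assumed DRP: relative decrease of MAPE, in percent, versus a baseline."""
    if mape_base == 0:
        raise ValueError("baseline MAPE must be nonzero")
    return 100.0 * (mape_base - mape_new) / mape_base


# Illustrative values only: a baseline MAPE of 10% reduced to 7%.
example_drp = drp(10.0, 7.0)
```

Under this reading, a positive DRP means that the second model reduces the error relative to the baseline.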

Neural Networks
In the field of practical engineering, the quality of a model depends on its effectiveness rather than its complexity. However, the question of how to find an effective forecasting method that enhances performance is both an urgent and a critical problem in the field of forecasting. The relevant study [58] showed that there is no single unified model for forecasting time series, and that model effectiveness under different circumstances should be analyzed and understood, with incremental improvements being made on the basis of the knowledge gained; therefore, it is impossible to find one model that solves all forecasting problems. Thus, our attention should be focused on the DRP of the forecasting error of different approaches on different data sets, in order to find a relatively good model for forecasting wind speed time series. By analyzing the difference between the proposed model and the other comparative models, we find that the proposed model improves effectiveness by 38.4654%, 35.5391% and 37.9645% for site 1, site 2 and site 3, respectively. The detailed results are presented in Table 13. From the table, we can see that the developed method performs very well in decreasing the wind speed forecasting error.

Data Selection
According to the forecast results, the 10-min interval data sequence achieves the best forecasting effectiveness for all three observation sites, with an MAPE of approximately 6%; therefore, the proposed hybrid model shows excellent performance in forecasting the wind speed time sequence at 10-min intervals. The 10-min interval time series decreases the forecasting error by 12.59%, 10.14% and 11.41% at the three observation sites, respectively.
For the time series data with a 60-min interval, the forecast results are good for all three observation sites, although the forecasting performance is worse than for the data with a 10-min interval. Therefore, the SSA-BFGS-FA-BP model is more applicable to forecasting the wind speed time sequence with a 10-min interval, and the data selection has a serious effect on forecasting effectiveness. However, regardless of the time interval, the forecasting effectiveness is in an acceptable range. Many works use wind speed series with time resolutions of 10, 30 and 60 min for forecasting, so these resolutions are representative for studying wind speed forecasting. The detailed comparison results are presented in Table 14.

Conclusions and Future Work
As a non-polluting and renewable energy source, wind energy has been increasingly applied in the development of industry and agriculture, and its forecasting is becoming increasingly important for wind farms. Recently, academia and wind farm projects have been paying more attention to wind speed forecasting. Accurate prediction can not only reduce costs and enhance personal safety, but also help wind farm management develop more effective programs. The accuracy of a model is as important as its stability in forecasting. It is of great interest to propose a method for wind speed prediction with high accuracy and long-term stability. Nevertheless, wind speed prediction has generally been considered a challenging task because of the effects of various intangible factors, such as temperature, location, tides, and atmospheric pressure. In this paper, to overcome these difficulties, a hybrid model that combines the SSA approach, the BFGS-FA algorithm, and the BP method is presented.
The results based on evaluation criteria such as the MAE, RMSE, MAPE and a statistical test are shown in a sequence of tables and figures, which reveal the advantages of the developed hybrid method. From these data, we can draw the conclusion that the proposed hybrid method achieves the best forecasting effectiveness together with higher stability and reliability.
SSA is a practical decomposition approach, which can remove the noise from the raw data, leaving the principal components for forecasting. The BP model, based on feed-forward neural networks, has become a well-established tool. It is shown that the BP model can obtain its final predictive results in a remarkably short time.
In brief, the hybrid model always has the lowest MAPE value compared with the other forecasting methods, which implies that the hybrid method has the best performance and higher reliability. Improvements in forecasting accuracy and stability can not only help to save large amounts of energy and money, but also help to reduce the time the system requires. The experiments performed in the present study show that the developed hybrid method is a promising algorithm with high accuracy. In addition, the hybrid method could be applied to other fields of practical engineering, such as electric load forecasting, stock price prediction, and solar resource forecasting.

Output:
$x_b$: the value of $x$ that attains the optimal fitness among all fireflies.

Parameters:
$Gen_{max}$: the maximum number of iterations.
$n$: the total number of fireflies.
$F_p$: the fitness function of firefly $p$.
$x_p$: nest (position) $p$.
$g$: the current iteration number.
$L_p$: the brightness of firefly $p$.
$d$: the dimension of the parameter.

FOR q = 1:n RUN
/* Adjust the firefly from p to q in any direction. */
IF ($L_q > L_p$) THEN
$r_{pq} = \|x_p - x_q\| = \sqrt{\sum_{t=1}^{d} (x_{p,t} - x_{q,t})^2}$
$x_p = x_p + \beta_0 e^{-\gamma r^2} (x_q - x_p) + \alpha(rand - 0.5)$
END
Attraction changes with the distance $r$ via $e^{-r^2}$
Apply BFGS to quickly update the new positions of the fireflies $x_p$ ($p = 1, 2, \ldots, n$). (BFGS-FA only)
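The movement rule in the listing above can be sketched in Python as one generation of the firefly update. This is a minimal sketch rather than the authors' implementation: the default values of β0, γ and α are placeholders (the actual settings belong to Table 3), brightness is treated as something to maximize, and the BFGS refinement of the combined BFGS-FA is not shown.

```python
import math
import random


def firefly_generation(positions, brightness_fn, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """One generation: each firefly p moves toward every brighter firefly q."""
    rng = rng or random.Random()
    x = [list(p) for p in positions]
    light = [brightness_fn(p) for p in x]
    n, d = len(x), len(x[0])
    for p in range(n):
        for q in range(n):
            if light[q] > light[p]:
                # Squared Euclidean distance r^2 between fireflies p and q.
                r2 = sum((x[p][t] - x[q][t]) ** 2 for t in range(d))
                attract = beta0 * math.exp(-gamma * r2)
                x[p] = [
                    x[p][t] + attract * (x[q][t] - x[p][t]) + alpha * (rng.random() - 0.5)
                    for t in range(d)
                ]
                # Re-evaluate the brightness at the new position.
                light[p] = brightness_fn(x[p])
    return x, light


# Two fireflies, no random walk (alpha = 0): the dimmer one moves toward the brighter one.
start = [[0.0, 0.0], [1.0, 0.0]]
moved, light = firefly_generation(start, lambda p: -sum(v * v for v in p), alpha=0.0)
```

In the hybrid model, the brightness would be derived from the forecasting error of the BP network whose weights the firefly position encodes; that mapping is outside the scope of this sketch.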

Figure 1. Topology structure and flow chart of BP neural network and the flow chart and structure of the combined BFGS-FA Algorithm.

Figure 2. The flow chart of the hybrid SSA-BFGS-FA-BP model.

Figure 3. Geographical position of the survey regions and actual values of three stations.

Figure 4. The forecast results for wind speed collected from site 1 at 10-min intervals. (a) The forecast effect without SSA algorithm and with SSA algorithm; (b) The comparison between the forecast values and raw data and their corresponding errors.

Figure 5. The forecast results for wind speed collected from site 2 at 10-min intervals. (a) The forecast effect without SSA algorithm and with SSA algorithm; (b) The comparison between the forecast values and the raw data and their corresponding errors.

Figure 6. The forecast results for wind speed collected from site 3 at 10-min intervals. (a) The forecast effect without SSA algorithm and with SSA algorithm; (b) The comparison between the forecast values and the raw data and their corresponding errors.
/* Define all the parameters related to FA and BFGS. */
/* Initialize the population of n fireflies $x_p$ ($p = 1, 2, \ldots, n$) at random. */
...
Confirm the brightness $L_p$ through $F(x_p)$
END
WHILE ($g < Gen_{max}$) RUN
FOR p = 1:n RUN
/* the best nest $x_m$ of the $d$ generation */

Table 1. Summary of intelligent and hybrid forecasting methods and theoretical comparison of existing forecasting models. RBF: radial basis function; ENN: Elman neural network; WNN: Wavelet neural network; GA: genetic algorithm; BP: back propagation; FA: firefly algorithm; PSO: particle swarm optimization.

Table 2. The experimental set points of BP.

Table 3. The experimental set points of FA.

Table 4. The metric equations.

Table 5. The results of the hybrid model, ARIMA and SVM model at three sites.

Table 7. MAE, RMSE, and MAPE of site 1 for each forecasting model.

Table 10. MAE, RMSE, and MAPE of four seasons and various iterations in one site for each forecasting model.

Table 11. MAE, RMSE, and MAPE of various intervals in one site for each forecasting model.

Table 12. Results of the DM test and operational time (s).

Table 13. The DRP of MAPE of the proposed model and comparison models.

Table 14. Comparison results of three observation sites with different time intervals.
/* Compute step size. */
/* Assess the new position and update the light intensity $L_p$. */