A New Wind Speed Forecasting Modeling Strategy Using Two-Stage Decomposition , Feature Selection and DAWNN

Accurate wind speed prediction plays a crucial role on the routine operational management of wind farms. However, the irregular characteristics of wind speed time series makes it hard to predict accurately. This study develops a novel forecasting strategy for multi-step wind speed forecasting (WSF) and illustrates its effectiveness. During the WSF process, a two-stage signal decomposition method combining ensemble empirical mode decomposition (EEMD) and variational mode decomposition (VMD) is exploited to decompose the empirical wind speed data. The EEMD algorithm is firstly employed to disassemble wind speed data into several intrinsic mode function (IMFs) and one residual (Res). The highest frequency component, IMF1, obtained by EEMD is further disassembled into different modes by the VMD algorithm. Then, feature selection is applied to eliminate the illusive components in the input-matrix predetermined by partial autocorrelation function (PACF) and the parameters in the proposed wavelet neural network (WNN) model are optimized for improving the forecasting performance, which are realized by hybrid backtracking search optimization algorithm (HBSA) integrating binary-valued BSA (BBSA) with real-valued BSA (RBSA), simultaneously. Combinations of Morlet function and Mexican hat function by weighted coefficient are constructed as activation functions for WNN, namely DAWNN, to enhance its regression performance. In the end, the final WSF values are obtained by assembling the prediction results of each decomposed components. Two sets of actual wind speed data are applied to evaluate and analyze the proposed forecasting strategy. Forecasting results, comparisons, and analysis illustrate that the proposed EEMD/VMD-HSBA-DAWNN is an effective model when employed in multi-step WSF.


Introduction
Sustainability transitions are beneficial transformation processes that make social production and consumption sustainable through socio-technical systems.Major power structural changes from the fossil-fuel energy systems to low-carbon and renewable energy supply are one vital aspect of the sustainability transitions [1,2].The large-scale green renewable low-carbon wind resources that support the sustainable development of energy systems has attracted worldwide attention.The exploration and utilizations of wind power have large potentials to reduce large greenhouse gas emissions and promote sustainable economic development.However, large-scale integration of wind power into power grid greatly influences the reliability, safety, and power quality of the power system for the stochastic and fluctuate characteristics of wind speed [3].As wind speed is one important factor for wind power generation, precise WSF can eliminate influences, therefore, accurate WSF techniques are critical issues for the wind farms [4].
Over the past decade, more reliable and accurate WSF approaches have been developed to solve these problems of the inherent volatility in wind speed time series.There are three main categories of models for short-term WSF, i.e., statistical approaches, artificial intelligent (AI) methods, and hybrid forecasting models.
The random and irregular nature of wind speed make the AI approaches more suitable for WSF than statistical approaches.However, these pure forecasting models suffer from large errors regardless of statistical models or AI approaches when they make WSF directly without pre-processing wind speed data.To enhance the forecasting accuracy, the hybrid methods taking advantages of individual merits have drawn widespread attentions for time series-based WSF.Hu et al. [17] employed decomposition-based forecasting approach combining empirical wavelet transform (EWT) with LSSVM tuned by coupled simulated annealing (CSA).Meng et al. [18] presented a new hybrid model using wavelet packet decomposition (WPD) and BPNN tuned by crisscross optimization algorithm for multi-step WSF.Wang et al. [19] examined the forecasting performance of a hybrid WSF model using WNN tuned by genetic algorithm (GA) and variational mode decomposition (VMD).These hybrid forecasting models firstly utilize single signal decomposition technique to break the original empirical wind speed into different sub-series, then, the sub-series variables are employed as the inputs of AI methods for WSF.To avoid excessive manual intervention and improve adaptivity, intelligent optimization algorithms are applied to tune the parameters combination of AI methods.Seen from the experimental results in the previous literatures, hybrid forecasting models appear to be obviously more precise as compared with single forecasting models.
The previous literatures illustrate that AI models combined with multi-scale decomposition techniques achieve satisfactory forecasting results, however, single signal decomposition methods cannot often thoroughly tackle the non-stationary and non-linear components in the wind speed.Thus, there still exists much room for improvement in wind speed pre-processing [20].Sun et al. [21] proposed an improved EEMD with a BPNN model for short-term WSF.The hybrid EEMD-SVM model proposed by Hu et al. [12] discarded the highest frequency IMF1 directly to reduce the influence of the most disorder and highest frequency components on the forecasting accuracy.Yu et al. [11] developed three hybrid models, namely EMD-singular spectrum analysis (SSA)-ELMNN, EEMD-SSA-ELMNN, and CEEMDAN-SSA-ELMNN, for 1 h average WSF.In these forecasting models, the SSA approach is applied to extract the trend of the most irregular component IMF1.The retreatment of IMF1 by SSA method can significantly improve the forecasting performance.Peng et al. [22] proposed a novel hybrid model combining AdaBoost-ELM with two-stage decomposition approach for multi-step ahead WSF.In the two-stage decomposition process, the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is firstly used to break the empirical wind speed data into IMFs and one Res, then, the most nonstationary IMF1 is further decomposed by VMD technique.In a similar way, Yin et al. [23] presented an effective secondary decomposition technique to eliminate the nonstationary characteristics in the samples.The forecasting results of different Cases show that Energies 2019, 12, 334 3 of 24 the hybrid models with two-stage decomposition or secondary decomposition technique can yield higher accuracy.
As stated by [24,25], not all input candidates promote the final forecasting results positively and there exists some illusive components generated by the decomposition technique or measurement error.Energy measurement and kernel density estimation-based Kullback-Leibler divergence are exploited by Jiang et al. [26] as a feature selection technique to identify the effective candidates so as to eliminate negative influence from the illusive components.Salcedo-Sanz et al. [24] developed coral reefs optimization as a feature selection technique to select the effective variables from the total empirical samples for reducing the input dimension number.Apart from the feature selection technique and parameter optimization method alone for the forecasting engine to enhance the prediction accuracy, these functions can be realized simultaneously through the hybrid optimization algorithm.For example, a hybrid gravitational search algorithm (HGSA) integrating real-value GSA (RGSA) with binary-value GSA (BGSA) is exploited by Luo et al. [27] to undertake feature selection and parameter optimization, thereinto, exploiting the BGSA to identify the vital feature variables from the total compound feature selection, and the RGSA algorithm is applied to tune the parameter combination in ELM.
Inspired by the forecasting mechanism in the previous literatures, a novel combined strategy using two-stage decomposition technique, hybrid backtracking search optimization algorithm, WNN with double activations through weighted coefficient (DAWNN), namely EEMD/VMD-HBSA-DAWNN, is proposed for short-term WSF.The proposed model is trained and tested using two sets of randomly selected wind speed data from a wind farm of Anhui in China.With respect to similar studies in WSF, the main work and contributions of this article are illustrated as follows.

•
The proposed forecasting strategy takes advantage of individual methods, including two-stage decomposition technique, HBSA, DAWNN, that can enhance forecasting accuracy.The proposal not only thoroughly tackles the nonstationary characteristics of wind speed data, but also remedies the deficiencies of the AI approach.

•
To decompose thoroughly, a two-stage decomposition technique combining EEMD with VMD is exploited to deal with wind speed data, eliminating the characteristic of irregularities.

•
To better improve the regression performance of WNN, double activation functions combining the Morlet function with the Mexican hat function by weighted coefficients, namely DAWNN, are proposed.

•
As stated by Zheng et al. [28], there are ineffective candidate features in wind speed time series that generate negative influence on forecasting results, and WNN with inappropriate parameters tends to get stuck in over-fitting or under-fitting easily.The parameters of WNN should be tuned for the improvement of forecasting accuracy.Thus, the BBSA algorithm is applied to remove ineffective candidate variables, while RBSA is used to tune the input weights, output weights, and weighted coefficients in DAWNN other than random selection for the proposed forecasting engine.The feature selection technique and parameter optimization are realized by HBSA algorithm.
The remaining parts of this article are arranged as follows.In Section 2, the detailed WSF strategy are introduced.In Section 3, two-stage wind speed decomposition technique, HBSA algorithm and DAWNN are presented.The statistical error evaluation indices and the procedure of model construction and development are presented in the Section 4. Numerical results and discussion are illustrated in Section 5.In the end, the conclusions are drawn.The abbreviations of technical terms are listed in Appendix A.

The Proposed WSF Strategy
The overall working process of the proposed WSF model depicts in Figure 1.The proposed hybrid model EEMD/VMD-HBSA-DAWNN makes WSF in following steps.

•
Step 1: In this step, the original empirical wind speeds are decomposed into several modes, different IMFs and a Res using two-stage decomposition technique EEMD/VMD to eliminate the irregular and fluctuant characteristics of wind speed for better forecasting.Firstly, the original empirical samples are broken into different IMFs and one Res, then, the IMF1 generated by EEMD is further decomposed by VMD approach into several modes.

•
Step 2: Considering the positive influences of normalization on the performance of WNN, the input variables are linearly normalized into interval [0, 1].Prior to WSF using the proposed DAWNN method, partial autocorrelation function (PACF) values are applied to determine the correlation coefficients among the input candidates for establishing the training and testing input-matrix.

EEMD
A novel noise-assisted signal processing technology EEMD was developed by Wu et.al [29] for eliminating the mode mixing shortage of EMD.As stated by Jiang [26], EEMD is an effective adaptive signals analysis approach which decomposes and analyzes nonlinear signals by an iterative sifting process.In each sifting process, a finite amplitude Gaussian white noise is added into the decomposed signal and the white noise can be removed through averaging.For wind speed samples x(t), the main signal analysis process of EEMD is illustrated as follows.

EEMD
A novel noise-assisted signal processing technology EEMD was developed by Wu et.al [29] for eliminating the mode mixing shortage of EMD.As stated by Jiang [26], EEMD is an effective adaptive signals analysis approach which decomposes and analyzes nonlinear signals by an iterative sifting process.In each sifting process, a finite amplitude Gaussian white noise is added into the decomposed Energies 2019, 12, 334 5 of 24 signal and the white noise can be removed through averaging.For wind speed samples x(t), the main signal analysis process of EEMD is illustrated as follows.

•
Step 1: A Gaussian white noise is added into the empirical wind speed data, and new signal is generated as Equation (1), where n(t) stands for the Gaussian signal.

•
Step 2: Disassemble the new signal into N different amplitude-frequency IMFs and a Res by EMD algorithm, then, the X(t) can be expressed as Equation (2), where N stands for the decomposed quantity.

•
Step 3: Repeat Step 1 and 2 by adding different Gaussian white noise at each sifting process.

•
Step 4: Eliminate the Gaussian white noise and obtain the final IMFs by averaging all the corresponding IMFs in the end.

VMD
A novel signal analysis approach VMD is proposed by Dragomiretskiy and Zosso [30] in 2014 to break down real-value signals into different models with specific sparsity properties.The decomposition process is carried out through solving the minimum value optimization problem, which is expressed in Equation (3), where f (t) and u k stand for the original signal and its kth component, respectively.δ(t), ⊗, ω k and k denote the Dirac distribution, convolution operator, the center frequency of u k , and the number of modes, respectively.To simplify the above constraint optimization problem, Lagrangian multipliers and quadratic penalty term are utilized to construct Lagrangian function expressed in Equation ( 4), where α and λ stand for the balancing value of the data-fidelity constraint and Lagrange multiplier, respectively.To address the optimization, the Alternate Direction Method of Multiplier (ADMM) was developed to obtain the saddle point of the augmented Lagrangian function during the iterative sub-optimization.As results, the solutions of Equation ( 4) can be obtained by Equations ( 5) and ( 6): where ûn+1  [31], is applied to determine the optimal solution of the objective function, which is used to improve the performance of system through the input information and optimization condition.In the optimization process, BSA exhibits good exploitation capabilities and robust exploration to find the best values of the population using mutation and crossover operators.BSA has only one parameter that needs to be tuned, therefore, it has been widely applied to deal with the real-valued optimization [32,33].The optimization process of BSA is generally summarized as the following steps.

•
Step 1: Initialization Not only the current population but also the historical population of BSA are generated randomly with uniform distribution according to Equations ( 7) and (8).
where P i,j and oldP i,j are current population and history population, respectively.low j and up j are the minimal and maximal variables, respectively.U stands for uniform distribution.

•
Step 2: Selection-I According to Equation ( 9), the historical population oldP is redefined through comparing two random variables a and b.Then, the orders of the individuals are changed through a random shuffle function expressed as Equation (10).oldP remembers a randomly selected previous generation to produce a new trial population by utilizing the prior experience to determine the search direction matrix.

•
Step 3: Mutation BSA's mutation mechanism is mathematically expressed as Equation (11).As seen from the formula, BSA utilizes the information of the historical population to create a mutation population and the previous generation penetrate the whole mutation process.The parameter F in Equation ( 11) is used to adjust the amplitude of the search direction matrix (oldP-P), where rndn~U(0,1).

•
Step 4: Crossover Although BSA is improved and developed on basis of differential evolution (DE), the crossover process of BSA algorithm is quite different from DE algorithm.Two predefined strategies are used to determine the matrix map of BSA.One strategy utilizes the parameter mixrate to generate a binary integer-valued matrix, as shown in Equation (12), which guides the crossover direction.
where rand1 and rand2 are random values, D stands for the population dimension.
The second strategy updates the trial population T. If map i,j = 1, T i,j is replaced by P i,j , thus, the final crossover results are shown in Equation ( 13), where map stands for binary integer-valued matrix.

•
Step 5: Selection-II After calculation of the fitness values of each individual, the comparisons between the fitness values of the original population and the trial population are carried out, and T i with better fitness values will update P i .Step 2-5 are repeated until the terminal conditions are satisfied.In the end, the population P with the best solution is determined and denoted as P best .Here, the proposed DAWNN with two weighted activation functions is trained and tested, i.e., the free parameters and the weighted coefficients of DAWNN are tuned by RBSA, the output errors of DAWNN are utilized as the training objective function of RBSA.

BBSA
RBSA is utilized to solve the real-valued optimization problems in the continuous domain, however, there are many binary-valued discrete parameter problems that need to be dealt with.To solve these binary-valued problems, Ahmed developed a BBSA technique [33].The BBSA is applied in the similar way to convert the position to 0 or 1 by sigmoid function, as shown in Equations ( 14) and (15).
where S i,j , ω and B i,j is the sigmoid function, population value and binary value, respectively.B i,j is 1 when the S i value is bigger than or equal to 0.5, otherwise the B i,j is 0. In the WSF model, the input variables of the proposed DAWNN are encoded as binary-valued vector, 1 means "selected" while 0 stands for "discard".

WNN
Wavelet Transform (WT) has been widely utilized in wind speed/power prediction [34].One popular method is to employ WT as a preprocessor to decompose the historical wind speed or power data into different relatively stable sub-series which are used as input variables of intelligent methods for further regression [13,34].Wavelet functions are used in the hidden layer of ANN model to construct WNN, which is another popular application of WT [13,35].The basic structure of a WNN are shown in Figure 2, which contains one input layer, one hidden layer and one output layer.The algebraic representations of the WNN is expressed as Equation (16).
where a i,j and b i,j stands for scale factor and position factor, respectively.
power data into different relatively stable sub-series which are used as input variables of intelligent methods for further regression [13,34].Wavelet functions are used in the hidden layer of ANN model to construct WNN, which is another popular application of WT [13,35].The basic structure of a WNN are shown in Figure 2, which contains one input layer, one hidden layer and one output layer.The algebraic representations of the WNN is expressed as Equation ( 16). , , , , ( ) where ai,j and bi,j stands for scale factor and position factor, respectively.Morlet and Mexican hat functions, expressed as Equations ( 17) and ( 18), respectively, are typical wavelet functions used in the WNN method, which can be explained by Figure 3.For instance, in Ref. [35], WNN with Mexican hat wavelet activation function combining with multi-resolution analysis method was developed for short-term WSF.It is mentioned in Ref. [13] that WNN with Morlet wavelet function can yield better forecasting performance than that with Mexican hat wavelet function.It is obviously seen that Morlet wavelet function has more vanishing mean oscillatory character with diverse oscillations than Mexican hat function, thus, Morlet wavelet function has better capacity in localizing various high frequency elements in the time domain of severely irregular nonlinear signals [13].Mexican hat function is obtained by second derivative of Gaussian function and Morlet wavelet function is the single frequency sub-sine function under Gaussian envelope.In this study, we take advantages of individuals of Morlet and Mexican hat wavelet functions to construct a hybrid activation function by weighted coefficient for WNN, which is expressed as Equation (19).
Energies 2019, 12, x FOR PEER REVIEW 8 of 24 Morlet and Mexican hat functions, expressed as Equations ( 17) and ( 18), respectively, are typical wavelet functions used in the WNN method, which can be explained by Figure 3.For instance, in Ref. [35], WNN with Mexican hat wavelet activation function combining with multi-resolution analysis method was developed for short-term WSF.It is mentioned in Ref. [13] that WNN with Morlet wavelet function can yield better forecasting performance than that with Mexican hat wavelet function.It is obviously seen that Morlet wavelet function has more vanishing mean oscillatory character with diverse oscillations than Mexican hat function, thus, Morlet wavelet function has better capacity in localizing various high frequency elements in the time domain of severely irregular nonlinear signals [13].Mexican hat function is obtained by second derivative of Gaussian function and Morlet wavelet function is the single frequency sub-sine function under Gaussian envelope.In this study, we take advantages of individuals of Morlet and Mexican hat wavelet functions to construct a hybrid activation function by weighted coefficient for WNN, which is expressed as Equation ( 19).
The vector of the real-valued parameter of the proposed DAWNN is expressed as Equation (20).Therefore, there are 5 n real-valued values to be tuned by the HBSA.[ , , , , , , , , , , , , , , , , , , , ] where ωi, ωo and μ are the input weighted coefficient, the output weighted coefficient, and weighted coefficient, respectively.a and b are the same definition as the Equation ( 16).represent the historical wind speed data and y is the corresponding forecasting value.The y is the output value obtained by a non-linearity autoregressive process expressing in the Equation (21).In  The vector of the real-valued parameter of the proposed DAWNN is expressed as Equation (20).Therefore, there are 5 n real-valued values to be tuned by the HBSA.
where ω i , ω o and µ are the input weighted coefficient, the output weighted coefficient, and weighted coefficient, respectively.a and b are the same definition as the Equation ( 16).The future wind speed value at time step k is obtained by WNN through regressive mapping the historical measured wind speed data.In the proposed model, the input vector [x 1 , x 2, . . ., x m ] represent the historical wind speed data and y is the corresponding forecasting value.The y is the output value obtained by a non-linearity autoregressive process expressing in the Equation (21).In addition, the simulated experiments are executed to illustrate the effectiveness of the hybrid activation function.
where h is usually named as the forecasting horizon between the history and the future value, f is a nonlinear function describing the relationship between the historical wind speed data and the present value.

The Proposed HBSA-DAWNN Approach
In the HBSA-DAWNN model, the parameter combination in DAWNN are tuned by RBSA method and the effective input variables are selected by BBSA technique.The process of feature selection and parameter optimization for HBSA-DAWNN are presented in the Algorithm 1.In the solution representation, feature masks are encoded to binary value 1 or 0. The statistical index root mean square error (RMSE), as expressed in Equation 23, is selected as the fitness function to quantitatively evaluate the HBSA-DAWNN model.

Algorithm 1
The pseudo code of HBSA-DAWNN

1:
Generate the initial parameters in BSA: iteration number (T), dimension number (D), population size (N) and F.

2:
Set the initial population according to Equation 20.

3:
Convert the population values into binary values according to Equations 14 and 15. 4: for i from 1 to N do 5: Make WSF by the proposed forecasting engine using each population.6: Calculate the fitness function according to Equation 23 for each population.7: endfor 8: Produce initial historical population according to Equation 8. 9: for i from 1 to T do 10: Update historical population (oldP i ) according to Equations 9 and 10. 11: Calculate the trial population mutant according to Equation11.12: Convert the population values into binary value according to Equations 14 and 15. 13: Determine the input variable matrix using the binary value.14: Calculate the final trial population (T ij ).15: for i from 1 to N do 16: Make wind speed forecasting by the proposed forecasting engine.17: Calculate the fitness function using Equation 23 for each population.Obtain the effective input variables matrix and the optimum parameters combination for forecasting.

Model Construction and Development
To evaluate the proposed forecasting strategy, multi-step WSF experiments using the historical wind speed are carried out.All the tests are executed in MATLAB 2014a environment in personal computer with i5, 3.5 GHz CPU and 8 GB RAM under Windows 8 Operating System.Forecasting results illustrate that the proposed EEMD/VMD-HBSA-DAWNN model yields the highest forecasting accuracy with minimum statistical error indices.

The Statistical Error Evaluation Indices
The forecasting performance of a new developed model for WSF are generally evaluated by multiple statistical error indices.Mean absolute error (MAE), RMSE, mean absolute scale error (MASE), and mean absolute percent error (MAPE), expressed as Equations ( 22)-( 25), are typical statistical error indices that evaluate quantitatively the new developed model and the compared models.The forecasting model performs better when the statistical indices are smaller.RMSE reveals the entire deviation between the measured data and the forecasting values, MAE reflects how similar the forecasting values are to the actual measured ones, whereas the unit-free statistical index MAPE is sensitive to the small changes in the forecasting results.Thus, these statistical indices are suited to evaluate the new developed EEMD/VMD-HBSA-DAWNN model.
where s(i) and ŝ(i) represent the actual samples and the forecasting value at time i, respectively.N stands for the entire quantity of the wind speed data.

Empirical Wind Speed Time Series
The historical wind speed data, measured each 15 min interval, are gathered and stored in the central computer of a wind farm in Anhui of China, to evaluate and test the proposed forecasting strategy.Two sets of 700 successive wind speed samples, as shown in Figure 4, are selected randomly from 2015 as empirical samples.In the empirical samples, the foregoing 600 wind speed data are utilized to train HBSA-DAWNN model while the subsequent 100 wind speed time series are employed to test the proposed model.The statistical descriptions of the empirical data are listed in the Table 1.From the Figure 4 and Table 1, there are high fluctuations within [1.09, 11.54] and [0.75, 12.57] in the wind speed data.
strategy.Two sets of 700 successive wind speed samples, as shown in Figure 4, are selected randomly from 2015 as empirical samples.In the empirical samples, the foregoing 600 wind speed data are utilized to train HBSA-DAWNN model while the subsequent 100 wind speed time series are employed to test the proposed model.The statistical descriptions of the empirical data are listed in the Table 1.From the Figure 4 and Table 1, there are high fluctuations within [1.09, 11.54] and [0.75, 12.57] in the wind speed data.

Wind Speed Decomposition Using Two-Stage Decomposition Technique
As proved by previous studies [20,23], further decomposition of IMF1 can improve the forecasting accuracy over horizons of not only one-step but also multi-step prediction while the decompositions of IMF2 and IMF3 make the forecasting accuracy deteriorate, because IMF1 owns the most unsystematic and the highest frequency part of wind speed, which may lower the regression performance of the forecasting engine.To eliminate this negative influence, one practical approach is to directly discard the IMF1 [12].However, Guo [36] pointed out that this simple treatment can slightly enhance the forecasting performance in some situations.Secondary decomposition method is the other effective approach to solve this problem [23].Considering EEMD and VMD have good potential to disassemble wind speed into relatively steady sub-series, we attempt to employ it to further deal with the highest frequency IMF1.
The K and τ in VMD are set 4 and 0, respectively, while the white noise amplitude and the ensemble number in EEMD are set 0.2 and 100.The decomposed results of the empirical samples are shown in Figure 5. From figures, it can be obtained that different IMFn and Res exhibit distinct properties.IMF1 has the highest frequency feature, which reflects random information of the empirical samples.The IMF7, IMF8, and Res reveal their trend characteristics, and the other IMFs reveal periodic characteristics of wind speed.

Construction of Input Feature Matrix for the Forecasting Engine
Prior to submitting the decomposed components to the propose DAWNN, the input variables matrix should be determined.After linear normalization within [0, 1], PACF technique is applied to determine the input variables combination.The PACF values from lags 0 to 30 are calculated, which are displayed in the Table 2. Seen from the Table 2, PACF value of original wind speed data set A is 9.The number 9 reflects the antecedent 9 continuous wind speed contribute mostly to the subsequent forecasting values.As seen from Figure 6, the 9 antecedent wind speed time series are taken as the inputs of the proposed DAWNN to predict the subsequent values.The input combination of the proposed DAWNN for the other time series can be determined in the similar way according to PACF values in the Table 2. , , , x x x  Prior to submitting the decomposed components to the propose DAWNN, the input variables matrix should be determined.After linear normalization within [0, 1], PACF technique is applied to determine the input variables combination.The PACF values from lags 0 to 30 are calculated, which are displayed in the Table 2. Seen from the Table 2, PACF value of original wind speed data set A is 9.The number 9 reflects the antecedent 9 continuous wind speed contribute mostly to the subsequent forecasting values.As seen from Figure 6, the 9 antecedent wind speed time series are taken as the inputs of the proposed DAWNN to predict the subsequent values.The input combination of the proposed DAWNN for the other time series can be determined in the similar way according to PACF values in the Table 2.

Construction of the HBSA-DAWNN
In the BSA-DAWNN model, the dimension of individuals in the population is set according to the sum of ωi,j, ωj,k, μ, aj, and bj.Seen from Figure 3   .For a i,j , the interval of [0.5, 2] is selected in that it does not make the wavelet function too spread or dense [13].All these parameters are initialized with uniform distribution.The parameters in the BSA-DAWNN model are listed in Appendix B.
The number of hidden neurons nodes can also influence the WNN performance.However, there is no clear definition or unified standard with respect to optimal selection of the number of hidden neurons for WNN in different applications.For the original samples A, the quantity of input neurons is set as nine according to Table 2 and the number of the output neuron is set as one.Some experiments are carried out to select the optimal number of the hidden nodes by HAS-WNN model.The statistical index RMSE is shown in Figure 7. From the figure, it can be obtained that the optimal number is 19, where the RMSE is lowest.The optimal hidden node numbers for other ones are selected in the similar manner.
Energies 2019, 12, x FOR PEER REVIEW 13 of 24 The number of hidden neurons nodes can also influence the WNN performance.However, there is no clear definition or unified standard with respect to optimal selection of the number of hidden neurons for WNN in different applications.For the original samples A, the quantity of input neurons is set as nine according to Table 2 and the number of the output neuron is set as one.Some experiments are carried out to select the optimal number of the hidden nodes by HAS-WNN model.The statistical index RMSE is shown in Figure 7. From the figure, it can be obtained that the optimal number is 19, where the RMSE is lowest.The optimal hidden node numbers for other ones are selected in the similar manner.
In the HBSA-DAWNN model, the BBSA is exploited to identify the effective candidates from the input matrix predefined by PACF.RBSA is developed to tune the parameters in the proposed DAWNN model.The feature selection results for different time series obtained by BBSA are listed in the Tables 3 and 4.   In the HBSA-DAWNN model, the BBSA is exploited to identify the effective candidates from the input matrix predefined by PACF.RBSA is developed to tune the parameters in the proposed DAWNN model.The feature selection results for different time series obtained by BBSA are listed in the Tables 3 and 4.
Note: 1 and 0 mean selection and discard, respectively.
Note: 1 and 0 mean selection and discard, respectively.

Numerical Results and Discussion
In this study, two categories of prediction models have been constructed and tested: the first category is to develop individual regression models without wind speed decomposition technique, which include Persistence, ARMA, BPNN, and WNN, HBSA-WNN; the second category is to construct signal decomposition-based WSF models, which includes EEMD/VMD-HSBA-DAWNN, EEMD-HSBA-WNN, EEMD/VMD-HSBA-WNN with Morlet function or Mexican hat function, EMD-NN [36] and EEMD-GA-BPNN [37].The parameters in the EMD-NN and EEMD-GA-BPNN are set according to the respective references.All the forecasting models are tested using the actual wind speed data.As that in a previous study [19], the Persistence model is also utilized as a benchmark approach to evaluate the proposed model.

Case 1
In this case, the empirical wind speed, as displayed in Figure 4a, are employed to construct and evaluate the models.The forecasting results and statistical error indices are presented in Figures 8  and 9, and Tables 5 and 6.As seen from the tables, the smallest error indices are marked in boldface, which means the model exhibits the best forecasting performance.Comparisons are carried out to testify the advantages of the signal decomposition methods, feature selection and parameter optimization.Tables 5 and 6 and Figures 8 and 9 illustrate the following:

•
Tables 5 and 6 illustrate that the proposed EEMD/VMD-HSBA-DAWNN model achieves best forecasting results with respect to RMSE with 0.2249 m/s, 0.2681 m/s, and 0.3084 m/s for one-, two-, and three-step, respectively.
• Persistence presents the highest forecasting statistical indices values with 0.8818 m/s, 0.9543 m/s, and 1.0433 m/s RMSE values for one-, two-, and three-step ahead prediction, respectively.

•
From Figure 8b,d,f, the accuracy of the individual models is ranked from low to high as Persistence, ARMA, BPNN, WNN and HBSA-WNN.

•
From the Figures 8 and 9, it is obviously seen that the RMSE values of the WNN-based forecasting strategy are ranked from big to small as WNN, HBSA-WNN, EEMD-HBSA-DAWNN, and EEMD/VMD-HBSA-DAWNN.

•
To illustrate the effectiveness of the double activation functions in WNN, the proposed EEMD/VMD-HBSA-DAWNN is compared with the models with Morlet function and Mexican hat function.Tables 5 and 6

•
Compared with the single statistical approaches ARMA and Persistence, the single AI models BPNN and WNN yield better forecasting performance in that the artificial intelligent models have better capacities in handling non-linear wind speed time series.

•
Signal decomposition-based forecasting models obtain remarkable improvements over the individual forecasting models without signal decomposition methods because there exists non-linearity, non-stable, and high fluctuation in the wind speed time series, the signal techniques break wind speed data into different relatively stationary subseries, thus reducing the regression difficulties of the forecasting engine and improving the prediction accuracy.

•
Comparisons between the HSBA-WNN model with Morlet function and the individual WNN model with Morlet function using the same wind speed data are made to examine the capacity of HBSA in feature selection and parameter optimization.From Table 5 and Figure 8, HBSA-WNN performs better than WNN for multi-step prediction.The reasons of these results are that feature selection technique by BBSA method eliminates the illusive components and identify the effective components, while RBSA algorithm is utilized as parameter optimization to tune the parameters in WNN, thus enhancing the forecasting performance.

•
The HBSA-DAWNN model with two-stage decomposition method EEMD/VMD has higher forecasting accuracy than the HBSA-DAWNN model with EEMD, whose underlying reasons are that the two-stage decomposition can effectively deal with the problems of the irregularity of IMF1 through further decomposition.Thus, the two-stage decomposition approach EEMD/VMD is an efficient data-preprocessing method in improving WSF performance.Therefore, the integration of WNN method with feature selection and parameters optimization can provide higher accuracy when applied in WSF.Moreover, the application of two-stage decomposition approach in the wind speed preprocessing is verified by comparing with other models.  .Model4 stands for EEMD-GA-BPNN, 5 .Model5 stands for EMD-NN, 6 .Model6 stands for the proposed model.

Case 2
To further manifest the effectiveness of EEMD/VMD-HBSA-DAWNN, Case 2 from the same wind farm is employed to construct the proposed forecasting strategy and other prediction models for multi-step WSF.The empirical wind speed data are shown in the second subgraph of Figure 4.In the same way, the empirical wind speed data are divided into two sets, namely, the first 600 samples are used to train the models and the subsequent 100 ones are employed to test the models.Tables 7  and 8 list th forecasting results, and the corresponding comparisons are carried out.According to the prediction indices in the tables, the same conclusions as that in Case 1 can be obtained.  .Model4 stands for EEMD-GA-BPNN, 5 .Model5 stands for EMD-NN, 6 .Model6 stands for the proposed model.

Case 3
The total historical wind speed data of 2015 are displayed in the Figure 10.The red wind speed in the Figure 10a are the selected empirical samples that are used to train and test the proposed model in the previous sections of the paper.To better manifest the effectiveness of EEMD/VMD-HBSA-DAWNN, all the total wind speed data of 2015 were divided into two sets, as shown in Figure 10b,c, to train and test the hybrid models.The first 1st ~14640th points wind speed time series are utilized to train the forecasting models to predict the subsequent 2880 wind speed time series.The forecasting curves and their errors of wind speed data A are displayed in the Figure 11 and the statistical indices are listed in Table 9.It can be seen from the Figure 11b,d,f that the forecasting accuracy decreases with the forecasting time.From the Table 9, RMSEs of the proposed model are 0.6516 m/s, 0.7234 m/s and 0.8026 m/s for one-step, two-step and three-step forecasting, respectively, which are smallest compared with EEMD-GA-BPNN and EMD-NN forecasting model.Therefore, the proposed model also outperforms EEMD-GA-BPNN and EMD-NN forecasting models in terms of the statistical indices.From the Table 10, RMSEs of the proposed model are 0.4456 m/s, 0.5018 m/s and 0.5427 m/s for one-step, two-step and three-step forecasting, respectively, which lower than that of EEMD-GA-BPNN and EMD-NN model.Thus, the same conclusions can be drawn for the wind speed data B.  2. Model2 stands for EEMD/VMD-HBSA-WNN with Morlet function, 3. Model3 stands for EEMD/VMD-HBSA-WNN with Mexican hat function, 4. Model4 stands for EEMD-GA-BPNN, 5. Model5 stands for EMD-NN, 6. Model6 stands for the proposed model.

Case 3
The total historical wind speed data of 2015 are displayed in the Figure 10.The red wind speed in the Figure 10(a) are the selected empirical samples that are used to train and test the proposed model in the previous sections of the paper.To better manifest the effectiveness of EEMD/VMD-HBSA-DAWNN, all the total wind speed data of 2015 were divided into two sets, as shown in Figure10(b) and (c), to train and test the hybrid models.The first 1st ~ 14640th points wind speed time series are utilized to train the forecasting models to predict the subsequent 2880 wind speed time series.The forecasting curves and their errors of wind speed data A are displayed in the Figure 11 and the statistical indices are listed in Tables 9.It can be seen from the Figure 11 (b), (d) and (f) that the forecasting accuracy decreases with the forecasting time.From the Table 9, RMSEs of the proposed model are 0.6516 m/s, 0.7234 m/s and 0.8026 m/s for one-step, two-step and three-step forecasting, respectively, which are smallest compared with EEMD-GA-BPNN and EMD-NN forecasting model.Therefore, the proposed model also outperforms EEMD-GA-BPNN and EMD-NN forecasting models in terms of the statistical indices.From the Table 10, RMSEs of the proposed model are 0.4456 m/s, 0.5018 m/s and 0.5427 m/s for one-step, two-step and three-step forecasting, respectively, which lower than that of EEMD-GA-BPNN and EMD-NN model.Thus, the same conclusions can be drawn for the wind speed data B.

Figure 2 .
Figure 2. Architecture of general wavelet neural network.(i, j and k stand for the number of the input nodes, hidden nodes and output nodes).

Figure 2 .
Figure 2. Architecture of general wavelet neural network.(i, j and k stand for the number of the input nodes, hidden nodes and output nodes).

18 : endfor 19 :
Determine the minimal objective function, and find the optimum parameters combination and the effective input variables.20: endfor 21:

Figure 4 .
Figure 4. Empirical original wind speed data.Figure 4. Empirical original wind speed data.

Figure 4 .
Figure 4. Empirical original wind speed data.Figure 4. Empirical original wind speed data.
Figure 5b,d illustrate the decomposed results of IMF1 by the proposed two-stage decomposition technique.The lowest frequency Mode1 exhibits general tendency of IMF1, the highest frequency Mode4 has the smallest contribution to IMF1.After decomposition of wind speed by two-stage decomposition technique, the WSF can be converted into the prediction of each IMF, mode, and Res.

Figure 5b ,Figure 5 .
Figure 5b,d illustrate the decomposed results of IMF1 by the proposed two-stage decomposition technique.The lowest frequency Mode1 exhibits general tendency of IMF1, the highest frequency Mode4 has the smallest contribution to IMF1.After decomposition of wind speed by two-stage decomposition technique, the WSF can be converted into the prediction of each IMF, mode, and Res.

 x 10 x 11 x 12 x 11 x 12 x 13 xFigure 6 .
Figure 6.Wind speed forecasting format of input variables and output variables for original data A.

Figure 5 .
Figure 5. Wind speed decomposed results by two-stage decomposition technique 4.4.Construction of Input Feature Matrix for the Forecasting Engine , the outputs of Mexican hat and Morlet wavelet functions are approximately zero when the input values are not in the interval [−3, 3], therefore, the values of bi,j are initialized within [−3, 3].The weighted coefficients ωi,j and ωj,k of neural network are generally initialized in the range [−1, 1].For ai,j, the interval of [0.5, 2] is selected in that it does not make the wavelet function too spread or dense[13].All these parameters are initialized with uniform distribution.The parameters in the BSA-DAWNN model are listed in Appendix B.

Figure 6 .
Figure 6.Wind speed forecasting format of input variables and output variables for original data A.

4. 5 .
Construction of the HBSA-DAWNN In the BSA-DAWNN model, the dimension of individuals in the population is set according to the sum of ω i,j , ω j,k , µ, a j , and b j .Seen from Figure 3, the outputs of Mexican hat and Morlet wavelet functions are approximately zero when the input values are not in the interval [−3, 3], therefore, the values of b i,j are initialized within [−3, 3].The weighted coefficients ω i,j and ω j,k of neural network are generally initialized in the range [−1, 1]

Figure 7 .
Figure 7.The RMSE for different numbers of hidden neurons for original wind data A.

Figure 7 .
Figure 7.The RMSE for different numbers of hidden neurons for original wind data A.
illustrate the forecasting results.The RMSE values of the proposed EEMD/VMD-HBSA-DAWNN are the smallest.Thus, the proposed strategy outperforms the model with Mexican hat wavelet function or Morlet function.Remark: The proposed EEMD/VMD-HSBA-DAWNN outperforms all the other mentioned individual models and hybrid forecasting models when applied in multi-step WSF.The proposed model owns a lower RMSE value 0.2249 m/s whereas RMSEs of 0.7282 m/s, 0.6868 m/s, 0.2631 m/s, 0.2737 m/s, and 0.2599 m/s for the WNN, HBSA-WNN, EEMD/VMD-HBSA-WNN (Morlet), EEMD/VMD-HBSA-WNN (Mexican) and EEMD-HBSA-DAWNN, respectively, in one-step.Likewise, the proposed model produces lower statistical error indices (RMSE, MAE, MAPE, and MASE) in two-and three-step forecasting.The signal decomposition-based hybrid prediction models have better forecasting performance when compared with pure individual forecasting models without signal process.

Figure 8 .
Figure 8. Forecasting results by models without signal decomposition for data A

Figure 9 .
Figure 9. Forecasting results by models with signal decomposition for data A (Mode1~Model6 are defined as those inTable 6)

Figure 10 .
Figure 10.Half year and whole year wind speed data of 2015.

Figure 10 .
Figure 10.Half year and whole year wind speed data of 2015.

Figure 11 .
Figure 11.Wind speed forecasting results for data set A.

Figure 11 .
Figure 11.Wind speed forecasting results for data set A.

3 :
As illustrated in Algorithm 1, select the effective candidates through feature selection by BBSA technique and optimize the parameters combination of DAWNN model by RBSA algorithm using the decomposed wind speed data, which are realized by HBSA simultaneously.

Table 1 .
Statistical properties of the total empirical wind speed time series (m/s).

Table 2 .
Lag values obtained by PACF for original wind speed and the different decomposed sub-series.

Table 2 .
Lag values obtained by PACF for original wind speed and the different decomposed sub-series.

Figure 6 .
Wind speed forecasting format of input variables and output variables for original data A.

Table 3 .
Feature selection obtained by HBSA for data set A and its decomposed components.

Table 3 .
Feature selection obtained by HBSA for data set A and its decomposed components.

Table 4 .
Feature selection obtained by HBSA for data set B and its decomposed components.
• WNN based on Morlet and Mexican hat activation functions with weighted coefficient outperforms that based on Morlet or Mexican hat activation function, because the proposed Energies 2019, 12, 334 forecasting engine takes advantages of the combination of the individual mother wavelet functions of Morlet function and Mexican hat function by weighted coefficient.
Compared with the single statistical approaches ARMA and Persistence, the single AI models BPNN and WNN yield better forecasting performance in that the artificial intelligent models have better capacities in handling non-linear wind speed time series.Signaldecomposition-based forecasting models obtain remarkable improvements over the individual forecasting models without signal decomposition methods because there exists non-linearity, non-stable, and high fluctuation in the wind speed time series, the signal techniques break wind speed data into different relatively stationary subseries, thus reducing the regression difficulties of the forecasting engine and improving the prediction accuracy.

Table 5 .
Statistical indices obtained by forecasting models without signal decomposition for data set A. Forecasting results by models without signal decomposition for data A.

Table 6 )Table 6 .
Statistical indices obtained by forecasting models with signal decomposition for data set A.

Table 5 .
Statistical indices obtained by forecasting models without signal decomposition for data set A.

Table 6 .
Statistical indices obtained by forecasting models with signal decomposition for data set A.

Table 7 .
Statistical indices obtained by forecasting models without signal decomposition for data set B.

Table 8 .
Statistical indices obtained by forecasting models with signal decomposition for data set B.