Optimization of Feedforward Neural Networks Using an Improved Flower Pollination Algorithm for Short-Term Wind Speed Prediction

: It is well known that the inherent instability of wind speed may jeopardize the safety and operation of wind power generation, consequently a ﬀ ecting the power dispatch e ﬃ ciency in power systems. Therefore, accurate short-term wind speed prediction can provide valuable information to solve the wind power grid connection problem. For this reason, the optimization of feedforward (FF) neural networks using an improved ﬂower pollination algorithm is proposed. First of all, the empirical mode decomposition method is devoted to decompose the wind speed sequence into components of di ﬀ erent frequencies for decreasing the volatility of the wind speed sequence. Secondly, a back propagation neural network is integrated with the improved ﬂower pollination algorithm to predict the changing trend of each decomposed component. Finally, the predicted values of each component can get into an overlay combination process and achieve the purpose of accurate prediction of wind speed. Compared with major existing neural network models, the performance tests conﬁrm that the average absolute error using the proposed algorithm can be reduced up to 3.67%.


Introduction
Natural energy sources like oil, coal and gas used for power generation are usually environmentally destructive in their harvesting to produce usable energy. They are gone forever when they are used up. Therefore, natural and renewable energy resources from the earth are highly desired. In recent years, wind power has become a new sustainable and renewable alternative to burning fossil fuels, which has a much smaller impact on the environment. With the rapid increase in large-scale wind farms, wind power generation continues to rise in the power industry [1]. For example, renewable energy has generated 72,896 kW power during 2018 in China, with wind power accounting for 25.3% of all this renewable energy capacity [2]. However, the stochasticity and intermittent nature of wind speed may result in an obvious disturbance to the power system and thus affect the power quality. Accordingly, the security and stability of power systems are facing more challenges than before [3]. If the prediction of wind speed can be more accurate, distribution and reserve capacity can be more effectively rationalized through wind turbines. Accordingly, the electricity power spare capacity can be reduced considerably so that smart grids can be achieved [4][5][6][7].
Physical or statistical methods were usually used for wind speed prediction [8][9][10]. In the physical method, for instance, based on the numerical weather prediction model (NWP), a Kalman filter is

The Fundamentals of the FF Model
The feedforward (FF) neural network used in this paper is based on a multi-layer feedforward network, which is composed of three layers [22,23]. It can efficiently train the network using a gradient-based optimization algorithm. The network weights and thresholds that connect the input layer and output layer keep updatingiteratively to reduce the error until it reaches the desired task for which it is being trained. The procedures to implement the FF model are shown as follows: (1) Initialize all connection weight v and threshold b using small random numbers.
(2) Determine the structure of the FF model, input training samples and predictive output samples.
(3) Calculate the outputs of the hidden layer under Equation (1) and the output layer under Equation (2) based on the forward propagation process.
, k = 1, 2, . . . S 2 (2) where θ 1 j is the output of the hidden layer; f is the transfer function of the hidden layer; v 1 ji is the connection weight; x i is the node of the input layer; b 1 j is the threshold of the hidden layer; S 1 is the number of hidden layers; θ 2 k is the output layer; g is the transfer function of the output layer; v 2 k j is the connection weight; x j is the node of the hidden layer; b 2 k is the threshold of the output layer; and S 2 is the number of output layers. (4) Update weights and thresholds using the back-propagation process, as shown in Equation (3): where n(i + 1) and n(i) are the weights and thresholds at the i + 1 and i iteration, respectively; β is a learning rate, and P(i) is the negative gradient of the output error at the i iteration, that is, the fastest direction in gradient. (5) Repeat Steps (3) and (4) until the mean square error (E) reaches the predefined value: where S 2 is the number of output layers; t k is the target value; and θ 2 k is the output value.
As above, FF may suffer from a weak global search ability, i.e., liable to be trapped into a local minimum. To solve this problem, some algorithms such as extreme learning machines, support vector machines, particle swarms and genetic algorithms are usually used to improve FF [24,25]. In this study, the improved flower pollination algorithm (IFPA) is developed to optimize the FF neural network.

The Principle of IFPA
The IFPA is developed from the base of the flower pollination algorithm (FPA). Usually, the FPA is performed under the following assumptions: (1) Biological allogamy is a pollinator flying through Levy to achieve a global pollination effect; (2) Self-pollination hypothesis is a local pollination procedure; (3) The reproduction rate of flowers is proportional to the rate of pollination; and (4) The transition between global pollination and local pollination is determined by physical factors, such as distance between flowers or wind speed. This conversion is represented in the algorithm by the conversion probability [26]. The FPA main steps are described below: (1) Initializing the parameters, such as the number of flower population N, the transformation probability parameters P and a random number rand ∈ (0,1) in FPA. (2) Finding the global optimal solution by comparing the values of the fitness of each solution.
(3) If the conversion rate is p > rand (a predefined value), the solution is updated, and the transboundary process is implemented in the light of Equation (5): where u t+1 i and u t i represent the solutions of the (t + 1) th and (t) th generation, separately; g * is the global optimum solution; L is the step length produced by pollen propagator Levyflying; and L obeys the Levy distribution, and its value is substituted into Equation (6): where s >> s 0 > 0, s is the step dimension, and s 0 is the minimum step dimension; Γ(λ) is the standard Gamma function, and select λ = 1.5; and the Leavy flight step L is generated under Mantegna's algorithm [26]. (4) If the conversion rate is p ≤ rand, the solution is updated, and the transboundary process is implemented according to Equation (7): where γ is a random number on [0,1]; and u t j and u t q are pollen of flowers of the same plant species. The principle of FPA indicates that the optimization solution mainly depends on the interaction between pollen individuals, which can be easily trapped into a local minimum. To resolve this problem, the inertia weight ω is introduced into the pollen position. The new position is updated using Equation (8): At the beginning of the iteration, the larger weight of the previous generation pollen in FPA has a greater impact on the current pollen movement. The attraction between pollen is also smaller. Contrastively, the smaller inertia weight in the later iteration can make the attraction between pollen larger. In this situation, the local search power becomes stronger, while the global search power weakens. To avoid a repeated oscillation near the extreme point from a larger moving step size [26][27][28][29], four inertia weights, i.e., ω 1 , ω 2 , ω 3 and ω 4 , are selected to improve FPA, where ω 1 is a fixed inertia weight, ω 2 is a linear inertia weight, ω 3 is an exponential decreasing inertia weight and ω 4 is a dynamic inertia weight. Each inertia weight is shown as follows: where ω max is the biggest inertia weight; ω min is the minimum inertia weight; i is the current value of iterations; and i max is the maximum value of iterations. When the inertial weight varies from 0.9 to 0.4, the optimal searching effect can achieve the best solution [30]. For this reason, select ω max = 0.9, ω min = 0.4 and i max = 200. The variation of four inertial weights vs. iteration number is shown in Figure 1.

Convergence of IFPA
In the study, there are four inertia weights, i.e., ω 1 , ω 2 , ω 3 and ω 4 , added to IFPA. Rastrigin and Griewank functions are chosen for testing the global optimization performance. The population number, iteration number and conversion probability were set as 20, 200 and 0.8, respectively [31,32]. The parameter settings environment is shown in Table 1. As shown in Table 2, the performance results where max ω is the biggest inertia weight; min ω is the minimum inertia weight; i is the current value of iterations; and imax is the maximum value of iterations. When the inertial weight varies from 0.9 to 0.4, the optimal searching effect can achieve the best solution [30]. For this reason, select , and imax=200. The variation of four inertial weights vs. iteration number is shown in Figure 1.
Rastrigin and Griewank functions are chosen for testing the global optimization performance. The population number, iteration number and conversion probability were set as 20, 200 and 0.8, respectively [31,32]. The parameter settings environment is shown in Table 1. As shown in Table 2, the performance results are based on the average of 20 independent tests. It indicates that the optimal solution can be achieved by choosing 4 ω in either a Rastrigin or Griewank function.
In Figure 2, the Rastrigin function is taken as the objective function. When the inertia weight is 4 ω , the optimal value is obtained in the 162 th generation. In Figure 3, the Griewank function is taken as the objective function. When the inertia weight is 4 ω , the optimal value is obtained in the 101 th generation. Moreover, it proves that 4 ω can reach the fastest convergence speed among all parameters. ω Figure 1. Variation of four inertial weights vs. iteration number. In Figure 2, the Rastrigin function is taken as the objective function. When the inertia weight is ω 4 , the optimal value is obtained in the 162 th generation. In Figure 3, the Griewank function is taken as the objective function. When the inertia weight is ω 4 , the optimal value is obtained in the 101 th generation. Moreover, it proves that ω 4 can reach the fastest convergence speed among all parameters.

Optimization of the FF Model Based on IFPA
In this study, the IFPA is integrated with FF neural network to improve the optimization of FF weights and thresholds in the network. The Levy flight characteristics in IFPA are fully utilized for the global searching. Accordingly, the initial weight matrix and bias vector can be found and then given to FF neural network for the training procedure. Figure 4 shows the flow of IFPA-FF, and it is illustrated as follows: (1) Initialize FF. Set input layer, hidden layer and output layer, initial network weight v and initial thresholds b. (4) The speed and position of all pollens are initialized, and the objective function of each pollen is kept computing until the minimum fitness value is reached. (5) A random value of rand is generated and compared with the conversion probability p. According to Equations (5) and (7) in FPA, the solution is kept being updated until the global optimal solution is reached. (6) If the termination condition of IFPA is met, proceed to the next step. Otherwise, return to Step (5). (7) The optimal pollen individual ui is decoded, and the decoded weights vi and bi are used as connection weight v and threshold b for FF. (8) If the ending condition of FF training is satisfied, the optimal network structure is thus obtained for further prediction. Otherwise, go back to Step (7).    Table 4 are based on the average value from 20 independent tests. The iterative convergence curves using Schaffer and Ackley functions are shown in Figures 5 and 6, respectively. It can be seen that the IFPA model presents the best convergence performance in both of functions.

Preprocessing of Wind Speed Data
The EEMD is an improved EMD model [33]. Based on EMD, a set of Gaussian white noise with normal distribution is added, which effectively improves the aliasing and discontinuity of signals at different scales and avoids the mode aliasing caused by EMD decomposition process [34][35][36]. The flowchart of EEMD is shown in Figure 7, illustrated as follows: (1) A normal white Gaussian noise sequence ) (t v was added to the original sequence ( ) g t to generate a new sequence ( ( ) ( ) ( )) u t v t g t = + . (2

Preprocessing of Wind Speed Data
The EEMD is an improved EMD model [33]. Based on EMD, a set of Gaussian white noise with normal distribution is added, which effectively improves the aliasing and discontinuity of signals at different scales and avoids the mode aliasing caused by EMD decomposition process [34][35][36]. The flowchart of EEMD is shown in Figure 7, illustrated as follows: (1) A normal white Gaussian noise sequence v(t) was added to the original sequence g(t) to generate a new sequence (u(t) = v(t) + g(t)). (2) Find the maximum and minimum values in x(t) and fit the upper envelope I max (t) and the lower envelope I min (t) of u(t).
(3) Calculate the average envelope m(t) = I min (t)+I max (t) 2 , and subtract it from the noise sequence u(t) to form a new sequence p(t) (= u(t) − m(t)). (4) Justify if p(t) satisfies the IMF. If yes, the IMF, denoted as h 1 (t), is confirmed. Otherwise, repeat Steps (1) and (2) until it is satisfied. (5) The h 1 (t) is subtracted from the original sequence u(t), and the residual component Repeat the same process from the first step until r k (t), as shown in Equation (13), becomes a monotone function. (7) Finally, u(t) is decomposed into k IMFs with different oscillation modes, as shown in Equation (14).
. . . The data used in this study comes from the Sotavento wind farm in Galicia, Spain, and it is taken from 1 March to 13 March 2018. The wind speed statistics data is shown in Table 5. Sampling interval of wind speed data was 10 min, as shown in Figure 8.  The data used in this study comes from the Sotavento wind farm in Galicia, Spain, and it is taken from 1 March to 13 March 2018. The wind speed statistics data is shown in Table 5. Sampling interval of wind speed data was 10 min, as shown in Figure 8.  From Table 5, it is seen that the real wind speed data is highly volatile and unpredictable. Based on the EEMD, it can be decomposed into several relatively stable series under different characteristic scales. The decomposition results are shown in Figure 9. As shown in Figure 9, the frequency of each component is, in turn, from high to low, and the large fluctuation of high frequency component is the random influence part of wind speed. The low  Table 5, it is seen that the real wind speed data is highly volatile and unpredictable. Based on the EEMD, it can be decomposed into several relatively stable series under different characteristic scales. The decomposition results are shown in Figure 9. From Table 5, it is seen that the real wind speed data is highly volatile and unpredictable. Based on the EEMD, it can be decomposed into several relatively stable series under different characteristic scales. The decomposition results are shown in Figure 9. As shown in Figure 9, the frequency of each component is, in turn, from high to low, and the large fluctuation of high frequency component is the random influence part of wind speed. The low As shown in Figure 9, the frequency of each component is, in turn, from high to low, and the large fluctuation of high frequency component is the random influence part of wind speed. The low frequency component has the characteristics of a sinusoidal wave and is generally considered as the periodic component of wind speed. The symbol "RES" represents the trend component, indicating the long-term trend of wind speed. There are 10 decomposed sequences (IMF1, IMF2, . . . , IMF9, RES) input into the IFPA-FF model for training. After the training process is completed, these 10 predicted components are superimposed to obtain the final prediction value for comparison with the real value. The flowchart of the prediction model is shown in Figure 10. input into the IFPA-FF model for training. After the training process is completed, these 10 predicted components are superimposed to obtain the final prediction value for comparison with the real value. The flowchart of the prediction model is shown in Figure 10. Figure 10. Flowchart of the prediction model.

Wind Speed Forecasting Employed EEMD-IFPA-FF Model
There are three error functions, i.e., MAE, RMSE and MAPE applied to evaluate the effectiveness of the proposed model. Besides, an indicator R-Squared is employed for calculating the fitness of the model.    Figure 10. Flowchart of the prediction model.

Wind Speed Forecasting Employed EEMD-IFPA-FF Model
There are three error functions, i.e., MAE, RMSE and MAPE applied to evaluate the effectiveness of the proposed model. Besides, an indicator R -Squared is employed for calculating the fitness of the model.
where n is the forecast sample number; x i is the value speed at time i; x i is the forecasted value at time i; and x i is the average of the true value at time i. The initial parameters in this experiment were chosen from References [32,35]. Note that 10 tests were carried out to determine the optimal parameter values in FF and FPA models, concluded in Table 6. The neural network has 15 inputs, 10 neurons in the first hidden layer, 5 neurons in the second hidden layer and one output.  13 March, a total of 1872 samples were used for model implementation. After EEMD decomposition, 1772 samples of each IMF and RES were trained, and the last 100 samples were used to judge the prediction efficiency. Three models, i.e., EEMD-IFPA-FF, EEMD-FPA-FF and IFPA-FF, were evaluated and compared using the same wind speed data. The comparison between the predicted wind speeds and actual ones on 13 March is shown in Figure 11. It reveals that the predicted curve from EEMD-IFPA-FF model is relatively closer to the actual one than others. and i x is the average of the true value at time i .
The initial parameters in this experiment were chosen from References [32,35]. Note that 10 tests were carried out to determine the optimal parameter values in FF and FPA models, concluded in Table 6. The neural network has 15 inputs, 10 neurons in the first hidden layer, 5 neurons in the second hidden layer and one output.  13 March, a total of 1872 samples were used for model implementation. After EEMD decomposition, 1772 samples of each IMF and RES were trained, and the last 100 samples were used to judge the prediction efficiency. Three models, i.e., EEMD-IFPA-FF, EEMD-FPA-FF and IFPA-FF, were evaluated and compared using the same wind speed data. The comparison between the predicted wind speeds and actual ones on 13 March is shown in Figure 11. It reveals that the predicted curve from EEMD-IFPA-FF model is relatively closer to the actual one than others. Similarly, the performance test outcomes carried out on 13 February and 13 April can be seen in Figures 12 and 13, respectively. By comparing Figures 11-13, the wind speed data in February is relatively stable with less fluctuation and lower amplitude. As a result, each model was more accurate in February than in other months. This implies that the fluctuation of the wind speed and the magnitude is the major factor that may affect the prediction accuracy. The comparison of training samples with existing models is concluded in Table 7. The comparison of testing samples with existing models is concluded in Table 8. Similarly, the performance test outcomes carried out on 13 February and 13 April can be seen in Figures 12 and 13, respectively. By comparing Figures 11-13, the wind speed data in February is relatively stable with less fluctuation and lower amplitude. As a result, each model was more accurate in February than in other months. This implies that the fluctuation of the wind speed and the magnitude is the major factor that may affect the prediction accuracy. The comparison of training samples with existing models is concluded in Table 7. The comparison of testing samples with existing models is concluded in Table 8.     Comparing the error results of Tables 7 and 8, the error of the test set slightly increases compared to the error of the training set. In both training and the test results analysis, the error from the IFPA-FF model performance can reach the smallest value, presenting a certain generalization ability. As shown in Figures 11-13, it is found that the wind speed trend prediction using IFPA-FF model has a lag behind the actual speed due to its randomness characteristics. On the other hand, the wind speed series predicted by EEMD-IFPA-FF model and EEMD-FPA-FF model are almost the same as the actual ones without any lag. In Table 8, the maxima errors appeared on 4.13 among three-days collected data. At this day, the MAE, RMSE and MAPE for IFPA-FF model is up to 0.75 m·s −1 , 0.89 m·s −1 , and 14.67%, respectively. In the EEMD-FPA-FF model, however, the MAE, RMSE and MAPE are down to below 0.51 m·s −1 , 0.68 m·s −1 and 10.74%, respectively. In the EEMD-IFPA-FF model, the MAE, RMSE and MAPE are further improved to 0.37 m·s −1 , 0.43 m·s −1 and 8.33%, respectively. Compared with the two other existing models, the MAE, RMSE and MAPE are reduced by more than 0.21 m·s −1 , 0.30 m·s −1 and 3.67%, respectively. By investigating R -Squared , it is confirmed that the EEMD-IPFA-FF reaches the highest fitness and achieves a better prediction. In conclusion, it is obvious that the prediction accuracy is considerably improved and relatively stable using the EEMD-IFPA-FF model.

Conclusions
In this paper, the proposed EEMD-IFPA-FF prediction model has been well established for an optimization achievement. The FPA is enhanced by incorporating dynamic inertia weights in the IFPA algorithm. Therefore, a faster convergence speed can be achieved. Besides, the FF global search capability is improved to reach a global solution by integrating the IFPA. Moreover, the wind speed data was decomposed using the EEMD method to increase the prediction accuracy. Accordingly, it is obvious that the proposed model provides a better performance than other existing models in both convergence speed and prediction error no matter how small or large the wind speed fluctuates. Apart from the above, the R -Squared value further verifies the EEMD-IFPA-FF model presents the highest fitness value among existing models.