Novel Parallel Heterogeneous Meta-Heuristic and Its Communication Strategies for the Prediction of Wind Power

Abstract: Wind and other renewable energy sources protect the ecological environment and improve economic efficiency. However, it is difficult to predict wind power accurately because of the randomness and volatility of wind. This paper proposes a new parallel heterogeneous model to predict wind power. Parallel meta-heuristics save computation time and improve solution quality. Four communication strategies (ranking, combination, dynamic change and hybrid) are introduced to balance exploration and exploitation. The dynamic change strategy dynamically increases or decreases the number of members in a subgroup to maintain the diversity of the population. Results on benchmark functions show that the algorithms perform well in both exploration and exploitation. Finally, they are applied to wind power prediction by training the parameters of a neural network.


Introduction
The world is facing resource shortages, and the utilization of renewable energy has become an active research topic. Wind is a clean, renewable energy source that is economically attractive and rapidly developing. However, the large-scale application of wind power is limited by its volatility, intermittence and uncertainty. Therefore, accurate prediction of wind power is important for power system integration, optimizing the energy market and reducing the cost of power reserves. Over the past decades, many models have been proposed to predict wind power, which mainly include physical, statistical and intelligent learning methods [1][2][3][4][5][6].
The physical method establishes a numerical weather prediction (NWP) model for the wind field and obtains the prediction from the parameters of the wind turbines. The statistical method predicts wind power by constructing related mathematical functions. Traditional statistical methods include time series models, regression analysis and so on; they are complicated and their prediction accuracy is poor. In recent years, neural networks (NN) have become popular because they can handle the non-linearity of the data, and many NN-based models have been proposed to predict wind power. Since the speed and accuracy of an NN are greatly affected by its parameters, meta-heuristics are introduced to optimize the parameters of the NN and carry out the prediction [7].
However, meta-heuristics generally suffer from slow convergence and poor solution quality, and many researchers have made useful attempts at parallelization [36][37][38][39]. Schutte et al. [40] propose a coarse-grained parallelization of PSO to address large-scale data, low cost and multiple local optima. Penas et al. [26] improve global and local search through an asynchronous parallel and cooperative island model to achieve an appropriate balance. Pan et al. [41] adopt a parallel and compact method based on the bat algorithm (BA) to increase the diversity of solutions in the search space and to share the computation. Alba et al. [42] summarize the development and application scenarios of parallel meta-heuristic algorithms in recent years and discuss future trends and possible research directions.
The main contributions of this paper are summarized as follows:
• It proposes, for the first time, a parallel heterogeneous model based on PSO and GWO.
• It introduces four new communication strategies to improve exploration and exploitation.
• It dynamically changes the members of a subgroup according to the diversity of the population.
The rest of the paper is organized as follows. Section 2 describes PSO, GWO and population-based parallelization. Section 3 introduces a new parallel heterogeneous model and four communication strategies. Section 4 tests their performance on 28 benchmark functions. Section 5 applies the algorithms and a neural network to wind power prediction. Section 6 concludes the paper and suggests directions for further work.

Preliminaries
A meta-heuristic algorithm combines stochastic search and local search. It gives a feasible solution to a problem within acceptable computational time and space, but the solution cannot be predicted in advance. Meta-heuristics are divided into trajectory-based and population-based algorithms [43]. In this section, we first introduce two population-based algorithms, PSO and GWO. We then briefly describe the communication models and strategies for parallelization.

Particle Swarm Optimization
PSO simulates a flock of birds with massless particles. Each particle has only two properties, speed and position: the speed determines how fast and in which direction the particle moves, and the position is a candidate solution in the search space. Each particle searches the space on its own and records the best position it has found as its individual extreme; the best individual extreme in the swarm is taken as the current global optimal solution. The particles adjust their speeds and positions according to their individual extremes and the global optimal solution [13]. PSO is simple and has few parameters. It has been widely used in function optimization, neural network training, fuzzy system control and other applications.
PSO randomly initializes a group of particles and then iteratively finds the optimal solution. In each iteration, the particles update their speeds and positions with the following equations.
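For reference, the standard PSO update rules can be written with the symbols defined in this section as follows. This is a reconstruction under the usual PSO conventions; in particular, the linearly decreasing inertia schedule in Equation (3) is an assumption that is consistent with the definitions of $w_{max}$, $w_{min}$, $it$ and MAX_IT.

```latex
\begin{align}
V_{ij}(t+1) &= w\,V_{ij}(t) + c_1 r_1 \bigl(pbest_{ij}(t) - X_{ij}(t)\bigr)
             + c_2 r_2 \bigl(gbest_{j}(t) - X_{ij}(t)\bigr) \tag{1}\\
X_{ij}(t+1) &= X_{ij}(t) + V_{ij}(t+1) \tag{2}\\
w &= w_{max} - (w_{max} - w_{min})\,\frac{it}{MAX\_IT} \tag{3}
\end{align}
```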
where $ij$ denotes the $j$th dimension of the $i$th particle; $t$ is the current iteration; $r_1$ and $r_2$ are two random numbers in [0, 1]; $X$ denotes the position; $V$ is the speed; $pbest$ is the individual extreme; $gbest$ is the global optimal solution; and $c_1$ and $c_2$ are coefficients. $w$ is the inertia factor, computed by Equation (3), where $w_{max}$ and $w_{min}$ are the maximum and minimum values of $w$, MAX_IT is the maximum number of iterations and $it$ is the current iteration number. Figure 1 shows the complete flow chart of PSO.

Figure 1. The complete flow chart of particle swarm optimization (PSO). Steps: randomly initialize the positions $X_i$ and the speeds $V_i$ ($i = 1, 2, ..., n$) of the whole population; calculate the fitness $f(x)$, $x \in X_i$, of each particle; update pbest; update gbest; update the speeds and positions by Equations (1), (2) and (3); repeat until the end condition is met; output gbest.
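The loop in the flow chart can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pso`, the speed and position clamping, the seed handling and the value `w_min = 0.2` are assumptions, while `w_max`, `c1`, `c2` and `v_max` follow the values reported later in the parameter table.

```python
import random


def sphere(x):
    """Simple unimodal test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(fitness, dim, bounds, n=30, max_it=200, w_max=0.9, w_min=0.2,
        c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Minimise `fitness` with a basic PSO; returns (gbest, gbest_fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Randomly initialise the positions and speeds of the whole population.
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = min(range(n), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for it in range(max_it):
        # Inertia factor decreases linearly from w_max towards w_min (assumed schedule).
        w = w_max - (w_max - w_min) * it / max_it
        for i in range(n):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * V[i][j]
                     + c1 * r1 * (pbest[i][j] - X[i][j])
                     + c2 * r2 * (gbest[j] - X[i][j]))
                V[i][j] = max(-v_max, min(v_max, v))            # clamp the speed
                X[i][j] = max(lo, min(hi, X[i][j] + V[i][j]))   # clamp the position
            f = fitness(X[i])
            if f < pbest_fit[i]:        # update pbest
                pbest[i], pbest_fit[i] = X[i][:], f
                if f < gbest_fit:       # update gbest
                    gbest, gbest_fit = X[i][:], f
    return gbest, gbest_fit
```

A call such as `pso(sphere, dim=5, bounds=(-10.0, 10.0))` returns the best position found and its fitness. Note that gbest is refreshed as soon as a particle improves on it, which is one common variant; updating it once per iteration would follow the flow chart more literally.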

Grey Wolf Optimizer
A pack of grey wolves has a very strict social hierarchy, similar to a pyramid. GWO mimics the behaviors of grey wolves, such as social hierarchy and searching for and hunting prey [27]. In GWO, the three best solutions are named alpha (α), beta (β) and delta (δ), respectively. The remaining candidates are collectively referred to as omega (ω) wolves, and the omegas update their positions according to the positions of the three best solutions. GWO is characterized by strong convergence, few parameters and easy implementation. A wolf first computes its distance from α, β and δ by Equations (4)-(9), and then its position is updated through Equation (10). Figure 2 shows the complete flow chart of GWO.
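For reference, the standard GWO update rules take the form below. This is a reconstruction from the usual formulation of the algorithm: the assignment of equation numbers is inferred from the text, and the coefficient definitions $A_k = 2a\,r_1 - a$ and $C_k = 2\,r_2$ (drawn afresh for each leader) are the conventional ones rather than something stated in this section.

```latex
\begin{align}
D_\alpha &= \lvert C_1 \cdot X_\alpha - X_i \rvert \tag{4}\\
D_\beta  &= \lvert C_2 \cdot X_\beta  - X_i \rvert \tag{5}\\
D_\delta &= \lvert C_3 \cdot X_\delta - X_i \rvert \tag{6}\\
X_1 &= X_\alpha - A_1 \cdot D_\alpha \tag{7}\\
X_2 &= X_\beta  - A_2 \cdot D_\beta  \tag{8}\\
X_3 &= X_\delta - A_3 \cdot D_\delta \tag{9}\\
X_i(t+1) &= \frac{X_1 + X_2 + X_3}{3} \tag{10}
\end{align}
```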
where $X_\alpha$, $X_\beta$ and $X_\delta$ represent the positions of α, β and δ, respectively; $D_\alpha$, $D_\beta$ and $D_\delta$ are the distances between α, β, δ and wolf $i$. Over the iterations, $a$ decreases linearly from 2 to 0. $r_1$ and $r_2$ are two random numbers in [0, 1].
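The position update can be sketched in Python as follows. This is a minimal illustration under the conventional GWO formulation; the names `gwo_step` and `gwo`, the boundary clamping and the per-dimension random draws are assumptions, not details taken from the paper.

```python
import random


def sphere(x):
    """Simple unimodal test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo_step(wolves, fitness, a, rng):
    """Move every wolf once towards alpha, beta and delta (minimisation)."""
    ranked = sorted(wolves, key=fitness)
    leaders = [w[:] for w in ranked[:3]]          # alpha, beta, delta
    new_pack = []
    for x in wolves:
        pos = []
        for j in range(len(x)):
            estimates = []
            for leader in leaders:
                A = 2 * a * rng.random() - a      # A = 2a*r1 - a
                C = 2 * rng.random()              # C = 2*r2
                D = abs(C * leader[j] - x[j])     # distance to this leader
                estimates.append(leader[j] - A * D)
            pos.append(sum(estimates) / 3)        # average of X1, X2 and X3
        new_pack.append(pos)
    return new_pack


def gwo(fitness, dim, bounds, n=30, max_it=200, seed=0):
    """Minimise `fitness` with a basic GWO; returns (best, best_fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    best = min(wolves, key=fitness)[:]
    for it in range(max_it):
        a = 2 - 2 * it / max_it                   # a decreases linearly from 2 to 0
        moved = gwo_step(wolves, fitness, a, rng)
        wolves = [[max(lo, min(hi, v)) for v in w] for w in moved]  # clamp to bounds
        cand = min(wolves, key=fitness)
        if fitness(cand) < fitness(best):
            best = cand[:]
    return best, fitness(best)
```

When `a` reaches 0 the coefficient `A` vanishes, so every wolf moves to the average of the three leaders; larger values of `a` allow steps that overshoot the leaders, which is what provides exploration early in the run.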

Population-Based Parallelization
Parallel algorithms are generally superior to their non-parallel counterparts in efficiency, scalability or solution quality. Most of them are homogeneous, so parallelization amounts to running multiple serial instances of the same algorithm at the same time. Parallel algorithms not only reduce the computational cost, but also improve the solution quality. Because meta-heuristics often fall into local optima, parallel algorithms can search a larger part of the space. Therefore, they are increasingly used to solve complex global optimization problems.

Communication Models
Population parallelization is an important strategy in which the population is divided into several independent subgroups. Lalwani et al. [44] propose four communication models based on [45].

Star model
One subgroup acts as the master node, while the others act as slave nodes under its control. Instead of communicating directly with each other, the subgroups exchange global information through the master node, as shown in Figure 3a.

Migration model
Each subgroup communicates only with its two adjacent subgroups, forming a ring structure. The optimal information is therefore passed through all the subgroups, as shown in Figure 3b.

Diffusion model
Each subgroup communicates with the others, and the global information is transmitted by broadcasting, as shown in Figure 3c.

Hybrid model
It is a hybrid model of migration and diffusion. Each subgroup communicates with only four subgroups around it, as shown in Figure 3d.
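The four models differ only in which subgroups exchange information, so they can be illustrated as neighbor maps. The sketch below is an assumption-laden illustration: the function name `neighbors`, the choice of subgroup 0 as master, and the wrap-around grid used for the hybrid model are hypothetical choices made to obtain "four subgroups around it".

```python
def neighbors(model, k, rows=None):
    """Return {subgroup index: sorted list of subgroups it communicates with}."""
    ids = range(k)
    if model == "star":          # subgroup 0 is the master node
        return {i: ([j for j in ids if j != 0] if i == 0 else [0]) for i in ids}
    if model == "migration":     # ring: only the two adjacent subgroups
        return {i: sorted({(i - 1) % k, (i + 1) % k} - {i}) for i in ids}
    if model == "diffusion":     # broadcast to every other subgroup
        return {i: [j for j in ids if j != i] for i in ids}
    if model == "hybrid":        # four neighbors on a wrap-around grid (assumed layout)
        if rows is None or rows <= 0 or k % rows != 0:
            raise ValueError("hybrid model needs `rows` that divides k")
        cols = k // rows
        result = {}
        for i in ids:
            r, c = divmod(i, cols)
            near = {((r - 1) % rows) * cols + c,   # up
                    ((r + 1) % rows) * cols + c,   # down
                    r * cols + (c - 1) % cols,     # left
                    r * cols + (c + 1) % cols}     # right
            result[i] = sorted(near - {i})
        return result
    raise ValueError("unknown model: " + str(model))
```

For example, `neighbors("migration", 4)` gives each of the four subgroups its two ring neighbors, while `neighbors("hybrid", 9, rows=3)` places nine subgroups on a 3 × 3 grid.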

Communication Strategies
Chang et al. [16] propose three communication strategies that improve the solution quality according to the correlations between subgroups.

• Loosely correlated parameters: If the subgroups are not closely related or are independent, they develop independently; after m iterations, the worst t individuals of each subgroup are eliminated, as shown in Figure 4a.

• Strongly correlated parameters: If the subgroups are strongly related, they communicate after m iterations, and the worst t individuals of each subgroup are replaced by the best t individuals of the others, as shown in Figure 4b.

• Parameters with unknown correlation (hybrid): If the correlations between the subgroups are unknown, a hybrid of the two strategies above is used, as shown in Figure 4c.
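The replacement step of the strongly correlated case can be sketched as follows. This is an illustration only: the function name `migrate_best` and the ring order used to pick the donor subgroup are assumptions, since the text only says that the worst t individuals are replaced by the best t of the others.

```python
def migrate_best(subgroups, fitness, t):
    """Replace the worst t members of each subgroup with copies of the best t
    members of the next subgroup in ring order (minimisation)."""
    ranked = [sorted(group, key=fitness) for group in subgroups]
    k = len(ranked)
    result = []
    for i, group in enumerate(ranked):
        if t > len(group):
            raise ValueError("t must not exceed the subgroup size")
        donors = ranked[(i + 1) % k][:t]              # best t of the neighbor
        kept = group[:len(group) - t]                 # drop the worst t
        result.append(kept + [d[:] for d in donors])  # copy, do not share lists
    return result
```

In a full run this function would be called once every m iterations, with m and t chosen by the user.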

Novel Parallel Heterogeneous Algorithm
Meta-heuristic algorithms tend to fall into the trap of premature convergence, because the fitness of one individual may greatly exceed the average of the population, which causes that individual to be rapidly replicated and propagated through the population. This leads to a decline in the diversity of the population and a loss of evolutionary capacity. In parallel algorithms, a subgroup that converges prematurely does not prevent the other subgroups from finding the optimal solution, which makes parallelization a useful remedy. In this section, we propose a novel parallel heterogeneous algorithm and four new communication strategies to improve the solution quality.

The Model of Parallel Heterogeneous Algorithm
In traditional population-based parallelization, every subgroup adopts the same meta-heuristic. Although this avoids the premature trap, all subgroups share the defects of that one algorithm. For example, the search accuracy of PSO is not high, while GWO converges slowly in the late stage and lacks the necessary information exchange within the pack.
In the parallel heterogeneous model proposed in this paper, the subgroups adopt different algorithms. If the parallel algorithm adopts the migration model, the corresponding structure is shown in Figure 5: subgroup 1 adopts meta-heuristic 1, subgroup 2 uses meta-heuristic 2, and so on. Although the subgroups do not avoid the inherent defects of their own algorithms, they compensate for them through parallelization, and they can be greatly improved by proper communication strategies. The parallel heterogeneous model takes advantage of the characteristics of different algorithms and balances their defects. It can handle a variety of problems and searches the space by various methods, but it is difficult to coordinate the subgroups because the algorithms differ in parameters and models. The following section describes how information is exchanged between subgroups.

New Communication Strategies
Even if some of the subgroups stagnate in a local optimum, a parallel algorithm still has a chance to reach the global optimum, because information exchange can change the distribution of the subgroups. A good communication strategy therefore makes the algorithm converge quickly and avoid falling into local optima.

Communication Strategy with Ranking
The worst individuals of a subgroup are replaced by the best ones of other subgroups, which greatly improves the fitness of the subgroup. The subgroup is strongly influenced by its neighbors: it is improved by more experienced neighbors, but it can also be degraded by inexperienced ones. After the replacement is complete, the subgroup ranks its neighbors. The ranking equation is described as follows.
where $\bar{f}$ is the average fitness of the subgroup and $\bar{f}'$ is the new average value of the subgroup. After m iterations, the worst individuals of each subgroup are eliminated and its neighbors receive the ranking. They then decide whether to mutate according to the ranking, as shown in Figure 6. The strategy evaluates the subgroups and accelerates the evolution of those with poor convergence.

Communication Strategy with Combination
In this strategy, subgroups communicate with others according to the similarities and differences of their meta-heuristics. Subgroups that use the same algorithm have the same parameters and models, so their solutions are comparable. Their members are merged, sorted and then allocated to each subgroup in turn. After communication, the solutions of the subgroups are mixed, and there are no large differences between the subgroups. The strategy searches widely for the optimal solution in the search space and avoids falling into local optima, as shown in Figure 7.
By reallocating members, the strategy lets the subgroups escape local traps and explore more of the space, so that their convergence remains roughly the same. The pseudo code of combination is described in Algorithm 1.
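The merge, sort and allocate steps can be sketched as below. This is an illustrative reading of the description rather than the paper's Algorithm 1; the function name `combine` and the round-robin dealing order are assumptions.

```python
def combine(subgroups, fitness):
    """Merge subgroups that run the same algorithm, sort all members by
    fitness (best first), and deal them back to the subgroups in turn."""
    k = len(subgroups)
    merged = sorted((member for group in subgroups for member in group),
                    key=fitness)
    # Subgroup i receives members i, i + k, i + 2k, ... of the sorted list.
    return [merged[i::k] for i in range(k)]
```

Dealing the sorted members in turn gives every subgroup a similar mix of good and poor solutions, which matches the statement that no large differences remain between subgroups after communication.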

Communication Strategy with Dynamic Change
Individuals are attracted by the best fitness in the population, which causes them to move toward and quickly converge at an optimum in the search space. By adding perturbations, enlarging the search space and maintaining the population diversity, the population is prevented from falling into a local optimum. The diversity of the population is expressed by the distribution of fitness values in the population.
where $n$ is the size of the population; $f_i$ is the fitness of the $i$th individual; and $\bar{f}$ is the average fitness of the population.
As the algorithm evolves, if the diversity of the population decreases, individuals are dynamically added to the population; if the diversity increases, it is reduced by appropriately removing individuals. Increasing the number of members avoids falling into a local optimum by searching a wider space, while decreasing it speeds up the convergence. However, changing the population size in this way can reduce the convergence rate of the population, so a virtual population is set up, which is composed of the optimal solutions of the subgroups. Its members do not guide the search direction of the subgroups; instead, they search in the known potentially optimal region. The members of the virtual population are updated once every m iterations. The pseudo code is described in Algorithm 2.
The communication strategy effectively maintains the diversity of the population and finds a better balance between exploration and exploitation, as shown in Figure 8.
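The resizing rule can be illustrated with a short sketch. Several choices here are assumptions and not taken from the paper: diversity is measured as the population variance of the fitness values, new members are drawn uniformly at random within the bounds, removed members are the worst ones, and the names `diversity` and `resize` as well as the limits `step`, `min_size` and `max_size` are hypothetical.

```python
import random


def diversity(fits):
    """Spread of the fitness values, here the population variance (assumed form)."""
    mean = sum(fits) / len(fits)
    return sum((f - mean) ** 2 for f in fits) / len(fits)


def resize(subgroup, fitness, prev_div, bounds, step=2,
           min_size=10, max_size=50, rng=random):
    """Grow the subgroup when diversity fell, shrink it when diversity rose.

    Returns the new member list and the current diversity value."""
    div = diversity([fitness(m) for m in subgroup])
    lo, hi = bounds
    dim = len(subgroup[0])
    members = sorted(subgroup, key=fitness)           # best first
    if div < prev_div and len(members) < max_size:
        add = min(step, max_size - len(members))
        # Add random individuals to widen the search.
        members += [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(add)]
    elif div > prev_div and len(members) > min_size:
        drop = min(step, len(members) - min_size)
        members = members[:len(members) - drop]       # remove the worst members
    return members, div
```

The returned diversity value would be passed back as `prev_div` at the next communication point, so that the comparison always uses the most recent measurement.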

Hybrid Communication Strategy
The hybrid strategy applies combination to subgroups that use the same algorithm and ranking to subgroups that use different algorithms, so it has the advantages of both strategies. It keeps the diversity differences among the subgroups and coordinates all of them. It promotes the subgroups with poor search abilities, so that the population searches the space quickly and avoids falling into local optima. Suppose there are four subgroups: subgroups 1 and 2 adopt meta-heuristic 1, and subgroups 3 and 4 adopt meta-heuristic n. After m iterations, the population executes the communication strategy with ranking. After t iterations, the population executes the communication strategy with combination, as shown in Figure 9.
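The scheduling of the two strategies can be sketched as follows. This is only an illustration: it assumes that "after m iterations" and "after t iterations" mean periodic communication every m and every t iterations, and the names `group_by_algorithm` and `communication_plan` are hypothetical.

```python
from collections import defaultdict


def group_by_algorithm(algorithms):
    """Map each algorithm name to the indices of the subgroups that run it."""
    groups = defaultdict(list)
    for index, name in enumerate(algorithms):
        groups[name].append(index)
    return dict(groups)


def communication_plan(iteration, m, t):
    """Return the strategies to execute at this iteration (assumed periodic)."""
    actions = []
    if iteration > 0 and iteration % m == 0:
        actions.append("ranking")        # across subgroups with different algorithms
    if iteration > 0 and iteration % t == 0:
        actions.append("combination")    # within subgroups sharing an algorithm
    return actions
```

With the four-subgroup example above, `group_by_algorithm(["GWO", "GWO", "PSO", "PSO"])` identifies the two pairs that are merged by combination, while ranking acts between the pairs.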


Experimental Results and Analysis
In this section, we use 28 benchmark functions to test the effectiveness of the proposed parallel heterogeneous model and communication strategies. The functions, listed in Tables 1-4, are classified as unimodal, multimodal, fixed-dimension and composite problems. Space is the boundary of the search range; Dim is the dimension of the function; and $f_{min}$ is the optimum.

Table 1. Unimodal benchmark functions.

Parameters Configuration
To verify the results, we compare the proposed algorithms with PGWO, a parallelized GWO in which the poorer agents of a subgroup are replaced by the best agents of its neighbor. The parallel heterogeneous model uses GWO and PSO. Each algorithm is run 30 times with 500 iterations on each benchmark function. There are 4 subgroups of 30 individuals each, and the three worst individuals are replaced every four iterations. Subgroups 1 and 2 adopt GWO; subgroups 3 and 4 adopt PSO. Table 5 lists the parameters of PSO and the other parameters, where beta is a scaling factor, pcR is a mutation constant and group_size is the number of subgroups.

Table 5. Parameters setting of algorithms.
PH-H: Hybrid of Ranking and Combination. Parameters: V_max = 6; w_max = 0.9; c1 = 2; c2 = 2; beta ∈ [0.02, 0.08]; pcR = 0.01.

Table 6 shows the final solution of each function. The average (AVG) and standard deviation (STD) show that the proposed algorithms outperform PGWO on the unimodal, multimodal and composite functions and are competitive. Figures 10-13 show the solution quality and convergence speed on the benchmark functions. The X-axis represents the iteration number and the Y-axis denotes the corresponding fitness.

Unimodal Functions
$f_1$ to $f_6$ are unimodal functions. They have no local optima and only one global optimum, so they are usually used to examine the convergence rates of algorithms. From Table 6 and Figure 10, the proposed algorithms perform better than PGWO except on $f_6$. This shows that they exploit the advantages of the different meta-heuristics: they converge quickly and find the global optimum quickly. PH-D does better on most functions, which means that the virtual population and the maintained diversity help find the optimal solutions of unimodal functions quickly.

Multimodal Functions
$f_7$ to $f_{12}$ are multimodal functions. They have many local optima, which makes the global optimal solution very difficult to find. From Table 6 and Figure 11, the algorithms perform well in convergence rate except on $f_7$. They are able to escape from local optima and find a near-global optimal solution. PH-C does better, which shows that maintaining population evolution is helpful in solving multi-dimensional problems.

Fixed-Dimension Multimodal Functions
$f_{13}$ to $f_{22}$ are fixed-dimension multimodal functions. From Table 6 and Figure 12, the algorithms have good performance; in particular, PH-C performs better on most functions. In the fixed-dimension multimodal functions, the convergence of the algorithm is guaranteed by keeping the diversity differences within the population and limiting the differences among different populations.

Composite Multimodal Functions
$f_{23}$ to $f_{28}$ are composite multimodal functions. They have extremely complex structures with many randomly located global optima and several randomly located deep local optima. From Table 6 and Figure 13, the results of the proposed algorithms are not very good, but PGWO falls into a local trap on $f_{26}$ and $f_{28}$, while they do not. Due to the complex shape of the composite functions, it is difficult to obtain accurate results on them. Nevertheless, PH-R and PH-H get relatively good results; in particular, on $f_{23}$ and $f_{27}$ they almost find the optimal solutions.
From the above discussion, the proposed algorithms favor exploration at the early stage and then gradually reduce their exploration rates in favor of exploitation. In the later stage, they exploit the search space to find the optimal solution. Hence, the convergence curves indicate that they improve both exploration and exploitation.

Application for Wind Power Forecasting
Wind power prediction is of great significance and practical value for the reasonable dispatch of wind power, reducing grid operation and maintenance costs, ensuring the reliability of the power system and improving the economy and safety of wind turbine operation. In this section, we present a model for wind power prediction based on a hybrid neural network and carry out the prediction.

The Model of Wind Power Forecasting Based on Hybrid Neural Network
Wind power brings great convenience because it is environmentally friendly, clean and renewable. However, the shortcomings of wind power affect the stability of the power system. By predicting wind power, the power plan can be arranged deliberately to avoid large fluctuations in the power grid.
Neural networks (NN) are an active research topic in the field of artificial intelligence. They have been successfully applied in many areas, such as engineering control, online learning and classification [46]. An NN has an input layer, one or more hidden layers and an output layer. It receives data at the input layer and produces results at the output layer. Figure 14 shows the classical structure of a three-layer network, in which the symbols w denote the weights of the network. The back propagation (BP) algorithm is usually used to train the parameters of the network. It is a kind of supervised learning and requires labeled data. Its working principle is to adjust the parameters of the network by measuring the error and applying gradient descent. However, it has some inherent defects, such as easily falling into local minima, low precision and slow learning speed. Meta-heuristics have been widely used in training NNs because they can find a global solution in a multi-dimensional search space. We therefore use the proposed methods to train the network and carry out the wind power prediction based on NWP. The model for training the parameters is shown in Figure 15.
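To train a network with a meta-heuristic, each individual has to encode all network parameters as one flat vector. The sketch below shows one way to do this for a three-layer network; the layout of the vector, the bias terms, the tanh hidden activation and the names `n_params` and `forward` are assumptions made for illustration, not details given in the paper.

```python
import math


def n_params(n_in, n_hidden, n_out):
    """Length of the flat vector: weights plus one bias per hidden/output unit."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def forward(params, x, n_in, n_hidden, n_out):
    """Evaluate a three-layer network whose parameters are packed in `params`."""
    if len(params) != n_params(n_in, n_hidden, n_out):
        raise ValueError("parameter vector has the wrong length")
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        weights = params[pos:pos + n_in]
        bias = params[pos + n_in]
        pos += n_in + 1
        hidden.append(math.tanh(sum(w * v for w, v in zip(weights, x)) + bias))
    outputs = []
    for _ in range(n_out):
        weights = params[pos:pos + n_hidden]
        bias = params[pos + n_hidden]
        pos += n_hidden + 1
        outputs.append(sum(w * h for w, h in zip(weights, hidden)) + bias)  # linear output
    return outputs
```

An optimizer such as PSO or GWO then searches over vectors of length `n_params(...)`, and the fitness of a vector is the prediction error of the network it encodes.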

Simulation Results
NWP data is acquired from a wind farm in Inner Mongolia every 10 min, and the rated output power of the wind turbines ranges from 600 to 1320 kW. Each set of data includes wind speed, wind direction sine, wind direction cosine, air pressure, temperature, humidity and density at different heights in multiple areas. Because a huge amount of data is collected each day and adding too much data reduces the generalization ability of the model, the data is filtered before training. First, the incomplete data is removed, and then cluster analysis [47] is used to find the historical samples that are most similar to the NWP data to be predicted. Figure 16 shows the wind power forecasting system.

Figure 16. Wind power forecasting system. Components shown: wind turbines 1 to n, official weather service, monitoring terminal, data cleaning and prediction.

Data Preprocessing
We use the data from 1 January to 31 August 2015, of which 210 days are randomly selected to train the model and 30 days are used for prediction. Because the NN is sensitive to data in [−1, 1], the data at the input layer is converted to a value in [−1, 1] by the following equation.

$$d' = \frac{d - (d_{max} + d_{min})/2}{(d_{max} + d_{min})/2} \tag{13}$$

where $d$ is the current value; $d'$ is the converted result; and $d_{max}$ and $d_{min}$ are the maximum and minimum values, respectively. The fitness is used to judge the performance of the NN. In this paper, we use the mean squared error as the fitness.

$$f = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{14}$$

where $n$ is the number of predicted data points; $y$ is the actual result and $\hat{y}$ is the corresponding prediction. The range of $f$ is limited to [0, 1].
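The two preprocessing formulas can be sketched in Python as follows. The function names `normalize` and `mse` are hypothetical. One deliberate difference is flagged here: the sketch divides by the half-range $(d_{max} - d_{min})/2$, the standard min-max form that guarantees outputs in [−1, 1]; it coincides with Equation (13) as printed whenever $d_{min} = 0$.

```python
def normalize(values):
    """Scale a list of numbers to [-1, 1] with standard min-max scaling."""
    d_min, d_max = min(values), max(values)
    mid = (d_max + d_min) / 2
    half_range = (d_max - d_min) / 2
    if half_range == 0:                  # constant input: map everything to 0
        return [0.0 for _ in values]
    return [(d - mid) / half_range for d in values]


def mse(actual, predicted):
    """Mean squared error between actual values and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)
```

For example, `normalize([0, 5, 10])` maps the minimum to −1, the midpoint to 0 and the maximum to 1, and `mse` applied to normalized targets and network outputs gives the fitness value that the meta-heuristic minimizes.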

The Evaluation Performance of Hybrid Model
The parameters of the algorithms are the same as in Section 4.1. Their prediction accuracy is shown in Table 7, where NN is the classical neural network. From Table 7 and Figures 17 and 18, the prediction accuracy of PH-R is high. The fluctuation trend of PH-R is closer to the real results, especially around the 6th, 16th, 23rd and 29th samples. In general, when the actual wind power is low and changes relatively steadily, the prediction results are very close to the actual power; when the actual power changes drastically, the predictions deviate considerably. Some optimization methods [48][49][50][51] may be adopted to further improve the efficiency of the proposed scheme in future work.

Conclusions
Meta-heuristics use exploration and exploitation to find the optimal solution. Exploration locates promising areas within the search space, and exploitation finds the optimal solution within those areas. The key aspect of an algorithm is its ability to preserve the balance between exploration and exploitation during the optimization process. Because the parallel heterogeneous algorithm uses different meta-heuristics, it can take advantage of the characteristics of each and balance their defects, and communication strategies further improve its solution quality. We propose four strategies for keeping the population diversity and improving the convergence rate. It is very difficult to predict wind power accurately because of the influence of the natural environment. The proposed algorithms train the neural network to carry out the prediction, and simulation results show that they obtain good results and reduce the prediction error. There are many meta-heuristics, but in this paper we use only two. In the future, we can use more of them to get better performance in different fields and build more complex models to improve the accuracy of prediction.