Ultra-Short-Term Wind-Power Forecasting Based on the Weighted Random Forest Optimized by the Niche Immune Lion Algorithm

Abstract: The continuous increase in energy consumption has made the potential of wind-power generation tremendous. However, the obvious intermittency and randomness of wind speed result in the fluctuation of the output power of a wind farm, seriously affecting the power quality. Therefore, accurate prediction of wind power in advance can improve the ability of wind-power integration and enhance the reliability of the power system. In this paper, a model of wavelet decomposition (WD) and weighted random forest (WRF) optimized by the niche immune lion algorithm (NILA-WRF) is presented for ultra-short-term wind power prediction. Firstly, the original series of wind speed and power are decomposed into several sub-series by WD because the original series have no obvious daily characteristics. Then, the model parameters are set and the model is trained with the decomposed sub-series of wind speed and wind power. Finally, the WD-NILA-WRF model is used to predict the wind power of the corresponding sub-series, and the results are reconstructed to obtain the final prediction result. The WD-NILA-WRF model combines the advantages of each single model: it uses WD for signal de-noising and the niche immune lion algorithm (NILA) to improve the model's optimization efficiency. In this paper, two empirical analyses are carried out to prove the accuracy of the model, and the experimental results verify the proposed model's validity and superiority compared with the back propagation neural network (BP neural network), support vector machine (SVM), RF and NILA-RF, indicating that the proposed method is superior in cases influenced by noise and unstable factors, and possesses an excellent generalization ability and robustness.


Introduction
With the development of the social economy, the contradiction between increasing energy consumption and limited energy reserves has gradually intensified. To solve the problem, many countries are vigorously exploring new sources of energy to deal with a serious energy crisis all around the world [1]. Wind power, as one of the promising and mature new sources of power generation, is clean and pollution-free. It can effectively mitigate climate change, improve energy security and promote low-carbon economic growth, resulting in great commercial development prospects. At present, most countries are increasing their research into wind-power generation and related technologies to promote the sustainable development of renewable-energy power [2].
Under the dual pressures of energy security and environmental protection, the scale of wind power is growing rapidly. According to relevant statistics, by the end of 2016 the cumulative installed capacity of wind power in the world reached 486,749 MW, with a total year-on-year increase of 12.5%. So far, China has become the world's largest and fastest-growing market for wind-power generation, accounting for 34.7% of the total installed capacity of wind power in the world, followed by the United States, Germany, India, Spain and other countries [3].
Energies 2018, 11, 1098
However, wind power is intermittent, stochastic and uncontrollable, which has a negative effect on the stable operation of a power system. As the scale of wind farms expands, the integration of wind power into the power grid makes the scheduling and operation of the power system more difficult [4,5]. Accurate wind-power forecasting can relieve the pressure on peak load regulation and effectively improve the capability of the grid to accommodate wind power [6]. Therefore, forecasting wind power is of great significance.
The prediction of wind power can be divided into ultra-short-term prediction, short-term prediction, medium-term prediction and long-term prediction by time scale [7]. Current research mainly focuses on two aspects: ultra-short-term prediction and short-term prediction, considering their economic significance and practical value. Meanwhile, research methods can be divided into the physical method, statistical method and machine-learning method.
The basic idea of the physical method is to construct a wind power curve to fit the actual output power by taking the meteorological information obtained from numerical weather forecasting and other physical information as the independent variables [8,9]. Because it is based on the fitting curve, this kind of method is suitable for cases with no historical data, e.g., new wind farms. However, this kind of method requires more comprehensive information collection and higher accuracy of data, so it is not suitable for short-term prediction.
From a statistical point of view, the statistical method mainly builds a functional model to predict future wind power according to the variation of historical measurements, e.g., Auto-Regressive and Moving Average Model (ARMA) [10], Autoregressive Integrated Moving Average Model (ARIMA) [11,12], particle filter [13], Markov chain [14], regression analysis [15,16], persistent model (PER) [17] and so on. Among them, time-series models and regression analysis models are more commonly used. They perform well on stationary data, whereas wind power exhibits strong non-stationary characteristics due to the obvious periodicity and randomness of wind energy. Compared with the physical method, the statistical method is relatively simple because it does not need the specific evolving process of wind speed, and it possesses higher prediction accuracy. However, this method needs to establish a strict functional relationship between input and output, so it has significant limitations.
The machine-learning algorithm has been developed to train a model and analyze data through a batch of known training data, and then predicts the new sample data with the trained model. This kind of algorithm mainly includes the neural network [18,19], support vector machine (SVM) [20,21], grey prediction [22], deep learning [23] and so on. Such an algorithm, with considerable adaptability, can improve the performance of the model and deal with complex optimization problems better by constantly self-correcting. However, this method requires a large amount of data, and its calculation process is more complicated. Therefore, there may be over-fitting, poor robustness and other shortcomings in this kind of method.
The neural network is more common, with the ability of strong self-learning and seeking optimal solutions at high speed, including feed-forward neural networks [24] and feed-back neural networks. To enhance the performance of neural networks, many methods are used to optimize them [25,26]. For example, Huang D Z et al. [27] used the genetic algorithm (GA) to optimize and simulate the initial weights and threshold values of the BP neural network. The result shows that this method has higher accuracy than the BP artificial neural network. Wang C et al. [28] proposed a wind-power forecasting method based on chaos theory and Bernstein's neural network (BNN), and the results confirm the validity of this method. In addition, Wang et al. [29] considered the optimization criterion of interval information of prediction and optimized the BP neural network using the improved particle swarm optimization (PSO) algorithm. Simulation results show that the model can effectively predict the output power range. Although the prediction accuracy of the neural network algorithm has been improved, this kind of algorithm is only suitable for large sample data, and the calculation is more complex, so the SVM has received a more favorable reception among scholars because of its generalization ability, robustness and other advantages.
In order to improve prediction accuracy, PSO, the bat algorithm and other algorithms are used to improve the SVM. For example, Wu Q et al. [30] proposed a new hybrid wind-power prediction model based on a cloud-based evolutionary algorithm (CBEA) and least squares support vector machine (LS-SVM), and the experimental results show that the predictive performance of the model is better than that of a single LS-SVM model. Meanwhile, Wu Q et al. [31] constructed a wind-power generation forecasting model combining an LS-SVM with integrated empirical mode decomposition, principal component analysis, and the bat algorithm. The analysis shows that the model is superior to other single or mixed models.
To improve the algorithm, we can combine different prediction models to compensate for the shortcomings of each model and reduce the error of a single model [32,33]. For example, Wu J et al. [34] combined the two basic models of the BP neural network and SVM, and introduced PSO and cross validation to optimize the parameters of the BP neural network and SVM, which effectively solved the problem of large prediction errors in different scenarios. At the same time, Zhang Y G et al. [35] proposed a prediction model that combines the artificial neural network model with the grey model (GM), which uses the advantages of both to reduce the prediction error.
Although the performance of the above algorithms has been greatly improved, there still exist many problems such as a complicated optimization process and slow convergence. The essence of the random forest (RF) algorithm is a multi-decision-tree model which combines multiple decision trees to make a prediction. This algorithm has a controllable generalization error, fast convergence rate and fewer adjustment parameters, which can effectively avoid the over-fitting phenomenon. However, there also exists a difficulty in determining the parameters of the model.
Therefore, a model combining wavelet decomposition with the weighted random forest (WRF) optimized by the niche immune lion algorithm (WD-NILA-WRF) is proposed in this paper for predicting wind power. Considering the fluctuation and randomness of wind-power generation, firstly, the series of wind speed and wind power are decomposed into sub-series with different frequencies by using the wavelet decomposition method, and then the NILA is used to optimize the parameters of the RF to improve the accuracy of the prediction model. Finally, the constructed model is used to predict each sub-series, and the prediction results are reconstructed to obtain the final prediction results. The WD-NILA-WRF model can reduce the influence of noise by weighting the decision trees, and it can mitigate the tendency of the RF model to over-fit under excessive noise. Empirical analysis shows that the model has a higher prediction accuracy compared with other models.
The content of the paper is arranged as follows: Section 1 reviews the current status of wind-power forecasting; Section 2 outlines the theory of the weighted RF, NILA and its working principle; Section 3 introduces the wavelet decomposition of raw data, and explains the operation process of the combined model; Section 4 uses the model constructed in this paper to conduct two example analyses; Section 5 analyzes the errors of the predicted results; and Section 6 summarizes the conclusions of the study.

CART Decision Tree
The Classification and Regression Tree (CART) decision tree algorithm is a binary recursive segmentation algorithm proposed by Breiman L et al. in 1984, which divides the current sample set into two subsets at every node except leaf nodes. When the dependent variable is a continuous variable, the decision tree is a regression tree. The split rules adopted by the CART algorithm affect the accuracy of the decision tree, so the variance reduction method is used to split the node. When a node is split, the variance formula is used to evaluate each candidate split, and the split with the least variance is chosen. The variance of each node is calculated as:
$$\mathrm{Var} = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2$$
where $X_i$ is the $i$-th data value, $\bar{X}$ represents the average of the data, and $n$ represents the number of data.
In the process of generating the decision tree, according to the principle of minimum variance, it is necessary to split continuously from top to bottom, thus generating classification rules. The evaluation criterion of the classification effect of attribute A is built from three quantities: Var, the variance of the node; VarLeft, the variance of the left sub-node; and VarRight, the variance of the right sub-node.
Through training, a prediction function $\Gamma(X, T_n)$ based on the training set $T_n$ is established, where X represents the input vectors and Y represents the output result.
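The variance-based split can be illustrated with a minimal sketch. It assumes a single attribute and scores each candidate threshold by the size-weighted variance of the two child nodes; the weighting, the function name `best_split` and the sample values are assumptions made for illustration, not the paper's exact criterion.

```python
from statistics import pvariance

def best_split(xs, ys):
    """Return (threshold, score) of the split on one attribute that
    minimizes the size-weighted variance of the two child nodes."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Assumed criterion: child variances weighted by child size.
        score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best

# Hypothetical wind speed (m/s) and power (MW) samples.
wind_speed = [3.0, 4.0, 5.0, 9.0, 10.0, 11.0]
power = [0.1, 0.2, 0.1, 1.9, 2.0, 2.1]
threshold, score = best_split(wind_speed, power)
```

On this toy data the chosen threshold falls between the low-speed and high-speed groups, which is where the child variances are smallest.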

Bagging and Random Forest
Random forest is a supervised ensemble learning algorithm, which combines the prediction of multiple uncorrelated decision trees.
The bagging (bootstrap aggregating) algorithm was introduced to the RF by Breiman in 1994, and is also called no-weight sampling. The method uses bootstrap resampling to extract the sub-training sets $(T_1, T_2, \ldots, T_n)$ from the original set and constructs a prediction function $(\Gamma(X, T_1), \Gamma(X, T_2), \ldots, \Gamma(X, T_n))$ for each decision tree. The final prediction result of the RF is the aggregation of each tree's prediction result.
First, a random training set is generated to construct a decision tree. In the process of generating a decision tree, m attributes are selected from all M decision attributes at each node split to take part in the attribute comparison for that split. These m attributes are called random characteristic variables, and they are generally generated by the Forest-RI (random input selection) method, which determines m from M, where m is the number of random characteristic variables and M is the number of all decision attributes. Then, the RF is formed and the algorithm is executed. An RF can be formed by the repeated generation of a large number of decision trees using the above method. A schematic diagram of the RF algorithm is shown in Figure 1.
We can adopt the average of all trees as the final prediction result when aggregating the output of each tree. Therefore, the estimate for the input vector X is as follows:
$$\hat{Y} = \frac{1}{n} \sum_{i=1}^{n} \Gamma(X, T_i)$$
Finally, the out-of-bag error (OOBE) is used to determine the error. OOBE is calculated on the out-of-bag samples that are not drawn during bootstrap aggregating, and it can be used to evaluate the generalization capacity of the model by comparing the real value $Y_l$ with the estimated value $\hat{Y}_l$ of each out-of-bag sample $l$.
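Bootstrap aggregation and the out-of-bag check can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_nearest` is a hypothetical stand-in for a trained regression tree, and the out-of-bag error is taken to be a mean squared error, which is an assumed form.

```python
import random

def fit_nearest(sample):
    """Hypothetical stand-in for a regression tree (1-nearest neighbour)."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

def bagging(data, n_trees, fit, rng):
    """Train n_trees learners on bootstrap samples; keep out-of-bag indices."""
    models = []
    n = len(data)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        oob = set(range(n)) - set(idx)              # rows never drawn
        models.append((fit([data[i] for i in idx]), oob))
    return models

def predict(models, x):
    """Aggregate by averaging the outputs of all trees."""
    return sum(model(x) for model, _ in models) / len(models)

def oob_error(models, data):
    """Assumed OOBE: mean squared error, where each row is predicted
    only by the trees that did not see it during training."""
    errors = []
    for l, (x, y) in enumerate(data):
        votes = [model(x) for model, oob in models if l in oob]
        if votes:
            y_hat = sum(votes) / len(votes)
            errors.append((y - y_hat) ** 2)
    return sum(errors) / len(errors)

rng = random.Random(0)
data = [(v, 0.5 * v) for v in range(20)]  # toy (input, output) pairs
forest = bagging(data, n_trees=25, fit=fit_nearest, rng=rng)
error = oob_error(forest, data)
```

Because every row is left out of roughly a third of the bootstrap samples, the out-of-bag rows act as a built-in validation set and no separate hold-out split is needed.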

Weighted Random Forest Model
In this paper, the training samples are divided into traditional training samples and pre-prediction samples. After the training is completed, each decision tree is tested on the pre-prediction samples and its correct prediction rate is calculated by comparing the estimated value $\hat{Y}_l$ with the real value $Y_l$. In order to reduce the impact of poorly trained decision trees on the RF model, the prediction results are weighted by the correct rate of each decision tree to obtain the WRF model.
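The weighting idea can be sketched in a few lines. The weight used here, 1 / (1 + mean absolute error) on the pre-prediction samples, is an assumption chosen only so that better trees receive larger weights; it is not the paper's correct-rate formula, and the two "trees" are hypothetical functions.

```python
def tree_weight(tree, pre_samples):
    """Assumed 'correct rate': 1 / (1 + mean absolute error) on the
    pre-prediction samples, so more accurate trees weigh more."""
    mae = sum(abs(y - tree(x)) for x, y in pre_samples) / len(pre_samples)
    return 1.0 / (1.0 + mae)

def weighted_predict(trees, weights, x):
    """Weighted average of the tree outputs."""
    total = sum(weights)
    return sum(w * tree(x) for tree, w in zip(trees, weights)) / total

# Two hypothetical trees: one exact, one biased by +3.
good_tree = lambda x: 2.0 * x
poor_tree = lambda x: 2.0 * x + 3.0
pre_samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

trees = [good_tree, poor_tree]
weights = [tree_weight(t, pre_samples) for t in trees]
y_weighted = weighted_predict(trees, weights, 4.0)
y_plain = sum(t(4.0) for t in trees) / len(trees)
```

With the true value at 8.0, the weighted estimate lies closer to it than the plain average, because the biased tree contributes less.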

Lion Algorithm
The LA is a bionic algorithm based on the social behaviors of the pride, proposed by Rajakumar in 2012, which achieves the iteration and generation of the best solution through the evolution of the resident lions or the defense of the resident males against nomad males. Each lion represents a solution, and the optimal solution is produced by optimizing solutions through the changes in the pride.
The main steps of the LA include generation of the initial pride, mating, territorial defense, and territorial takeover. In this algorithm, the optimal solution is finally obtained by iteratively searching the objective function, which is as follows:
$$\min f(x_1, x_2, \ldots, x_n), \quad n \geq 1$$

Step 1: Generation of the Initial Pride
In the initial phase of the algorithm, the lion group is initialized as 2n lions and equally divided into two groups, resulting in a candidate pride. Among them, the male lion structure is $A^m = [\alpha^m_1, \alpha^m_2, \ldots, \alpha^m_r]$ and the female lion structure is $A^f = [\alpha^f_1, \alpha^f_2, \ldots, \alpha^f_r]$, where r is the length of the solution vector.

Step 2: Mating
Mating is an effective way to iterate and search for optimal solutions through generating new solutions in existing solutions, which can update lions and maintain the stability of the pride through crossing, mutation, clustering, killing sick and weak cubs, etc.
The mating step introduces a two-probability-based crossover (crossing in two different probabilities), and it is shown in Figure 2.
$A^m$ and $A^f$ generated in the initial pride produce new cubs. Four cubs $A^{cub}_{1\sim4}$ can be generated by randomly crossing two crossing points ($\alpha^m_i$ and $\alpha^f_j$) with dual probabilities.
The mutation operation produces cubs $A^{cub}_{5\sim8}$ by random mutation at probability $y_p$. The number of cubs is 8 after the crossover and mutation are completed.
Clustering uses the K-means approach to divide the existing 8 solutions into two groups: male cubs ($A^{m\_cub}$) and female cubs ($A^{f\_cub}$).
Finally, a weak cub in the bigger group is killed after testing the state of health (or goal) to ensure that the number of cubs in the two groups is balanced and the pride is renewed. The age of the cubs is initialized to zero after the pride update is complete.
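The crossover and mutation part of the mating step can be sketched as below. This is a simplified illustration: it swaps in a uniform crossover with two probabilities and a Gaussian mutation, and it leaves out the K-means clustering and culling. The probabilities, the noise scale and all function names are assumptions.

```python
import random

def crossover(male, female, p, rng):
    """Uniform crossover: each gene comes from the male with probability p."""
    return [m if rng.random() < p else f for m, f in zip(male, female)]

def mutate(cub, p_mut, scale, rng):
    """Perturb each gene with probability p_mut by Gaussian noise."""
    return [g + rng.gauss(0.0, scale) if rng.random() < p_mut else g for g in cub]

def mate(male, female, rng, probs=(0.3, 0.7), p_mut=0.2, scale=0.1):
    # Four cubs from crossover: two per crossover probability.
    crossed = [crossover(male, female, p, rng) for p in probs for _ in range(2)]
    # Four more cubs obtained by mutating the crossed cubs.
    mutated = [mutate(c, p_mut, scale, rng) for c in crossed]
    return crossed + mutated

rng = random.Random(1)
male = [0.0, 0.0, 0.0, 0.0]
female = [1.0, 1.0, 1.0, 1.0]
cubs = mate(male, female, rng)
```

The first four cubs only recombine parental genes, while the last four can leave the span of the parents, which is what lets the search explore new regions.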

Step 3: Territorial Defense
In the breeding process of the pride, the pride will be attacked by the nomad males. At this point, the male lion will launch a defensive attack on the nomad lions in order to protect the cubs and continue to occupy the territory. The process is shown in Figure 3.
In the process of territorial defense, the nomad lion $\alpha^{nomad}$ is generated in the same way as the territorial lions and uses the new solution ($\alpha^{nomad}$) to attack the male lion ($\alpha^m_i$). If the new solution is better, it is compared with the solution of the pride, and then this solution ($\alpha^{nomad}$) replaces the original lion ($\alpha^m_i$). The new lions continue mating, and the original lions and cubs are killed. Otherwise, the lions continue their territorial defense, and the cubs grow one year older until they reach maturity. Suppose $g(\cdot)$ is the objective function value; $g(\alpha^{pride})$ is the strength of the entire pride, which can be calculated as:
$$g(\alpha^{pride}) = \frac{1}{2\left(1 + \|\alpha^{m\_cub}\|\right)} \left( g(\alpha^m) + g(\alpha^f) + \frac{age_{mat}}{age(cub) + 1} \sum_{k=1}^{\|\alpha^{m\_cub}\|} \frac{g(\alpha^{m\_cub}_k) + g(\alpha^{f\_cub}_k)}{\|\alpha^{m\_cub}\|} \right)$$
where $g(\alpha^m)$ and $g(\alpha^f)$ are the strengths of the male and female lions respectively, $g(\alpha^{m\_cub}_k)$ and $g(\alpha^{f\_cub}_k)$ are the strengths of the male and female cubs respectively, $\|\alpha^{m\_cub}\|$ represents the number of male cubs in the pride, $age(cub)$ is the current age of the cubs, and $age_{mat}$ is the maturity age for mating.

Step 4: Territorial Takeover
During the territorial takeover phase, the best solutions among female and male lions are found to replace inferior solutions and mate until the termination conditions are met. First of all, the replacement is carried out by choosing the best male lion ($\alpha^m_{best}$) and the best female lion ($\alpha^f_{best}$) according to the replacement criteria. Let $\eta$ be the number of breedings of $\alpha^f_{best}$, and $\eta_{strength}$ be the optimal breeding strength of the female lion, which is usually set to 5; $\eta$ is set to 0 when the pride is initialized and is increased by 1 with each mating. If a female lion is replaced, $\eta$ is reset to 0, and if the original female lion is replaced back, $\eta$ continues from its original value. After completing the above steps, return to Step 2 until the termination condition is met. GEN is the generation count of the pride, and the whole process is repeated iteratively until the maximum number of generations $GEN_{max}$ is reached. Finally, the best lion of the pride is selected as the optimal solution.
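The territorial-defense test of Step 3 can be sketched numerically. The pride-strength expression below follows the form commonly given for Rajakumar's lion algorithm and should be read as an assumption, as should the rule that a nomad must beat both the resident male and the pride strength in a minimization problem. All names and values are illustrative.

```python
def pride_strength(g_male, g_female, g_m_cubs, g_f_cubs, age_cub, age_mat):
    """Assumed pride strength: parents plus an age-scaled average of the
    cub pairs, normalized by 2 * (1 + number of male cubs)."""
    n = len(g_m_cubs)
    cub_term = sum(gm + gf for gm, gf in zip(g_m_cubs, g_f_cubs)) / n
    return (g_male + g_female + age_mat / (age_cub + 1) * cub_term) / (2 * (1 + n))

def nomad_wins(g_nomad, g_male, g_pride):
    """Minimization: the nomad takes over only if it beats both the
    resident male and the strength of the whole pride."""
    return g_nomad < g_male and g_nomad < g_pride

strength = pride_strength(
    g_male=2.0, g_female=4.0,
    g_m_cubs=[3.0], g_f_cubs=[5.0],
    age_cub=0, age_mat=3,
)
takeover = nomad_wins(1.0, 2.0, strength)
defended = not nomad_wins(3.0, 2.0, strength)
```

As the cubs age, the factor `age_mat / (age_cub + 1)` shrinks, so young cubs count more heavily in the pride strength than cubs close to maturity.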

Niche Immune Lion Algorithm
The LA is a kind of parallel search method that does not depend on specific problems and has the characteristics of self-adaptability, group search, heuristic random search, etc. However, the LA also has some obvious shortcomings, and it can be improved as follows:
(1) The initial population of the LA is randomly generated, with the result that the iterative generation of an optimal solution takes a long time and has low efficiency. Therefore, this paper introduces the niche algorithm to the LA to ensure the diversity of the initial pride.
(2) After several iterations, individuals with high fitness in the pride will form "inbreeding", resulting in premature reduction of diversity. In response to this problem, immune factors in the immune algorithm are used to generate a better initial population to improve the efficiency of iteration.
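One simple way to picture the niche idea is an initial population in which no two individuals share a niche. The sketch below assumes a niche is a Euclidean ball of fixed radius and uses rejection sampling; the radius, bounds and the function name `niche_init` are hypothetical and do not come from the paper.

```python
import math
import random

def niche_init(n, dim, low, high, radius, rng, max_tries=10000):
    """Draw n random individuals, rejecting any candidate that falls
    inside the niche (ball of the given radius) of an accepted one."""
    population = []
    tries = 0
    while len(population) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("niche radius too large for the search space")
        cand = [rng.uniform(low, high) for _ in range(dim)]
        if all(math.dist(cand, other) >= radius for other in population):
            population.append(cand)
    return population

rng = random.Random(42)
pride = niche_init(n=6, dim=2, low=0.0, high=1.0, radius=0.2, rng=rng)
```

Spreading the initial lions in this way avoids starting with several near-identical solutions, which is the diversity problem described in item (1).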

Weighted Random Forest Optimized by Niche Immune Lion Algorithm
Although the WRF model has certain improvements over the RF model, the classification and prediction accuracy still need to be further improved, and the continuous variables need to be discretized. Moreover, the pruning threshold ε, the number of decision trees L, the number of pre-prediction samples y and the number of random characteristic variables m in the WRF algorithm have a certain influence on the output of the whole model. In view of this, this paper uses the NILA to iteratively optimize the WRF model, improving the classification and predictive performance of the model. The analysis process of the NILA-WRF model is shown in Figure 4.
The steps of the NILA-WRF model are as follows:
Step 1: Randomly set the model parameters (the pruning threshold ε, the number of decision trees L, the number of pre-prediction samples y and the number of random characteristic variables m) to obtain the initial values. The bootstrap algorithm is adopted to generate L training sample sets for L decision trees, and y pre-test samples are selected in each training set. L decision trees are generated from the remaining samples of each training set. The leaf nodes are determined by comparing the number of samples with the threshold ε, and the mode of the target attribute at each leaf is selected as the classification result of the decision tree.
Step 2: After all the decision trees are trained, the weight values are calculated through a pre-test.
Step 3: Calculate the final result of the model based on the WRF model.
Step 4: Take the final result as the fitness value, and optimize the corresponding model parameters using the NILA to obtain better model parameters.
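To connect the steps above with the optimizer, each lion can be read as an encoded set of the four WRF parameters. The sketch below shows one possible decoding from genes in [0, 1] to parameter values; the search ranges in `BOUNDS` and the function name `decode` are hypothetical and are not taken from the paper.

```python
def decode(lion, bounds):
    """Map a lion (genes in [0, 1]) onto WRF parameters within the bounds."""
    params = {}
    for gene, (name, low, high, is_int) in zip(lion, bounds):
        value = low + gene * (high - low)
        params[name] = int(round(value)) if is_int else value
    return params

# Hypothetical search ranges, chosen only for illustration.
BOUNDS = [
    ("epsilon", 2, 20, True),  # pruning threshold (minimum samples per node)
    ("L", 10, 200, True),      # number of decision trees
    ("y", 5, 50, True),        # number of pre-prediction samples
    ("m", 1, 6, True),         # number of random characteristic variables
]

params = decode([0.5, 0.0, 1.0, 0.4], BOUNDS)
```

A fitness function would then train a WRF with `params`, evaluate its prediction error, and return that error as the value the NILA tries to minimize.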

The Wind-Power Forecasting Model of Wavelet Decomposition and Weighted Random Forest Optimized by the Niche Immune Lion Algorithm

Wavelet Decomposition and Reconstruction
Wavelet transform is a time-frequency analysis method that can decompose a complex signal into signals on different frequency bands. Considering that wind power has certain periodicity (such as the daily characteristics), the series of wind speed and wind power are decomposed by wavelet, and the daily characteristics of the wind speed series and wind power series are strengthened by decomposing the original series into high-frequency components and low-frequency components.
The wavelet generating function is a function or signal $I(t)$ ($I(t) \in L^2(R)$) that satisfies the following admissibility condition:
$$\int_{-\infty}^{+\infty} \frac{|\hat{I}(\omega)|^2}{|\omega|} \, d\omega < \infty$$
where $\hat{I}(\omega)$ is the Fourier transform of $I(t)$.
After the wavelet generating function is scaled and translated, the resulting continuous wavelet function depends on the scaling factor $w$ ($w > 0$) and the translation parameter $q$ ($q \in R$). The continuous wavelet function is:
$$I_{w,q}(t) = \frac{1}{\sqrt{w}} \, I\!\left(\frac{t - q}{w}\right)$$
The continuous wavelet transform of the signal $O(t)$ is defined as:
$$W_O(w, q) = \frac{1}{\sqrt{w}} \int_{-\infty}^{+\infty} O(t) \, \bar{I}\!\left(\frac{t - q}{w}\right) dt$$
where $O(t) \in L^2(-\infty, +\infty)$, and $\bar{I}(t)$ is the conjugate function of $I(t)$.
The discrete wavelet transform of the signal $O(t)$ is obtained by discretizing the scaling factor $w$ and the translation parameter $q$.
In 1988, Mallat proposed the Mallat algorithm using the idea of multi-resolution analysis, which illustrates the multi-resolution characteristics of the wavelet from the perspective of a spatial image. In multi-scale wavelet decomposition, the signal is decomposed into an approximate part and a detail part by changing the scale factor; the decomposition process is shown in Figure 5. In the figure, $A_j$ is the approximate part (low-frequency component) of the decomposition of the jth layer, and $D_j$ is the detail part (high-frequency component) of the decomposition of the jth layer.
Any signal can be completely reconstructed from the low-frequency component with $2^{-J}$ resolution and the high-frequency components with $2^{-j}$ ($1 \leq j \leq J$) resolution. Take the wind speed series as an example to carry out the wavelet transform here. Let the original wind speed series be S. According to this algorithm, the non-stationary wind speed series can be decomposed into high-frequency signals $d_1, d_2, d_3, \ldots, d_J$ with different frequencies and a low-frequency signal $a_J$, where J is the maximum decomposition level. After reconstructing the signals obtained by decomposition, we get several detail series $D_1, D_2, D_3, \ldots, D_J$ and the approximate series $A_J$, and then:
$$S = D_1 + D_2 + D_3 + \cdots + D_J + A_J$$
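The additive relation between the original series, the detail series and the approximate series can be checked with a small sketch. It assumes the Haar wavelet, for which the approximation at level j is the mean over blocks of length 2^j and each detail series is the difference between two successive approximations; the paper's wavelet basis may differ, and the sample values are hypothetical.

```python
def block_means(series, size):
    """Replace every block of `size` consecutive values by its mean."""
    out = []
    for start in range(0, len(series), size):
        block = series[start:start + size]
        out.extend([sum(block) / len(block)] * len(block))
    return out

def haar_components(series, levels):
    """Return ([D_1, ..., D_J], A_J) for the Haar wavelet."""
    if len(series) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    details = []
    approx = list(series)                      # A_0 is the series itself
    for j in range(1, levels + 1):
        coarser = block_means(series, 2 ** j)  # A_j
        # D_j = A_{j-1} - A_j
        details.append([a - c for a, c in zip(approx, coarser)])
        approx = coarser
    return details, approx

# Hypothetical wind speed samples (m/s).
S = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, A_J = haar_components(S, levels=2)
rebuilt = [a + sum(d[i] for d in details) for i, a in enumerate(A_J)]
```

Here `rebuilt` reproduces `S`, so forecasts made separately on each component can be added back together without losing information.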

The Construction of the WD-NILA-WRF Model
Wind-power forecasting is affected by many factors, e.g., wind speed, wind direction and climate conditions. In order to predict wind-power changes over the next 24 h more accurately, a new wind-power forecasting model is proposed, which combines the WD model and the NILA-WRF model. This model is implemented as follows:

Step 1 Data Acquisition and Preprocessing
Sample data, including wind power output, wind speed, wind direction, air pressure, air density, temperature and surface roughness, are collected. Then data preprocessing is carried out: the numerical data are normalized, and the non-numerical data are encoded as categorical features.

Step 2 Noise-Reduction Processing
The original serials of wind speed and wind power are decomposed by wavelet to get several detail serials with different frequencies and one approximate serial.

Step 3 Model Training
The serial of each layer after decomposition is used as an input vector to train the WD-NILA-WRF model.

Step 4 Wind-Power Predicting
With the trained model, the test sample is used as the data source, and the WD-NILA-WRF model predicts each sub-serial separately. The final result is obtained by reconstructing the sub-serial predictions. The forecasting process is shown in Figure 6.
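Steps 3 and 4 can be illustrated with a minimal sketch of the "predict each sub-serial, then reconstruct" flow. The function names and the persistence predictor are hypothetical stand-ins introduced only for illustration; in the proposed model each sub-serial would be predicted by the trained NILA-WRF instead.

```python
def persistence_forecast(sub_serial, horizon):
    """Stand-in predictor (assumption): repeat the last observed value."""
    return [sub_serial[-1]] * horizon


def forecast_and_reconstruct(sub_serials, horizon, predictor=persistence_forecast):
    """Predict every sub-serial separately, then sum the forecasts point by point."""
    forecasts = [predictor(s, horizon) for s in sub_serials]
    return [sum(f[t] for f in forecasts) for t in range(horizon)]


# Made-up sub-serials: one approximate serial and two detail serials.
sub_serials = [
    [5.0, 5.2, 5.1],    # approximate serial
    [0.3, -0.2, 0.1],   # detail serial 1
    [-0.1, 0.0, 0.05],  # detail serial 2
]
power_forecast = forecast_and_reconstruct(sub_serials, horizon=2)
print(power_forecast)
```

Summation is used for the reconstruction because the decomposed serials add up to the original serial; any predictor with the same signature could replace the stand-in.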

Data Selection
Wind power is affected by many factors, which mainly include wind speed, wind direction, temperature, humidity, air pressure and surface roughness. Among them, the wind speed is the most important factor. If the wind speed is less than the cut-in wind speed, the wind turbine cannot operate, and if the wind speed is greater than the cut-out wind speed, the wind turbine stops working. Meanwhile, the wind direction and temperature are also important influencing factors of wind power.
According to the development of wind-energy resources in China, two typical wind farms are selected for empirical analysis, and ultra-short-term prediction (5 min in advance) is carried out to verify the accuracy and robustness of the model. The total installed capacity of the two wind farms and the related geo-climate information are shown in Table 1.
Case 1: wind-power generation data from a wind farm in Inner Mongolia (wind farm A) from 21 to 26 August 2017 were selected as the analysis samples. The data were sampled every 5 min, giving 288 points per day and a dataset of 1728 sample points in total. The first five days' data (1440 points) were used to train the model, and the last day's data (288 points) were used as the test sample. The input vectors of the WD-NILA-WRF model were wind speed, wind direction, temperature, relative humidity, air pressure and surface roughness, and the output vector was wind power.
Case 2: wind-power generation data from a wind farm on the south-east coast of China (wind farm B) from 23 to 27 August 2017 were selected as the analysis samples. The data were sampled every 5 min, giving 288 points per day and a dataset of 1440 sample points in total. The first four days' data (1152 points) were used to train the model, and the last day's data (288 points) were used as the test sample. The model's input vectors and output vector are the same as in case 1.
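The sample counts in both cases follow directly from the 5 min sampling interval. The sketch below reproduces that arithmetic and the "last day as test set" split; the helper name `split_last_day` and the placeholder index values are assumptions for illustration, not the real measurements.

```python
POINTS_PER_DAY = 24 * 60 // 5  # one sample every 5 min -> 288 points per day


def split_last_day(samples, points_per_day=POINTS_PER_DAY):
    """Hold out the final day as the test set; train on everything before it."""
    if len(samples) <= points_per_day:
        raise ValueError("need more than one day of samples")
    return samples[:-points_per_day], samples[-points_per_day:]


# Placeholder index values standing in for the real measurements.
case1 = list(range(6 * POINTS_PER_DAY))  # 21-26 August: 6 days
case2 = list(range(5 * POINTS_PER_DAY))  # 23-27 August: 5 days

train1, test1 = split_last_day(case1)
train2, test2 = split_last_day(case2)
print(len(train1), len(test1))  # 1440 288
print(len(train2), len(test2))  # 1152 288
```

Splitting by time rather than at random keeps the test day strictly after the training days, which matches the forecasting setting.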

Clean Abnormal Data
The original data are affected by various external factors during collection, and the resulting errors reduce the prediction accuracy of wind power. In addition, the influencing factors of wind power differ in unit and magnitude, which leads to slow model convergence and large training error; to avoid this, the original data need to be normalized.
In this paper, the extremalization (min-max) method is used to normalize the variables. Taking the maximum and minimum values of each variable as boundaries, the method converts the original data into values in [0, 1], so as to eliminate the influence of dimension and magnitude. The standardization formula is as follows:

$$x_i^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i^{*}$ is the normalized value, $x_i$ represents an original data value, and $x_{\max}$ and $x_{\min}$ represent the maximum and minimum values in the original data, respectively. After the original data have been standardized, the dimensional effects are eliminated.
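A minimal sketch of this min-max scaling is given below. The function name, the handling of a constant variable and the sample values are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant variable: there is no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


wind_speed = [3.2, 5.8, 12.4, 7.1]  # m/s, made-up values
scaled = min_max_normalize(wind_speed)
print(scaled)
```

In practice the minimum and maximum would be taken from the training samples and reused for the test samples, so that no information from the test day enters the scaling.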

Wavelet Decomposition
The serials of wind speed and wind power are generally non-stationary. After they are decomposed into signals of different frequencies by the wavelet transform, the serials at each frequency show a certain regularity. Therefore, wavelet decomposition of the wind-speed and wind-power serials makes them easier for the proposed model to predict. This section uses the wind-speed serials as examples of wavelet decomposition.
In this paper, the original wind-speed serials of wind farms A and B are each decomposed into five high-frequency components and one low-frequency component. The decomposition results are shown in Figures 7 and 8.
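The following sketch shows a five-level decomposition into five detail serials and one approximate serial, and checks that they add back up to the original serial. It assumes the Haar wavelet purely for simplicity (the wavelet basis is not specified here), and the 288-point serial is synthetic; all function names are hypothetical.

```python
import math


def block_means(signal, block):
    """Replace every run of `block` samples by its mean (Haar approximation)."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def haar_decompose(signal, levels=5):
    """Return ([D_1, ..., D_J], A_J), each with the same length as `signal`."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details = []
    previous = list(signal)  # A_0 is the signal itself
    for j in range(1, levels + 1):
        approx = block_means(signal, 2 ** j)  # A_j
        # D_j = A_{j-1} - A_j, so the serials telescope back to the signal.
        details.append([p - a for p, a in zip(previous, approx)])
        previous = approx
    return details, previous


# Synthetic one-day "wind speed" serial (made-up shape, not real data).
speed = [8.0 + 2.0 * math.sin(2 * math.pi * t / 288) + 0.5 * math.sin(t)
         for t in range(288)]
details, approx = haar_decompose(speed, levels=5)

# Adding the five detail serials and the approximate serial restores the input.
rebuilt = [sum(d[t] for d in details) + approx[t] for t in range(len(speed))]
max_gap = max(abs(r - s) for r, s in zip(rebuilt, speed))
print(len(details), max_gap < 1e-9)
```

A smoother wavelet basis would change the shapes of the individual serials but not the additive reconstruction property that the forecasting procedure relies on.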

Forecasting of Ultra-Short-Term Wind Power Based on WD-NILA-WRF
Accurate ultra-short-term wind-power prediction is of great significance for optimizing resource allocation and for reducing wind-power curtailment and generation limits. In order to verify the general applicability of the proposed algorithm, wind-power prediction is carried out in this paper for two wind farms in two different areas.

Forecasting of Wind-Power Generation in Wind Farm A
According to the algorithm flow shown in Figure 6, this example uses the 1440 data points of wind farm A in Inner Mongolia from 21 to 25 August 2017 to train the model and uses the trained model to predict the wind power at the 288 points on 26 August 2017.
First, the parameters of the NILA are initialized as shown in Table 2. Then, NILA is used to optimize the parameters of the WRF algorithm iteratively. For the model that has been generated, the OOB error rate is used as a measure of the generalization error to assess model performance. The number of trees is taken as optimal when the OOB error rate reaches its minimum value. Figure 9 shows that L = 500 is the optimal solution and indicates that the WD-NILA-WRF model converges quickly and can reach the global optimal solution. Therefore, the proposed model performs better when ε = 0, L = 500, y = 500 and m = 3.
Finally, the wind power of the test set is predicted based on the optimized parameters. The WD-NILA-WRF model predicts each sub-serial obtained by wavelet decomposition and reconstructs the prediction results to obtain the wind-power prediction results and the relative errors of the 288 data points on 26 August 2017, which are shown in Figures 10 and 11.
As can be seen from Figures 10 and 11, the predicted curve fits the actual curve well and the prediction accuracy is high when the WD-NILA-WRF model is used to predict the wind power of the sampling points on 26 August 2017 in Inner Mongolia. Meanwhile, the relative error of each prediction point does not exceed 20%, and the relative errors of most of the prediction points are less than 10%, which shows that the proposed model has strong generalization ability and robustness when predicting wind power.
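The two statements about the relative errors (none above 20%, most below 10%) can be checked with a short sketch such as the one below. The signed definition of the relative error, the function names and the sample values are assumptions for illustration.

```python
def relative_errors(actual, predicted):
    """Signed relative error of each prediction with respect to the actual value."""
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for a zero actual value")
    return [(p - a) / a for a, p in zip(actual, predicted)]


def share_within(errors, bound):
    """Fraction of points whose absolute relative error is at most `bound`."""
    return sum(1 for e in errors if abs(e) <= bound) / len(errors)


actual = [50.0, 80.0, 120.0, 100.0]    # MW, made-up values
predicted = [54.0, 78.0, 138.0, 95.0]  # MW, made-up values

errors = relative_errors(actual, predicted)
print(max(abs(e) for e in errors) < 0.20)  # every point within 20%?
print(share_within(errors, 0.10))          # share of points within 10%
```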
The residual is the difference between the actual value and the estimated value. If the residual shows a regular change over time, there is autocorrelation in the residuals; otherwise there is none. In order to verify the validity of the model, this paper uses the Durbin-Watson (D-W) test to check the model residuals for autocorrelation. The closer the D-W value is to 2, the weaker the correlation between the residuals; the closer it is to 0, the stronger the positive correlation; and the closer it is to 4, the stronger the negative correlation. After verification, the model's D-W = 2.987. The model residuals are therefore regarded as independent and the prediction result as credible.
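The D-W statistic itself is the sum of squared successive residual differences divided by the sum of squared residuals. The sketch below computes it for two made-up residual patterns to show the two extremes described above; the function name and the residual values are assumptions for illustration.

```python
def durbin_watson(residuals):
    """D-W statistic: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Residuals that flip sign at every step: strong negative autocorrelation.
alternating = [1.0, -1.0] * 50
# Residuals that keep the same sign for long runs: strong positive autocorrelation.
persistent = [1.0] * 50 + [-1.0] * 50

print(durbin_watson(alternating))  # close to 4
print(durbin_watson(persistent))   # close to 0
```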
In this paper, the BP neural network (BP), SVM, random forest model (RF), and random forest optimized by the niche immune lion algorithm (NILA-RF) are also used to predict the wind power of the wind farm. The forecast results of the WD-NILA-WRF model were compared with those of these models, and the comparison results are shown in Figure 12.
It can be seen from the figure that the prediction result produced by the WD-NILA-WRF model proposed in this paper is closer to the true value and achieves higher prediction accuracy than the other models. Although the BP neural network can also perform the prediction, its prediction curve fits the actual curve poorly. Compared with the BP neural network, SVM performs better, but the prediction result is still not ideal. With the gradual optimization of the RF model, the accuracy of prediction gradually improves. The analysis of the prediction results and relative errors of the 288 test points indicates that, through weighting, the WD-NILA-WRF model can mitigate the problem whereby a single decision tree is greatly affected by noise. The results indicate that the WD-NILA-WRF model is more stable and robust in the prediction of ultra-short-term wind power than the other models.
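The effect of weighting on a noisy tree can be illustrated with a small sketch. The weighting rule used here (each tree weighted by the inverse of its OOB error, normalized to sum to one) is an assumption chosen for illustration and is not necessarily the rule used by the WRF; the function names and numbers are hypothetical.

```python
def inverse_error_weights(oob_errors, eps=1e-12):
    """Assumed rule: weight each tree by 1 / OOB error, normalized to sum to 1."""
    raw = [1.0 / (e + eps) for e in oob_errors]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_forest_predict(tree_predictions, weights):
    """Combine the per-tree predictions for one sample as a weighted average."""
    return sum(w * p for w, p in zip(weights, tree_predictions))


tree_predictions = [100.0, 104.0, 140.0]  # third tree is thrown off by noise
oob_errors = [0.05, 0.05, 0.50]           # made-up OOB errors per tree

weights = inverse_error_weights(oob_errors)
weighted = weighted_forest_predict(tree_predictions, weights)
plain_mean = sum(tree_predictions) / len(tree_predictions)
print(weighted, plain_mean)
```

Because the noisy tree receives a small weight, the weighted result stays close to the two reliable trees, whereas the plain average is pulled towards the outlier.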

Forecasting of Wind-Power Generation in Wind Farm B
In this case, 1152 data points of wind farm B in the south-eastern coastal area from 23 to 26 August 2017 were selected as samples to train the wind-power prediction model, which was then tested on the 288 data points of the wind farm on 27 August 2017.
The model parameters are set, and the model is trained with the obtained serials. Figure 13 shows the optimization process of the WD-NILA-WRF model, from which we can ascertain that the model has a higher prediction accuracy when ε = 0, L = 500, y = 500 and m = 3.
After obtaining the prediction results of the sub-serials, we recombine them, and the final prediction result is shown in Figure 14. It can be seen from Figure 14 that the model proposed in this paper performs well and achieves good accuracy in the prediction of ultra-short-term wind power. For the wind farm, the relative errors between the predicted power and the actual power are shown in Figure 15.
As we can see from Figure 15, the prediction error lies within (−20%, 20%), and the relative errors of most data points are within 10%. According to the analysis of the relative errors, the proposed model is well suited to ultra-short-term wind-power prediction and performs well in reducing errors.
After verification, the model's D-W = 2.324. This shows that the model residuals are independent and the prediction results are credible.
In order to verify the superiority of the model proposed in this paper, the wind power of the wind farm is also predicted using the BP neural network, SVM, RF and NILA-RF, and the results of these models are compared with the prediction results produced by the WD-NILA-WRF model. The comparison results are shown in Figure 16.
It can be seen from the figure that all the prediction models used in this paper reach a certain level of prediction accuracy. However, the model proposed in this paper has been optimized and improved, and its forecasting accuracy is better than that of the other models. In this paper, MAPE, RMSE, MAE, and $R^2$ are used as evaluation indicators of model performance to quantify the prediction accuracy, as shown in Table 3 of Section 5. The experimental comparative analysis indicates that the WD-NILA-WRF model performs better and is more robust in ultra-short-term wind-power prediction than the other models.
Through the comparative analysis of the two examples, it can be found that:
(1) Due to the influence of wind speed, wind direction, temperature, humidity, air density, air pressure, ground roughness and other factors, wind-power generation shows a certain randomness and volatility. Using wavelet decomposition to denoise the original data enhances the daily characteristics of wind speed and wind power, so that the model prediction accuracy is higher.
(2) After the parameters of the model are optimized by NILA and the prediction result of each decision tree is weighted, the WD-NILA-WRF model converges faster, avoids the "over-fitting" problem effectively, and reduces the influence of noise that a single decision tree cannot handle.
In summary, the combined model proposed in this paper, WD-NILA-WRF, draws on the advantages of each component model and effectively reduces the error created by a single model. The WD-NILA-WRF model separates the stationary signal of the original data from the non-stationary signal through decomposition and noise reduction, making the data more valuable. Then, through the improvement of NILA, the parameters of the model are optimized and the prediction accuracy is improved. The comparison results show that the prediction accuracy of the WD-NILA-WRF model is higher than that of the BP neural network, SVM, RF, NILA-RF and other models, which also supports the strong generalization ability and robustness of the model. Therefore, the proposed WD-NILA-WRF model is suitable for ultra-short-term wind-power forecasting.

Error Analysis
Error analysis examines the deviations between the predicted values and the target values, and is an important part of evaluating the prediction accuracy of the model. In order to verify the general applicability of the algorithm, this paper uses the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and non-linear function goodness of fit ($R^2$) to compare the prediction accuracy of the models and to evaluate their predictive performance more accurately. Each indicator is calculated from $Y_l$ and $\hat{Y}_l$, where $Y_l$ represents the real value and $\hat{Y}_l$ represents the estimated value.
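The four indicators can be computed as in the sketch below, which assumes their conventional definitions. In particular, $R^2$ is taken as the usual coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares, which may differ from the exact form used for Table 3. The function names and sample values are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (undefined for zero actuals)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Assumed form: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 200.0, 300.0, 400.0]     # made-up real values
predicted = [110.0, 190.0, 310.0, 390.0]  # made-up estimated values

print(mape(actual, predicted), rmse(actual, predicted))
print(mae(actual, predicted), r_squared(actual, predicted))
```

Lower MAPE, RMSE and MAE and an $R^2$ closer to 1 indicate a better fit under these definitions.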

Figure 1. The program of the random forest (RF) model. (a) Training process of a single decision tree. (b) Schematic diagram of the RF algorithm.

Figure 2. Cross mode of the lion algorithm (LA).

Figure 6. The prediction process of the WD-NILA-WRF model.

Figure 7. Wavelet decomposition of the wind-speed serial in wind farm A.

Figure 8. Wavelet decomposition of the wind-speed serial in wind farm B.

Figure 12. Forecasting value of each model.

Figure 16. Forecasting value of each model.

Table 1. Related information of the two cases.

Table 2. The main parameters of the NILA.

Table 3. Indicator calculation results.