1. Introduction
As a clean and renewable resource, wind energy plays an important role in the energy supply, and wind turbines convert it into electricity. However, not all locations are suitable for wind turbine installation, so wind energy assessment should be performed in advance. Furthermore, to guarantee the reliable use of wind energy, the accuracy of wind speed forecasting must be ensured. Wind energy assessment and wind speed forecasting are two challenging research topics at present.
Wind energy assessment plays a significant role in wind turbine installation decisions in many countries worldwide, and the technologies used to evaluate wind energy potential are varied. Based on different moment constraints, Liu and Chang [1] performed a validity analysis of the maximum entropy distribution for wind energy assessment in Taiwan. A nested ensemble Numerical Weather Prediction approach was proposed by Al-Yahyai et al. [2] to perform a wind energy assessment over Oman. Wu et al. [3] proposed an assessment model based on the Weibull distribution and different particle swarm optimization algorithms as well as differential evolution algorithms to assess the wind energy potential of Inner Mongolia in China. Jung and Kwon [4] introduced artificial neural networks to improve the wind energy potential estimation for four sites surrounding the Saemangeum Seawall. A wind analysis model was adopted by Boudia et al. [5] to assess the wind energy of four locations in the Algerian Sahara. Apart from the wind analysis model, Quan and Leephakpreeda [6] also used economic analysis to assess the wind energy potential in Thailand. A GIS-based method was applied by Siyal et al. [7] for wind energy assessment in Sweden.
One of the most vital factors in wind energy assessment is the wind speed; the quality of the assessment directly depends on the accuracy of the wind speed forecasting. Many techniques have recently been proposed to forecast the wind speed, and they can usually be divided into three categories: short-term wind speed forecasting [8,9,10], medium-term wind speed forecasting [11] and long-term wind speed forecasting. One of the most popular strategies for wind speed forecasting is to construct a hybrid model from several single forecasting approaches. For example, Wang et al. [12] presented a hybrid model with the assistance of the phase space reconstruction algorithm and a Markov algorithm. Based on the extreme learning machine, the Ljung-Box Q-test and seasonal autoregressive integrated moving average (ARIMA) models, a hybrid wind speed forecasting model was proposed by Wang et al. [13] to estimate the wind speed at different sites in northwestern China. The ARIMA model was also used by Shukur and Lee [14] in a hybrid wind speed forecasting model with the Kalman filter and an artificial neural network. Liu et al. [15] demonstrated a hybrid approach using a secondary decomposition model and Elman neural networks. Fei [16] used a hybrid method that consists of empirical mode decomposition and multiple-kernel relevance vector regression technologies.
In this paper, two new wind energy assessment models and six wind speed forecasting models are proposed based on the cuckoo search (CS) algorithm and the ant colony (AC) algorithm. In the assessment process, the AC and CS algorithms are applied to optimize the two unknown parameters of the Weibull distribution, and four assessment error evaluation criteria are adopted to evaluate the effectiveness of the two newly proposed assessment models. In the forecasting process, the CS and AC algorithms are used to optimize three neural networks, namely the Elman, back propagation and wavelet neural networks, and the newly proposed approaches are validated by three forecasting error evaluation criteria.
The remaining part of this paper is organized as follows: a description of the wind energy potential assessment methodologies is given and the results are evaluated in Section 2. Section 3 presents the connection between energy assessment and forecasting to identify the best data preprocessing approach. The proposed integrated forecasting framework and forecasting results are presented in Section 4, and the last section presents the concluding remarks.
2. Wind Energy Potential Assessment Methodologies and Results
In this section, related single methodologies as well as the proposed hybrid methods used to assess the wind energy potential are introduced; then, the assessment results are presented to demonstrate the performance of the methods.
2.1. Related Methodologies
This subsection focuses on the related single and hybrid methodologies to assess the wind energy potential.
2.1.1. Related Single Methodologies
This section describes the two parameter optimization algorithms and the assessment approach.
Parameter Optimization Algorithms
(a) Cuckoo Search Algorithm
The cuckoo search (CS) algorithm [17] is derived from the behavior of the cuckoo in its search for nests. To simplify the CS algorithm, three idealized rules are hypothesized. The first is that each cuckoo lays only one egg at a time and randomly selects a parasitic nest to hatch it. The second is that, among the randomly selected parasitic nests, the best parasitic nest will be carried over to the next generation. The last is that the number of available parasitic nests is fixed, and the probability of an alien egg being found by the host of the parasitic nest is ${p}_{a}$, which lies in the interval $\left[0,1\right]$. Once the alien eggs have been found, the host birds will throw them out or abandon the nest and build a new one elsewhere. For simplicity, each egg in a nest represents a solution, and new, potentially better solutions replace the bad ones.
On the basis of these three ideal rules, the new solution is generated by:

${x}_{i}^{\left(t+1\right)}={x}_{i}^{\left(t\right)}+\alpha \oplus \text{Lévy}\left(\lambda \right)$ (1)

where $\alpha $ is the step size and, in most cases, it is set to $\alpha =1$; the symbol “$\oplus $” represents the entry-wise multiplication. In essence, Equation (1) is a random walk equation: the future position is determined by the current position (the first term in Equation (1)) as well as the transition probability (the second term in Equation (1)). Lévy in Equation (1) denotes the random search path, and the random step length follows the Lévy distribution shown in Equation (2), i.e.,

$\text{Lévy}\sim u={t}^{-\lambda}$ (2)

where $\lambda $ is set to values in the interval $\left(1,3\right]$.
(b) Ant Colony Algorithm
The ant colony (AC) algorithm was proposed by the Italian scientist Marco Dorigo and colleagues in 1991. To facilitate the research, the following assumptions are made [18]: (1) the communication media that ants use are the pheromone and the environment; (2) the response of an ant to the environment is determined by its internal mode; (3) ant individuals are independent; and (4) the entire ant colony shows a random characteristic.
Through adaptation and collaboration in two stages, ants transition from a disordered state to an ordered one and obtain the optimum path. The key point of path selection is the transition probability, i.e., the probability of the kth ant moving from the ith city to the jth city at time t, which is calculated by Equation (3) [19]:

${p}_{ij}^{k}\left(t\right)=\frac{{\left[{\tau}_{ij}\left(t\right)\right]}^{\alpha}{\left[{\eta}_{ij}\left(t\right)\right]}^{\beta}}{{\displaystyle \sum _{s\in {\mathrm{allowed}}_{k}}{\left[{\tau}_{is}\left(t\right)\right]}^{\alpha}{\left[{\eta}_{is}\left(t\right)\right]}^{\beta}}},\text{}j\in {\mathrm{allowed}}_{k}$ (3)

where ${\tau}_{ij}\left(t\right)$ and ${\eta}_{ij}\left(t\right)$ represent the intensity of the pheromone trail and the visibility of edge $\left(i,j\right)$, respectively; ${\mathrm{allowed}}_{k}$ is the set of cities still to be visited by the kth ant located in the ith city; and $\alpha $ and $\beta $ are two coefficients that tune the relative importance of the trail versus the visibility.
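Equation (3) amounts to a weighted roulette over the not-yet-visited cities. A minimal sketch (the array names are assumptions):

```python
import numpy as np

def transition_probs(tau_i, eta_i, allowed, alpha=1.0, beta=5.0):
    """Equation (3): probability that the kth ant in city i moves to each
    city j in allowed_k, trading pheromone tau_ij against visibility eta_ij."""
    tau_i = np.asarray(tau_i, dtype=float)   # pheromone on edges (i, j)
    eta_i = np.asarray(eta_i, dtype=float)   # visibility, e.g. 1 / distance
    w = tau_i[allowed] ** alpha * eta_i[allowed] ** beta
    return w / w.sum()                       # normalize over allowed cities
```

The next city can then be sampled with, e.g., `rng.choice(allowed, p=probs)`.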
Assessment Approach
The Weibull distribution is introduced in this paper to assess the potential wind energy. The probability density function (PDF) of the Weibull distribution can be expressed by Equation (4):

$f\left(x\right)=\frac{k}{c}{\left(\frac{x}{c}\right)}^{k-1}\mathrm{exp}\left[-{\left(\frac{x}{c}\right)}^{k}\right]$ (4)

where $x$ is the random variable, which represents the wind speed in this paper, and $k$ and $c$ are the shape and scale parameters, respectively.
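Equation (4) transcribes directly into code (nothing assumed beyond the symbol names):

```python
import numpy as np

def weibull_pdf(x, k, c):
    """Equation (4): two-parameter Weibull PDF with shape k and scale c."""
    x = np.asarray(x, dtype=float)
    return (k / c) * (x / c) ** (k - 1) * np.exp(-((x / c) ** k))
```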
2.1.2. Proposed Wind Energy Potential Assessment Model
In this paper, the CS algorithm is used to estimate the unknown parameters $k$ and $c$ of the Weibull distribution. The resulting model is abbreviated as the CS-Weibull model, and its pseudocode is presented in Algorithm 1. Similarly, the AC algorithm is adopted to estimate the two parameters; this model is abbreviated as the AC-Weibull model, and its pseudocode is presented in Algorithm 2.
Algorithm 1: CS-Weibull 
Input:${\mathit{x}}_{s}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\dots ,{x}^{\left(0\right)}\left(q\right)\right)$—a sequence of training data. ${\mathit{x}}_{p}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(q+1\right),{x}^{\left(0\right)}\left(q+2\right),\dots ,{x}^{\left(0\right)}\left(q+d\right)\right)$—a sequence of verifying data

Output: x_{b}—the value of x with the best fitness value in the population of nests 
Parameters: 
Num Cuckoos = 50;  number of initial population 
Min Number Of Eggs = 2;  minimum number of eggs for each cuckoo 
Max Number Of Eggs = 4;  maximum number of eggs for each cuckoo 
Max Iter = 200;  maximum iterations of the Cuckoo Algorithm 
Knn Cluster Num = 1;  number of clusters that we want to make 
Motion Coeff = 20;  Lambda variable in COA paper, default = 2 
accuracy = 1.0 × 10^{−10};  How much accuracy in answer is needed 
Max Num Of Cuckoos = 20;  maximum number of cuckoos that can live at the same time 
Radius Coeff = 0.05;  Control parameter of egg laying 
Cuckoo Pop Variance = 1 × 10^{−10};  Population variance that cuts the optimization 
1: /* Initialize population of n host nests x_{i} (i = 1, 2, ..., n) randomly*/ 
2: FOR EACH i: 1 ≤ i ≤ n DO 
3: Evaluate the corresponding fitness function F_{i} 
4: END FOR 
5: WHILE (g < Gen_{Max}) DO 
6: /* Get new nests by Lévy flights */ 
7: FOR EACH i: 1 ≤ i ≤ n DO 
8: x_{L} = x_{i} + α⊕Levy(λ); 
9: END FOR 
10: FOR EACH i: 1 ≤ i ≤ n DO 
11: Compute F_{L} 
12: IF (F_{L} < F_{i}) THEN 
13: x_{i}←x_{L}; 
14: END IF 
15: END FOR 
16: Compute F_{L} 
17: /*Update best nest x_{p} of the d generation*/ 
18: IF (F_{p} < F_{b}) THEN 
19: x_{b}←x_{p}; 
20: END IF 
21: END WHILE 
22: RETURN x_{b} 
Algorithm 2: AC-Weibull 
Input:${\mathit{x}}_{s}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\dots ,{x}^{\left(0\right)}\left(q\right)\right)$—a sequence of training data. ${\mathit{x}}_{p}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(q+1\right),{x}^{\left(0\right)}\left(q+2\right),\dots ,{x}^{\left(0\right)}\left(q+d\right)\right)$—a sequence of verifying data

Output: ${\alpha}_{best}$—the candidate with the best fitness value 
Parameters: 
NC_max—Maximum iterations: 50 
m—Number of ants: 30 
Alpha—Importance degree of the pheromone: 1 
Beta—Importance degree of the heuristic factor: 5 
Rho—Pheromone evaporation coefficient: 0.1 
Q—Pheromone increasing intensity coefficient: 100 
1: /*Initialize popsize candidates with the values between 0 and 1*/ 
2: FOR EACH i: 1 ≤ i ≤ n DO 
3: ${\alpha}_{i}^{1}=rand\left(m,n\right)$ 
4: END FOR 
5: $P=\left\{{\alpha}_{i}^{iter}:1\le i\le popsize\right\}$ 
6: iter = 1; Evaluate the corresponding fitness function F_{i} 
7: /* Find the best value of repeatedly until the maximum iterations are reached. */ 
8: WHILE ($iter\le {iter}_{max}$) DO 
9: /* Find the best fitness value for each candidates */ 
10: FOR EACH ${\alpha}_{i}^{iter}\in P$ DO 
11: Build neural network by using ${x}_{s}^{(0)}$ with the ${\alpha}_{i}^{iter}$ value 
12: Calculate ${\widehat{x}}_{p}^{(0)}\text{}=\text{}\left({\widehat{x}}_{p+1}^{(0)},{\widehat{x}}_{p+2}^{(0)},\dots ,{\widehat{x}}_{p+3}^{(0)}\right)$ by neural network 
13: /* Choose the best fitness value of the ith candidate in history */ 
14: IF (pBest_{i} > fitness(${\alpha}_{i}^{iter}$)) THEN 
15: pBest_{i} = fitness(${\alpha}_{i}^{iter}$) 
16: END IF 
17: END FOR 
18: /* Choose the candidate with the best fitness value of all the candidates */ 
19: FOR EACH ${\alpha}_{i}^{iter}\in P$ DO 
20: IF (gBest > pBest_{i}) THEN 
21: gBest = pBest_{i}; ${x}_{t+1}^{k}={x}^{gbest}\pm \left({x}^{gbest}\times 0.01\right),\text{}t=1,2,\cdots ,T$ 
22: ${\alpha}_{best}$ = ${\alpha}_{i}^{iter}$ 
23: END IF 
24: END FOR 
25: /*Update the values of all the candidates by using ACO’s evolution equations.*/ 
26: FOR EACH ${\alpha}_{i}^{iter}\in P$ DO 
27: ${\alpha}_{t+1}\text{}=\text{}0.1\text{}\times \text{}{\alpha}_{t}$ 
28: ${\overline{x}}^{gbest}\text{}=\text{}{x}^{gbest}\pm \left({x}^{gbest}\times 0.01\right)$, where the sign is $(+)$ if $f({\overline{x}}^{gbest})-f({x}^{gbest})\le 0$ and $(-)$ otherwise 
29: END FOR 
30: $P=\left\{{\alpha}_{i}^{iter}:1\text{}\le \text{}i\text{}\le \text{}popsize\right\}$ 
31: iter = iter + 1 
32: END WHILE 
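Algorithms 1 and 2 share one pattern: propose candidate $(k,c)$ pairs, score each candidate by how well the implied Weibull PDF fits the empirical wind speed histogram, and keep the best. The sketch below uses a plain random search as a stand-in for the CS/AC proposal step; the histogram binning, search bounds and SSE fitness are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def weibull_pdf(x, k, c):
    """Equation (4): two-parameter Weibull PDF."""
    return (k / c) * (x / c) ** (k - 1) * np.exp(-((x / c) ** k))

def fit_weibull(speeds, iters=2000, rng=None):
    """Search for (k, c) minimizing the SSE between the empirical wind
    speed histogram and the Weibull PDF evaluated at the bin centers."""
    rng = rng if rng is not None else np.random.default_rng(0)
    hist, edges = np.histogram(speeds, bins=30, density=True)
    mids = (edges[:-1] + edges[1:]) / 2.0
    best, best_sse = (None, None), np.inf
    for _ in range(iters):
        k = rng.uniform(0.5, 10.0)   # candidate shape parameter
        c = rng.uniform(0.5, 25.0)   # candidate scale parameter
        sse = float(np.sum((hist - weibull_pdf(mids, k, c)) ** 2))
        if sse < best_sse:           # keep the best nest / path so far
            best, best_sse = (k, c), sse
    return best, best_sse
```

Replacing the uniform proposals with Lévy-flight moves (CS) or pheromone-guided moves (AC) yields the two hybrid assessment models.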
2.2. Wind Energy Potential Assessment Case Study
In this paper, wind speed data from 2009 to 2013 are adopted to assess the wind energy at four locations—[125, 40], [122.5, 40], [125, 42.5], and [120, 40]—where the first component represents the longitude and the second denotes the latitude. The collected wind speed data are applied in two ways: (1) single-year data application, where the wind speed data of each single year are analyzed to obtain the yearly assessment results; and (2) whole five-year data application, where the wind speed data of each season of the five years are analyzed to obtain the seasonal assessment results as well as the whole five-year assessment results.
In addition to the CS-Weibull and AC-Weibull models, the original Weibull model and two other models based on the Firefly Algorithm (FA) and the Genetic Algorithm (GA) are introduced to compare the assessment effectiveness. The latter two models are abbreviated as the FA-Weibull and GA-Weibull models, respectively.
2.2.1. Assessment Results in a Single Year
Wind energy assessment is an important indicator to determine the potential of wind resources and describe the amount of wind energy at various wind speed values in a particular location. In studies of wind energy assessment, the common parameter estimation methods include the method of moments, maximum likelihood estimation and least squares estimation, each of which has some disadvantages and limitations. The method of moments is simple: knowing the moments of the population is sufficient, and no knowledge of the population distribution is required. However, it can only be used when the population origin moments exist, the moments carry only part of the information, and the method performs well only when the sample size is large. Maximum likelihood estimation (MLE) estimates the parameters of a statistical model from observations by finding the parameter values that maximize the likelihood of making those observations given the parameters. However, MLE must incorporate the sample distribution, and the likelihood equations are often complicated, so approximate solutions are usually obtained by iterative computation; MLE is complex and may lead to multiple optimal solutions or non-optimal solutions. Least squares can be applied to estimate both linear and nonlinear relationships, and no probabilistic information about the observed data is required. However, least squares has two kinds of defects: if the noise of the model is colored noise, the least squares result is a biased estimate, and with increasing data size, “data saturation” will appear. Bayesian parameter estimation, in turn, requires knowledge of the distribution of the random error. 
When the sample size is small, the prior probability has a significant influence on the estimation result (the results of maximum likelihood estimation, the method of moments, least squares estimation and Bayesian parameter estimation are given in Appendix A). In summary, in this paper, the effectiveness of four optimization algorithms (the Firefly Algorithm, Genetic Algorithm, Ant Colony Algorithm and Cuckoo Search Algorithm) is evaluated in determining the shape (k) and scale (c) parameters of the Weibull distribution function for calculating the wind power density. Comparison of the assessment results shows that the swarm intelligence algorithms achieve an effective assessment performance.
The parameter estimation results of the five models in each single year from 2009 to 2013 are listed in Table 1. According to the estimated parameters given in Table 1, the five models can be determined, and Figure 1 shows the PDF fitting results in each single year from 2009 to 2013.
With the PDF fitting results, the following four error evaluation criteria (shown in Equations (5)–(8)) are adopted to evaluate the assessment performance:

$\mathrm{MAE}=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|}$ (5)

$\mathrm{RMSE}=\sqrt{\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}}$ (6)

$\mathrm{SSE}={\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}$ (7)

${R}^{2}=1-\frac{{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}}{{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}}}$ (8)

where ${y}_{i}$ is the observed value, ${\widehat{y}}_{i}$ is the fitted value, and $\overline{y}={\displaystyle \sum _{i=1}^{n}{y}_{i}}/n$.
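The four criteria can be computed in a few lines (a direct transcription using the same symbols):

```python
import numpy as np

def assessment_errors(y, y_hat):
    """MAE, RMSE, SSE and R^2 between observed values y and fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mae = float(np.mean(np.abs(err)))           # mean absolute error
    rmse = float(np.sqrt(np.mean(err ** 2)))    # root mean square error
    sse = float(np.sum(err ** 2))               # sum of squared errors
    r2 = 1.0 - sse / float(np.sum((y - y.mean()) ** 2))
    return mae, rmse, sse, r2
```

Lower MAE, RMSE and SSE and an R² closer to 1 indicate a better PDF fit.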
Table 2 provides the assessment performance evaluation results of the four optimization algorithms in each single year from 2009 to 2013 in terms of MAE, RMSE, SSE and R². As seen from Table 2, although the presented descriptive statistics provide meaningful statistical analysis, especially regarding the distribution of the wind speed, they cannot be used on their own to judge the precision of each optimization algorithm in estimating the parameters of the Weibull distribution. Therefore, the different evaluation criteria introduced in Equations (5)–(8) are employed to appraise the performances of the four selected parameter estimation optimization algorithms. Each statistical criterion supplies a different useful view for comparing the optimization algorithms; as a result, the combination of all statistical indicators provides an effective way to compare the different parameter estimation optimization algorithms for wind power assessment. The accuracy of the assessed wind power density values changes when the parameter estimation optimization algorithm changes. This is apparent for each research site when the four optimization algorithms, CS, GA, FA and AC, are utilized to estimate the parameters of the Weibull distribution, as indicated by the low error values (MAE, RMSE and SSE) and the high R² values. On the other hand, the agreement levels attained differ among the four algorithms in the k and c parameter calculations. According to the statistical results in Table 2, for the four Chinese wind farm sites, the best results for calculating the wind power density are achieved when the optimization algorithms are employed to compute the k and c parameters, and for each site, the most precise results are obtained using different optimization algorithms [20].
2.2.2. Seasonal and Whole Five-Year Assessment Results
Considering that wind speed data may differ vastly between years, this section provides seasonal and whole five-year wind energy assessment results by comprehensively using the wind speed data of the five years from 2009 to 2013. Similarly, Table 3 lists the seasonal and whole five-year parameter estimation results, and Figure 2 and Table 4 present the PDF fitting and corresponding error results.
The same conclusion can be obtained from these results; i.e., the four new proposed models based on the FA, GA, CS algorithm and AC algorithm are superior to the original Weibull model.
The two-parameter Weibull distribution function has been widely applied to many kinds of wind energy-related investigations due to its simplicity, flexibility and effectiveness. In this paper, the performance of four optimization algorithms, namely the FA, GA, CS and AC algorithms, was assessed in optimizing the k and c parameters of the Weibull probability density function when calculating the wind power density at four sites in China. The assessments were conducted on both a seasonal and an annual basis to offer a more complete analysis. Both the annual and seasonal results showed that the accuracy of the calculated wind power density values changes when different optimization algorithms are used to determine the k and c parameters of the Weibull distribution. According to the wind energy assessment results from the statistical analysis, the FA, GA, CS and AC algorithms all provided a very desirable performance for each site; a further finding concerned the efficiency of the CS and AC algorithms. The assessment results also show that the most appropriate parameter estimation algorithm was not universal among the examined sites; in fact, the local wind energy properties can be a significant factor in wind energy assessment. For Site 1, the CS algorithm was recognized as the more appropriate algorithm both annually and seasonally, while the FA showed weak performance for wind power assessment. For Site 2, all four optimization algorithms were effective Weibull parameter estimation algorithms for the wind power density in each year and season. For Site 3, the AC algorithm showed poor performance for the annual wind power density distribution, and the FA was recognized as the more appropriate method. For Site 4, both the FA and the GA performed better for the seasonal wind power density. The suggested parameter estimation methods have excellent performance in representing the distribution of the seasonal and annual wind power density as well as in determining different statistical properties of the power density [20].
3. Connection between Energy Assessment and Forecasting
In recent years, denoising methods have been widely used to preprocess wind speed time series, such as Ensemble Empirical Mode Decomposition (EEMD), Singular Spectrum Analysis (SSA) and Wavelet Decomposition (WD). Thus far, there is no established way to choose which denoising method should be used to treat the original wind speed time series. In this section, the wind energy assessment method with the smallest error values is used to choose the best denoising method for preprocessing the wind speed time series.
Figure 3 presents the PDF fitting results obtained by the three different denoising methods for the four sites, and Table 5 shows the parameter estimation and error results of the differently denoised wind speed time series. As seen from Figure 3 and Table 5, the R² values from Site 1 to Site 4 for the WD denoising method are all closest to 1, and the assessment results obtained by the three denoising models show that the MAE values of the WD denoising method are the smallest. Therefore, in this paper, the WD denoising method is adopted to preprocess the original wind speed and improve the forecasting accuracy.
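Wavelet denoising can be illustrated with a one-level Haar decomposition plus soft thresholding of the detail coefficients. The paper does not state the mother wavelet, decomposition depth or thresholding rule, so all three are assumptions in this sketch.

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet denoising of an even-length series:
    decompose, soft-threshold the detail coefficients, reconstruct."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (noise-dominated)
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)  # inverse Haar transform
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

In practice the decomposition is applied over several levels; shrinking the detail coefficients suppresses the high-frequency noise while the approximation retains the wind speed trend.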
4. Proposed Integrated Forecasting Framework and Forecasting Results
In this section, three basic neural network forecasting models are first introduced; then, the integrated forecasting framework proposed in this paper is shown. Finally, the forecasting results obtained by the new proposed forecasting framework are analyzed.
4.1. Basic Neural Network Forecasting Models
Artificial neural networks are widely used in forecasting because they can approximate nonlinear functions with arbitrary accuracy. Three neural network models are introduced in this paper for the wind speed forecasting application.
4.1.1. Back Propagation Neural Network
The back propagation neural network (BPNN) [21] is a multilayer feed-forward neural network. Its two main features are the feed-forward signal and the back-propagated error. In the feed-forward process, the signal is passed layer by layer from the input layer to the hidden layer and then to the output layer, and the state of the neurons only impacts the neurons in the adjacent next layer. If the output of the output layer deviates from the expected output, back propagation starts.
Suppose ${X}_{1},{X}_{2},\dots ,{X}_{n}$ are the input values of the BPNN; ${Y}_{1},{Y}_{2},\dots ,{Y}_{m}$ are the corresponding output values; and ${\omega}_{ij}$ and ${\omega}_{jk}$ are the weights. The BPNN can then be viewed as a nonlinear function whose input and output values are the independent and dependent variables, respectively. The BPNN structure in Figure 4 expresses the function mapping from the $n$ independent variables to the $m$ dependent variables.
Network training is the main task of the BPNN. Through training, the BPNN acquires the capacity for associative memory and forecasting. The training process of the BPNN includes the following steps:
Step 1: Network initialization. Based on the practical problem, determine the number of nodes in the input, hidden and output layers. Then, initialize the following values: the connection weights ${\omega}_{ij}$ and ${\omega}_{jk}$, threshold values ${\theta}_{j}$ and ${\theta}_{k}$ in the hidden and output layers, respectively, and the learning rate $\eta $ and the transfer functions.
Step 2: Output calculation of the hidden layer. According to the input vector $X=\left({X}_{1},{X}_{2},\dots ,{X}_{n}\right)$, the connection weights ${\omega}_{ij}$ between the input and hidden layers and the threshold value ${\theta}_{j}$ of the hidden layer, the output of the hidden layer can be calculated by Equation (9):

${H}_{j}=f\left({\displaystyle \sum _{i=1}^{n}{\omega}_{ij}{X}_{i}}-{\theta}_{j}\right),\text{}j=1,2,\dots ,l$ (9)

where $l$ is the number of nodes in the hidden layer and $f(\cdot)$ is the transfer function of the hidden layer, which has a variety of expression forms. In this research, the following form is adopted (Equation (10)):

$f\left(x\right)=\frac{1}{1+{e}^{-x}}$ (10)
Step 3: Output calculation of the output layer. According to the output ${H}_{j}$ of the hidden layer, the connection weights ${\omega}_{jk}$ between the hidden and output layers, and the threshold value ${\theta}_{k}$ of the output layer, the forecasting output of the BPNN can be expressed as Equation (11):

${Y}_{k}=g\left({\displaystyle \sum _{j=1}^{l}{H}_{j}{\omega}_{jk}}-{\theta}_{k}\right),\text{}k=1,2,\dots ,m$ (11)

where $g(\cdot)$ is the transfer function from the hidden layer to the output layer, which is defined as Equation (12) in this research:
Step 4: Error calculation. With the predicted output $Y=\left({Y}_{1},{Y}_{2},\dots ,{Y}_{m}\right)$ and the desired output $DY=\left(D{Y}_{1},D{Y}_{2},\dots ,D{Y}_{m}\right)$, the forecasting error of the network is computed by Equation (13), where $P$ is the number of input and output pairs.
Step 5: Weight update. Update the connection weights ${\omega}_{ij}$ and ${\omega}_{jk}$ by Equations (14) and (15), where $\eta $ is the learning rate and the corresponding error terms are given by Equations (16) and (17).
Step 6: Threshold update. By using the forecasting error of the network, the thresholds are updated by Equations (18) and (19).
Step 7: Termination determination. Determine whether the termination requirement is met; if so, end; otherwise, return to Step 2.
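Steps 2–6 can be sketched for a single training sample as follows. The linear output layer, the squared-error loss and the learning rate value are assumptions here, since Equations (12)–(19) are not reproduced above.

```python
import numpy as np

def sigmoid(z):
    """Logistic transfer function of the hidden layer (cf. Equation (10))."""
    return 1.0 / (1.0 + np.exp(-z))

def bpnn_step(x, dy, w_ih, w_ho, th_h, th_o, eta=0.1):
    """One feed-forward / back-propagation pass for one sample.

    x: inputs (n,); dy: desired outputs (m,); w_ih: (n, l) input-hidden
    weights; w_ho: (l, m) hidden-output weights; th_h, th_o: thresholds.
    Returns the updated parameters (gradient descent on the squared error).
    """
    h = sigmoid(x @ w_ih - th_h)            # Step 2: hidden-layer output
    y = h @ w_ho - th_o                     # Step 3 with a linear output layer
    e = dy - y                              # Step 4: forecasting error
    delta_h = h * (1.0 - h) * (w_ho @ e)    # error propagated to hidden layer
    w_ho = w_ho + eta * np.outer(h, e)      # Step 5: weight updates
    w_ih = w_ih + eta * np.outer(x, delta_h)
    th_o = th_o - eta * e                   # Step 6: threshold updates
    th_h = th_h - eta * delta_h
    return w_ih, w_ho, th_h, th_o
```

Repeating the pass over the training samples until the error criterion is met realizes Step 7.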
4.1.2. Wavelet Neural Network
The Wavelet Neural Network (WNN) [22] is a neural network constructed on the basis of the BPNN topology, in which a wavelet basis function serves as the transfer function of the hidden layer nodes. In this type of network, the signal is transferred forward, while the error is propagated backward. Suppose ${X}_{1},{X}_{2},\dots ,{X}_{n}$ are the inputs of the network, ${Y}_{1},{Y}_{2},\dots ,{Y}_{m}$ are the forecasted outputs, and ${\omega}_{ij}$ and ${\omega}_{jk}$ are the weights; then, the output of the hidden layer can be represented by Equation (20):

${h}_{j}=h\left(\frac{{\displaystyle \sum _{i=1}^{n}{\omega}_{ij}{X}_{i}}-{b}_{j}}{{a}_{j}}\right),\text{}j=1,2,\dots ,l$ (20)

where ${h}_{j}$ is the output of the jth hidden layer node, ${\omega}_{ij}$ is the connection weight between the input and hidden layers, $h(\cdot)$ is the wavelet function, ${b}_{j}$ is the shift factor of the wavelet function, and ${a}_{j}$ is the stretch factor of the wavelet function.
The forecasted value of the output layer can be calculated by Equation (21):

${Y}_{k}={\displaystyle \sum _{j=1}^{l}{h}_{j}{\omega}_{jk}},\text{}k=1,2,\dots ,m$ (21)

where ${\omega}_{jk}$ is the weight between the hidden and output layers, ${h}_{j}$ is the output of the jth hidden layer node, $l$ is the number of nodes in the hidden layer, and $m$ is the number of nodes in the output layer.
The process of the WNN algorithm is as follows:
Step 1: Network initialization. Randomly initialize the stretch factors ${a}_{j}$, shift factors ${b}_{j}$, network connection weights ${\omega}_{ij}$ and ${\omega}_{jk}$, and the network learning rate $\eta $.
Step 2: Sample classification. Divide the samples into the training and testing samples, which are used to train the network and test the forecasting accuracy of the network, respectively.
Step 3: Output prediction. Input the training sample into the network and calculate the predicted output of the network as well as the error between the network output and desired output.
Step 4: Weight correction. Correct the network weights and parameters in the wavelet function according to the calculated error values, helping the network predicted values approach the expected values.
Step 5: Algorithm termination judgment. Determine whether the termination criterion of the algorithm is satisfied; if not, return to Step 3.
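The WNN forward pass of Equations (20) and (21) can be sketched as below; the Morlet mother wavelet is a common WNN choice, but the paper does not name one, so it is an assumption here.

```python
import numpy as np

def morlet(t):
    """Morlet mother wavelet (an assumed choice of h in Equation (20))."""
    return np.cos(1.75 * t) * np.exp(-(t ** 2) / 2.0)

def wnn_forward(x, w_ih, w_ho, a, b):
    """Equations (20)-(21): hidden node j outputs
    h_j = h((sum_i w_ij x_i - b_j) / a_j); the output layer is a plain
    weighted sum of the h_j values."""
    h = morlet((x @ w_ih - b) / a)   # per-node stretch a_j and shift b_j
    return h @ w_ho
```

Training then adjusts `w_ih`, `w_ho`, `a` and `b` by back-propagating the forecasting error, as in Steps 3 and 4 above.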
4.1.3. Elman Neural Network
The Elman neural network (ENN) [23] is generally divided into four layers: the input, hidden, context and output layers. The connections between the input, hidden and output layers are similar to those of a feed-forward network. The nodes in the input layer only transmit the signal, while those in the output layer apply a linear weighting. The transfer function of the hidden layer can be either linear or nonlinear, and the context layer, also known as the undertake or state layer, remembers the previous output of the hidden layer and returns it to the network input, so it can be considered a single-step delay operator.
Through the delay and storage of the context layer, the output of the hidden layer can be selfconnected to the input of the hidden layer. This selfconnection approach makes the network sensitive to the historical data and increases the capacity of the network to address the dynamic information, which can then achieve the dynamic modeling purpose. In addition, the ENN can approximate any nonlinear map with arbitrary precision without considering the specific form of the external noise impact on the system. Therefore, given the input and output pair of the system, the system can be modeled.
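The context-layer mechanism amounts to feeding the previous hidden state back in as an extra input at each time step. A minimal forward step (the tanh hidden activation and all names are assumptions):

```python
import numpy as np

def elman_step(u, context, w_in, w_ctx, w_out):
    """One ENN time step: the hidden layer sees the current input u plus
    the context layer's copy of the previous hidden state (a single-step
    delay). Returns (output, new_context)."""
    hidden = np.tanh(u @ w_in + context @ w_ctx)   # hidden layer
    return hidden @ w_out, hidden                  # linear output layer
```

Carrying `new_context` into the next call is what makes the ENN sensitive to the history of the wind speed series.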
4.2. Structure of the Proposed Integrated Forecasting Framework
In this paper, forecasting models based on the three artificial neural networks described in Section 4.1, i.e., the ENN, BPNN and WNN, are used to forecast the wind speed; the integrated forecasting framework is shown in Figure 5 and can be decomposed into the following three main procedures. First, wavelet decomposition (WD) [24] is used to decompose the original wind speed data. As seen in Section 3, the WD method is the best preprocessing method according to the wind energy assessment results, and it is used to preprocess the original wind speed. With this operation, three new models, abbreviated as WD-ENN, WD-BPNN and WD-WNN, are obtained. Second, the CS and AC algorithms are adopted to optimize the unknown weight and bias matrices between the hidden and output layers of the three neural network models obtained in the first step. With this implementation, in addition to the three neural networks optimized by the CS algorithm, named WD-CS-ENN, WD-CS-BPNN and WD-CS-WNN, three neural networks optimized by the AC algorithm, abbreviated as WD-AC-ENN, WD-AC-BPNN and WD-AC-WNN, are obtained as well (shown in Figure 5). The related pseudocodes are presented in Algorithms 3 and 4.
Algorithm 3: Three Neural Networks Optimized by the CS Algorithm 
Input: ${x}_{s}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\dots ,{x}^{\left(0\right)}\left(q\right)\right)$—a sequence of training data. ${x}_{p}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(q+1\right),{x}^{\left(0\right)}\left(q+2\right),\dots ,{x}^{\left(0\right)}\left(q+d\right)\right)$—a sequence of verifying data

Output: ${x}_{b}$—the value of $x$ with the best fitness value in the population of nests. Fitness Function: $x(k)=f({\omega}_{1}{x}_{c}(k)+{\omega}_{2}u(k-1))$ (ENN); $f(net)=\frac{1}{1+{e}^{-net}}$ (BPNN); $h(j)={h}_{j}\left(\frac{\sum _{i=1}^{k}{\omega}_{ij}{x}_{i}-{b}_{j}}{{a}_{j}}\right)$ (WNN)

Parameters: 
Num Cuckoos = 50  number of initial population 
Min Number Of Eggs = 2;  minimum number of eggs for each cuckoo 
Max Number Of Eggs = 4;  maximum number of eggs for each cuckoo 
Max Iter = 200;  maximum iterations of the Cuckoo Algorithm 
Knn Cluster Num = 1;  number of clusters that we want to make 
Motion Coeff = 20;  Lambda variable in COA paper, default = 2 
accuracy = 1 × 10^{−10};  How much accuracy in answer is needed 
Max Num Of Cuckoos = 20;  maximum number of cuckoos that can live at the same time 
Radius Coeff = 0.05;  Control parameter of egg laying 
Cuckoo Pop Variance = 1 × 10^{−10};  population variance that cuts the optimization 
1: /* Initialize population of n host nests x_{i} (i = 1, 2, ..., n) randomly*/ 
2: FOR EACH i: 1 ≤ i ≤ n DO 
3: Evaluate the corresponding fitness function
F_{i} 
4: END FOR 
5: WHILE (g < Gen_{Max}) DO 
6: /* Get new nests by
Lévy flights */ 
7: FOR EACH i: 1 ≤ i ≤ n DO 
8: x_{L}=x_{i}+α⊕Levy(λ); 
9: END FOR 
10: FOR EACH i: 1 ≤ i ≤ n DO 
11: Compute F_{L} 
12: IF (F_{L} < F_{i}) THEN 
13: x_{i}←x_{L}; 
14: END IF 
15: END FOR 
16: Compute F_{p} 
17: /* Update the best nest x_{p} of the current generation */ 
18: IF (F_{p} < F_{b}) THEN 
19: x_{b}←x_{p}; 
20: END IF 
21: g = g + 1 
22: END WHILE 
23: RETURN x_{b} 
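The Lévy-flight loop of Algorithm 3 can be sketched as below; the Mantegna recipe for generating Lévy-stable steps, the toy sphere fitness (standing in for the real objective, the network training error) and the parameter values are illustrative assumptions:

```python
import numpy as np
from math import gamma

def levy_step(dim, lam=1.5, rng=None):
    """Draw a Levy(lambda)-distributed step via the Mantegna method."""
    rng = rng if rng is not None else np.random.default_rng()
    sigma = (gamma(1 + lam) * np.sin(np.pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)

def cuckoo_search(fitness, dim=2, n_nests=15, max_gen=200, alpha=0.1, seed=1):
    """Greedy cuckoo-search loop: x_L = x_i + alpha (+) Levy(lambda)."""
    rng = np.random.default_rng(seed)
    nests = rng.uniform(-5, 5, (n_nests, dim))
    fit = np.array([fitness(x) for x in nests])
    for _ in range(max_gen):
        for i in range(n_nests):
            # Propose a new nest by a Levy flight; keep it only if it improves F_i
            x_new = nests[i] + alpha * levy_step(dim, rng=rng)
            f_new = fitness(x_new)
            if f_new < fit[i]:
                nests[i], fit[i] = x_new, f_new
    return nests[np.argmin(fit)], fit.min()

best, f_best = cuckoo_search(lambda x: np.sum(x ** 2))
```

The full cuckoo optimization algorithm also abandons a fraction of the worst nests each generation; this sketch keeps only the Lévy-flight update that the pseudocode shows.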
Algorithm 4: Three Neural Networks Optimized by the AC Optimization Algorithm 
Input: ${x}_{s}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\dots ,{x}^{\left(0\right)}\left(q\right)\right)$—a sequence of training data. ${x}_{p}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(q+1\right),{x}^{\left(0\right)}\left(q+2\right),\dots ,{x}^{\left(0\right)}\left(q+d\right)\right)$—a sequence of verifying data

Output: ${x}_{b}$—the value of $x$ with the best fitness value in the population of candidates. Fitness Function: $x(k)=f({\omega}_{1}{x}_{c}(k)+{\omega}_{2}u(k-1))$ (ENN); $f(net)=\frac{1}{1+{e}^{-net}}$ (BPNN); $h(j)={h}_{j}\left(\frac{\sum _{i=1}^{k}{\omega}_{ij}{x}_{i}-{b}_{j}}{{a}_{j}}\right)$ (WNN)

Parameters: 
NC_max = 50;  maximum iterations of the AC algorithm 
m = 30;  number of ants 
Alpha = 1;  parameter for the degree of importance of the pheromone information 
Beta = 5;  parameter for the degree of importance of the heuristic factor 
Rho = 0.1;  pheromone evaporation coefficient 
Q = 100;  pheromone increasing intensity coefficient 

1: /*Initialize popsize candidates with the values between 0 and 1*/ 
2: FOR EACH i: $1\le i\le n$ DO 
3: ${\alpha}_{i}^{1}=rand\left(m,n\right)$ 
4: END FOR 
5: $P=\left\{{\alpha}_{i}^{iter}:1\le i\le popsize\right\}$ 
6: iter = 1; Evaluate the corresponding fitness function F_{i} 
7: /* Find the best value of repeatedly until the maximum iterations are reached. */ 
8: WHILE ($iter\le ite{r}_{max}$) DO 
9: /* Find the best fitness value for each candidates */ 
10: FOR EACH ${\alpha}_{i}^{iter}\in P$ DO 
11: Build neural network by using ${x}_{s}^{(0)}$ with the ${\alpha}_{i}^{iter}$ value 
12: Calculate ${\widehat{x}}_{p}^{\left(0\right)}=\left({\widehat{x}}^{\left(0\right)}\left(q+1\right),{\widehat{x}}^{\left(0\right)}\left(q+2\right),\dots ,{\widehat{x}}^{\left(0\right)}\left(q+d\right)\right)$ by the neural network 
13: /*Choose the best fitness value of the
i^{th} candidate in history */ 
14: IF (pBest_{i} > fitness(${\alpha}_{i}^{iter}$)) THEN 
15: pBest_{i} = fitness(${\alpha}_{i}^{iter}$) 
16: END IF 
17: END FOR 
18: /* Choose the candidate with the best fitness value of all the candidates */ 
19: FOR EACH ${\alpha}_{i}^{iter}\in P$ DO 
20: IF (gBest > pBest_{i}) THEN 
21: gBest = pBest_{i} 
22: ${\alpha}_{best}$ = ${\alpha}_{i}^{iter}$ 
23: END IF 
24: END FOR 
25: /*Update the values of all the candidates by using ACO’s evolution equations.*/ 
26: FOR EACH ${\alpha}_{i}^{iter}\in P$ DO 
27: ${\alpha}_{t+1}=0.1\times {\alpha}_{t}$ 
28: ${\overline{x}}^{gbest}={x}^{gbest}\pm \left({x}^{gbest}\times 0.01\right)$, where the sign is $\left(+\right)$ if $f\left({\overline{x}}^{gbest}\right)-f\left({x}^{gbest}\right)\le 0$ and $\left(-\right)$ otherwise 
29: END FOR 
30: $P\text{}=\text{}\left\{{\alpha}_{i}^{iter}:1\text{}\le \text{}i\text{}\le \text{}popsize\right\}$ 
31: iter = iter + 1 
32: END WHILE 
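The gBest refinement in line 28 of Algorithm 4, i.e., perturbing the best candidate by 1% of itself and keeping the sign that does not worsen the fitness, can be sketched as follows; the quadratic fitness is a toy stand-in for the network training error:

```python
def refine_gbest(x_gbest, fitness):
    """Move gBest by +/- 1% of itself, choosing the non-worsening sign."""
    step = x_gbest * 0.01
    # Line 28: keep the (+) sign if it does not increase the fitness,
    # otherwise move with the (-) sign.
    if fitness(x_gbest + step) - fitness(x_gbest) <= 0:
        return x_gbest + step
    return x_gbest - step

# Toy usage: pull a scalar candidate toward the minimum of (x - 2)^2.
f = lambda x: (x - 2.0) ** 2
x = 1.0
for _ in range(100):
    x = refine_gbest(x, f)
```

The step-scale decay of line 27 is omitted here for brevity; in the full algorithm it shrinks this perturbation as the iterations proceed.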
4.3. Wind Speed Forecasting Case Study
After the original wind speed time series is preprocessed by the WD method, the preprocessed series is used as the input of the optimized BPNN, ENN and WNN models. It is worth noting that the method for dividing the original wind speed time series into training and testing sets is quite important. In the network training procedure, the training inputs are the denoised data, while the training output is the original training time series. In the testing step, the inputs are also the denoised wind speed data, and the output is the original testing series; however, the testing output is assumed to be unknown.
Figure 6 presents the data division results; in this paper, the training dataset window with length
N = 1008 is fixed and slides along the original time series, so that each forecast is produced from the most recent N observations. Apart from the data division, the forecasting horizon is also an important index. In this paper, multi-step-ahead forecasting with
h = 1, 2 and 3 is analyzed, where
h is the prediction step.
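The window-based division and the h-step-ahead target construction described above can be sketched as follows; the number of lagged inputs (`n_lags`) is an assumption, since the exact input dimension is set per model in the paper:

```python
import numpy as np

def make_samples(series, n_lags=4, h=1):
    """Build (lagged inputs, h-step-ahead target) pairs from a 1-D series."""
    X, y = [], []
    for t in range(n_lags, len(series) - h + 1):
        X.append(series[t - n_lags:t])   # the past n_lags observations
        y.append(series[t + h - 1])      # the value h steps ahead
    return np.array(X), np.array(y)

series = np.arange(20, dtype=float)   # stand-in for a wind speed series
X1, y1 = make_samples(series, n_lags=4, h=1)
X3, y3 = make_samples(series, n_lags=4, h=3)
```

With h = 3, the target of the first sample is the value three steps after its last lagged input.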
Related parameter initialization values in different neural networks are shown in
Table 6. Based on the error evaluation criterion MAE, defined in Equation (5), and the two further forecasting error evaluation criteria shown in Equations (22) and (23), the forecasting error values obtained by the different neural networks are listed in
Table 7.
where
${y}_{i}$ and
${\widehat{y}}_{i}$ are the actual and forecasted wind speed values, and
$n$ is the number of the data samples.
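For reference, the MAE of Equation (5) can be computed as below; since Equations (22) and (23) are not reproduced in this excerpt, RMSE and MAPE are shown only as two commonly used companion criteria, not as the paper's exact definitions:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error, as in Equation (5)."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error (illustrative companion criterion)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error (illustrative companion criterion)."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

y = np.array([5.0, 6.0, 4.0])        # actual wind speeds (toy values)
y_hat = np.array([4.5, 6.5, 4.0])    # forecasted wind speeds (toy values)
```

All three criteria are averaged over the n testing samples, so smaller values indicate better forecasts.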
Table 7 provides the forecasting error results for three different horizons: one-step-ahead, two-steps-ahead and three-steps-ahead. As seen, under the same horizon conditions, the nine optimized neural networks all perform better than the three single neural networks. Additionally, the models optimized by the WD and CS or by the WD and AC are all superior to those optimized by the WD algorithm alone. When the models optimized by the WD and CS are compared with the models optimized by the WD and AC, for the one-step-ahead forecasting results shown in
Figure 7, the error values obtained by the WD and CS algorithms are all smaller than those of the corresponding models optimized by the WD and AC algorithms. For the two-steps-ahead forecasting results shown in
Figure 8, the BPNN model optimized by the WD and CS is worse than that optimized by the WD and AC algorithms. For the three-steps-ahead forecasting results shown in
Figure 9, the ENN and BPNN models optimized by the WD and CS are both worse than those optimized by the WD and AC algorithms. In conclusion, the novel optimized models proposed in this paper are all better than the original models.
5. Conclusions
Effective wind energy potential assessment and forecasting for a particular site play an indispensable role in the design, evaluation and scheduling of wind farms. In this paper, based on the CS and AC algorithms, two new wind energy assessment models, as well as six wind speed forecasting models, are proposed. First, the CS and AC algorithms are introduced to estimate the two unknown parameters of the Weibull distribution and thereby improve the assessment accuracy. The results for the four assessment error evaluation criteria demonstrate that the two newly proposed assessment models are effective and meaningful. Then, the best data preprocessing approach is selected according to the wind energy potential evaluation results and is adopted to process the wind speed time series. Finally, the CS and AC algorithms are used to optimize three neural networks, namely the ENN, BPNN and WNN, and the results for the three forecasting error evaluation criteria demonstrate that the six newly proposed forecasting models perform better than the original ones. Therefore, forecasting researchers can greatly benefit from data preprocessing and swarm intelligence optimization techniques, which allow for significant improvements in accuracy.