A Novel Multi-objective Optimal Approach for Wind Power Interval Prediction

Numerous studies on wind power forecasting show that random errors found in the prediction results cause uncertainty in wind power prediction and cannot be solved effectively using conventional point prediction methods. In contrast, interval prediction is gaining increasing attention as an effective approach as it can describe the uncertainty of wind power. A wind power interval forecasting approach is proposed in this article. First, the original wind power series is decomposed into a series of subseries using variational mode decomposition (VMD); second, the prediction model is established through kernel extreme learning machine (KELM). Three indices are taken into account in a novel objective function, and the improved artificial bee colony algorithm (IABC) is used to search for the best wind power intervals. Finally, when compared with other competitive methods, the simulation results show that the proposed approach has much better performance.


Introduction
Wind power is rapidly growing due to its clean, renewable characteristics and mature technology.To solve the problems of serious environmental pollution and fossil energy depletion, clean energy will gradually replace fossil energy, and wind power will become increasingly important in the grid.However, its intermittency and uncertainty has brought great challenges to the operation of the grid.Therefore, high-quality wind power prediction is an important means of ensuring the safe and stable operation of the power system.
Much research on wind power prediction methods has been conducted and some of the mainstream wind power forecasting methods and systems along with their practical applications are summarized in Reference [1].Broadly speaking, the approaches for wind speed and power forecasting can be divided into three categories: physical forecasting approaches, statistical forecasting approaches, and combination approaches [2].
Physical approaches basically deal with the transformation of numerical weather prediction (NWP) data into wind-electric power.They perform well over a longer horizon, but are limited by the accuracy of the NWP data.Statistical approaches generally use historical data to build statistical models that show good performance for short-term forecasts.The combination approach combines different approaches and retains the advantages of each one; however, this is not always better than the best individual forecasts.
Numerous approaches have been proposed for forecasting wind speed and wind power.However, most existing approaches focus on point forecasting, where the prediction errors that exist cause uncertainties to the forecast results.Furthermore, Reference [3] made a comparison of a series of well-known wind power forecasting methods where the accuracy of those methods was found to be less than reported.In addition to point prediction, the uncertainty estimation can provide fluctuation intervals of wind power, which will give a more comprehensive reference to grid dispatchers and operators.
The uncertainty prediction of wind power can be divided into three categories, i.e., probabilistic forecasts, risk indices, and scenarios of generation.In recent years, probabilistic forecasting has received much attention; however, the conventional methods of probabilistic forecast often require special prior assumptions of error distribution [4].It was found that the probability density function of forecast errors varied greatly in Reference [5], which means that it is not reasonable to assume a specific distribution of the forecast errors.Quantile regression [6], the kernel density forecast method [7], and the Gaussian process [8,9] were used to present probabilistic wind power prediction intervals (PIs) without any assumption; however, the construction methods of PIs require complex mathematical calculations or rely on results from point forecasting.An upper and lower bound estimation (LUBE) approach that uses the prediction interval coverage probability (PICP) and the prediction interval normalized average width (PINAW) to evaluate the PIs was proposed in Reference [10].However, it was proved in Reference [11] that the coverage width-based criterion (CWC) score used in Reference [10] did not have suitable properties in which to evaluate the PIs.Furthermore, the CWC score could not measure the PIs in a comprehensive way as the deviation of the data not covered by the PI was not considered.The interval deviation index was taken into account in the interval score as an additive term to the interval width index in References [12,13], but it was difficult to decide which one was more important given that the interval deviation has a tendency to decrease as the interval width increases.Therefore, three indexes-PICP, PINAW and normalized average deviation (NAD)-were used to evaluate the PIs in this study.In addition, a novel multi-objective function that could satisfy all the criteria was used to construct the optimal PIs.Moreover, the inherent characteristics of wind power sequences were not considered in the previous methods.To resolve this problem, the wavelet transform [14] and ensemble empirical mode decomposition (EEMD) methods [4] were employed to decompose the original wind power time series into several components, and proved to be superior to the method without decomposition.Variational mode decomposition (VMD) proved to be excellent compared to the EEMD [15] and was adopted in this study.
Neural network was used to obtain PIs in Reference [16], and the extreme learning machine (ELM) employed in Reference [17] was proven to be superior to neural network (NN)-based methods.However, ELM also has some drawbacks, which include difficulties in determining the number of hidden layer nodes, and the lack of consideration of structural risk, which leads to over-fitting.The kernel extreme learning machine (KELM) which is described in Reference [18] was adopted to improve the generalization and stability of the forecasting model, and an improved artificial bee colony algorithm (IABC) was used to optimize the penalty coefficient C and kernel parameter σ of KELM.
In summary, the main contributions of this article are: (1) applying KELM (optimized by IABC) to obtain PIs as KELM has good generalization and stability; and (2) introducing a novel multi-objective function, which provides a powerful authority to the decision-maker for satisfying all the indices.
The rest of this paper is organized as follows.The overall structure and theoretical knowledge of the approach proposed are described in Section 2. The method of constructing optimal PIs is depicted in Section 3, and the simulation results of the proposed approach compared with other methods are presented in Section 4. Finally, conclusions are drawn in Section 5.

Proposed Approach for Forecasting Wind Power Intervals
The structure of the proposed approach had two stages (Figure 1).In the first stage, the wind power sequence was decomposed into several subseries using VMD, and the subseries were then grouped into reconstructed components using sample entropy (SE) theory.In the second stage, the optimal wind power PIs were constructed using KELM and IABC.A novel multi-objective function to construct the optimal PIs was also proposed.A description of the hybrid approach is given below.optimal wind power PIs were constructed using KELM and IABC.A novel multi-objective function to construct the optimal PIs was also proposed.A description of the hybrid approach is given below.
Figure 1.Structure of the approach for wind power interval prediction.

Variational Mode Decomposition
Variational mode decomposition is an adaptive signal decomposition method proposed by Dragomiretskiy and Zosso [19].It is used to decompose the original signal into a series of bandlimited modes uk, where the center of each mode is pulsation wk.
The bandwidth of each mode is obtained through the following steps: First, the associated analytic signal of each mode is computed using Hilbert transform to obtain a unilateral frequency spectrum; second, each mode's frequency spectrum is shifted to baseband by mixing with an exponential tuned to the estimated center frequency; finally, the bandwidth is obtained through the H 1 Gaussian smoothness of the demodulated signal.The resulting constrained variational problem is shown as follows: where, δ is the Dirac distribution, k represents the number of modes, * denotes convolution, { } Making use of a quadratic penalty term and Lagrange multipliers, the constrained variational problem can be addressed as follows: The detailed process is described in Reference [19], and the pseudocode of the VMD method is shown below (Algorithm 1).

Variational Mode Decomposition
Variational mode decomposition is an adaptive signal decomposition method proposed by Dragomiretskiy and Zosso [19].It is used to decompose the original signal into a series of band-limited modes u k , where the center of each mode is pulsation w k .
The bandwidth of each mode is obtained through the following steps: First, the associated analytic signal of each mode is computed using Hilbert transform to obtain a unilateral frequency spectrum; second, each mode's frequency spectrum is shifted to baseband by mixing with an exponential tuned to the estimated center frequency; finally, the bandwidth is obtained through the H 1 Gaussian smoothness of the demodulated signal.The resulting constrained variational problem is shown as follows: where, δ is the Dirac distribution, k represents the number of modes, * denotes convolution, {u k } and {ω k } represent the set of all modes {u 1 , • • • , u k } and their center frequencies {ω 1 , • • • , ω k }, respectively.Making use of a quadratic penalty term and Lagrange multipliers, the constrained variational problem can be addressed as follows: The detailed process is described in Reference [19], and the pseudocode of the VMD method is shown below (Algorithm 1).

Sample Entropy
Sample entropy (SE) was used to calculate the complexity of the subseries.The more complex the series, the greater its SE value.The SE value of one series is denoted by SE(N, m, r), which is defined as the negative logarithm of the conditional probability that two sequences similar for m points remain similar at the next point, excluding self-matches.N is the number of data points, m denotes the embedding dimension, and r denotes the tolerance.The description of the SE technique can be found in Reference [20].
The SE value is related with m and r, but its increase or decrease trend is not affected by m and r for it has a good consistency.Generally, the value of m is 2, and r equals 0.1-0.25 SD in which SD is the standard deviation of the time series.

Kernel Extreme Learning Machine
To improve the generalization and stability of ELM, the kernel function was introduced to propose the KELM algorithm.The output function of KELM can be written compactly as follows [18]: where I is a diagonal matrix, C is the penalty coefficient, T is the matrix of targets, H is the hidden-layer output matrix and kernel matrix Ω ELM is defined as follows: The hidden layer feature mapping h(x) does not need to be known to users, and the dimensionality L of the feature space (number of hidden nodes) also does not need to be given.However, the performance of KELM is sensitive to the penalty coefficient C and kernel parameter σ; therefore, the Bat-inspired algorithm was used to optimize these two parameters.

Artificial Bee Colony Algorithm and Its Modification
The artificial bee colony algorithm is an optimization algorithm that was proposed by Karaboga in 2005 and a detailed description of artificial bee colony algorithm (ABC) can be found in Reference [21].In this paper, the position of each food source represented a solution which contains the values of the penalty coefficient C and the kernel parameter σ for the KELM.
In each iteration, an onlooker bee chooses a food source depending on the probability value defined in Equation ( 7): where f i is the fitness of solution x i , and S is the number of food sources equal to the number of employed bees or onlooker bees.
The employed bees search for the food source according to Equation ( 8): where φ id is a random number After the evaluation of each candidate source, if the fitness is not improved, then the associated food source is abandoned and the employed bee becomes a scout.The scout discovers a new food source according to Equation ( 9): To decrease the possibility of the local optimum, an improved ABC-based on opposition-based learning-was proposed in Reference [22].The modification strategy can be described as follows: For each food source, a random value λ was generated and compared with an adaptive variable PM which was defined as PM = 0.01 + 0.1 × (2 − e g ln(2)/G ), where g and G represent the current number and the maximum number of iterations.If λ < PM, then opposition-based learning was carried out.Opposition-based learning is depicted below:

Reliability and Sharpness of PIs
Reliability and sharpness are required and desirable properties for evaluating the probabilistic forecast [23].The PIs are composed of an upper and lower boundary with a certain probability level.PI coverage probability and PI normalized average width [24] are commonly used to describe the reliability and sharpness of the constructed PIs and are defined as follows: 1 Prediction interval coverage probability (PICP) Energies 2017, 10, 419 6 of 15 where N denotes the number of prediction points, c i = 1 when the target value t i ∈ [U i , L i ] in which L i and U i are the lower and upper bound of each PI, otherwise c i = 0. 2 Prediction interval normalized average width (PINAW) where R is the range of the target values and is used to normalize the average width of PIs.
PICP is used to describe the reliability of the constructed PIs, and PINAW is used to characterize the sharpness of the PIs.However, these two indices are not enough to evaluate the PIs.For example, when the I PICP and I PINAW of two PIs are the same, the optimal PI cannot be judged if the real data are not covered by the PIs and the deviation from the upper bound (or lower bound) is different.Therefore, the normalized average deviation was introduced in this study.3 Normalized average deviation (NAD) The NAD is used to express the deviation of the data which are not covered by the PI.
where the expression of ε i is as follows:

The Objective Function of PIs
The optimal PIs with high reliability (i.e., the coverage probability of PIs will be as close to the prescribed nominal level of confidence as possible), narrow average width, and small accumulated deviation are required, which can be expressed as a multi-objective optimization problem.However, these three indices conflict with each other, for example, when the PICP increases, the PINAW increases correspondingly; when the NAD decreases, the PICP usually tends to increase.
It is not possible to simply translate a multi-objective optimization problem into a single-objective solution as the multiple objects are incommensurable (without uniform metrics between targets, it is difficult to directly compare the targets) and conflict with each other.Therefore, a Pareto-based method was proposed in this paper to construct a multi-objective model, which could overcome the random selection of target weights and the difficulty in dealing with objects with different dimensions.The concrete method of constructing the multi-objective model is as follows.
For the convenience of solution, the problem of making I PICP close to the corresponding PI nominal confidence (PINC) is transformed into the minimization problem of coverage probability error I CPE through Equation ( 15): Generally, a smaller I CPE indicates more reliable PIs.As the PIs with a low confidence level are meaningless, a 90% confidence level was required in this study.Therefore, the multi-objective optimization model was established as follows: where ω is the output weight of the KELM and is optimized by IABC.The proposed multi-objective optimization method was based on the Pareto theory, which is depicted below [25,26].
The Pareto optimum solution set is a solution set that provides a set of alternatives for decision-makers for flexible choice.In the Pareto optimum solution set, there is no solution that is definitely better than another, and it can be explained by dominant and non-dominant concepts.For example, X 1 and X 2 are assumed as the resultant solutions, and the objective functions are f 1 , f 2 , and f 3 .If f 1 (X 1 ) < f 1 (X 2 ) and f 2 (X 1 ) < f 2 (X 2 ) and f 3 (X 1 ) < f 3 (X 2 ), then X 1 dominates X 2 ; if f 1 (X 1 ) > f 1 (X 2 ) and f 2 (X 1 ) > f 2 (X 2 ) and f 3 (X 1 ) > f 3 (X 2 ), then X 2 dominates X 1 .Otherwise, X 1 and X 2 do not dominate each other, and they are called non-dominant solutions, which create the Pareto optimum solution set.Thus, selecting a solution from the Pareto optimum solution set provides great flexibility.The relative optimal solution was chosen through the technique for order preference by similarity to an ideal solution (TOPSIS) [27].

Prediction Process
The flowchart of the proposed approach for constructing PIs using KELM optimized by IABC is shown in Figure 2, and the detailed steps are as follows.
meaningless, a 90% confidence level was required in this study.Therefore, the multi-objective optimization model was established as follows: ( ) 0 where ω is the output weight of the KELM and is optimized by IABC.The proposed multi-objective optimization method was based on the Pareto theory, which is depicted below [25,26].
The Pareto optimum solution set is a solution set that provides a set of alternatives for decisionmakers for flexible choice.In the Pareto optimum solution set, there is no solution that is definitely better than another, and it can be explained by dominant and non-dominant concepts.For example, X1 and X2 are assumed as the resultant solutions, and the objective functions are f1, f2, and f3.If f1(X1) < f1(X2) and f2(X1) < f2(X2) and f3(X1) < f3(X2), then X1 dominates X2; if f1(X1) > f1(X2) and f2(X1) > f2(X2) and f3(X1) > f3(X2), then X2 dominates X1.Otherwise, X1 and X2 do not dominate each other, and they are called non-dominant solutions, which create the Pareto optimum solution set.Thus, selecting a solution from the Pareto optimum solution set provides great flexibility.The relative optimal solution was chosen through the technique for order preference by similarity to an ideal solution (TOPSIS) [27].

Prediction Process
The flowchart of the proposed approach for constructing PIs using KELM optimized by IABC is shown in Figure 2, and the detailed steps are as follows.Step 1: Input the training data, validation data, testing data, and the initial parameters of the KELM and IABC.The output data of the training set is processed as per Equation (17) to form the initial output intervals of the training set: where y i is the original output, Y i is the processed output interval, and R is a random number of the interval [0, 1].
Step 2: Produce the initial population.Apply the KELM with initial parameters to the training set to obtain the weighting factor ω, which is then applied to the validation data.
Step 3: Compare the real data and the PIs obtained from the validation set, and calculate the indices based on Equations ( 11)-( 15) to form the objective functions according to Section 3.2.
Step 4: Put the solutions of all food sources into the Pareto optimal solution set and delete the dominant solutions.
Step 5: In each iteration, make use of the IABC to optimize the parameters of the KELM, calculate the objective functions using the PIs obtained from the validation set, and update the Pareto solution set.Thus, the optimal KELM models can be obtained.
Step 6: Apply the optimal models to the testing set, choose the relative optimal solution using TOPSIS, and output the results of wind power PIs.

Numerical Results and Discussions
To evaluate the performance of the approach proposed in this paper, the approaches using KELM and ELM were compared to prove the superiority of KELM.Next, the approach using the interval score [13] (defined in Equation ( 18)) was compared with the Pareto-based approach to verify the superiority of the multi-objective function proposed in this paper.In addition, the approach where only two indices (I PICP and I PINAW ) were used and the approach using quantile regression were compared with the proposed approach.The detailed simulation results are shown in Section 4.2.

S t (x
where ξ represents the width of the PI, α = 1 − PINC.

Dataset and Parameter Settings
In the studies, the data used for testing was from two wind farms (i.e., wind farm M and wind farm J) located in northern China and southern China, respectively.The capacity of wind farm M is 50 MW, and the data was recorded in 15 min interval.The capacity of wind farm J is 95 MW, and the data was recorded in 5 min interval.Taking the data of one month as an example, the data were divided into three groups of training samples, validation samples and testing samples at a proportion of 60%, 20% and 20%, respectively.
Regarding the VMD algorithm, the number k of modes should be assigned in advance.It is shown that too few modes result in the incomplete decomposition of the signal, and too many modes may lead to over-decomposition or extra noise.The optimal mode number can be selected using the difference between the center frequencies of the subseries S(k) and S(k − 1), as the center frequency is closely related to the decomposition results of VMD [28].By decomposing the historical wind power series, it was found that the value of ∆f tended to be stable when the value of k was from one to nine.Furthermore, the simulation results showed that number of subseries in the EEMD process was nine.Thus, the number of modes was set as nine.The other parameters in the VMD process were set as per Reference [29].The penalty factor α and the tolerance of the convergence criterion ε were set to default values of 2000 and 10 −6 , respectively.The size of the IABC population was set as 30 as per Reference [22], and the iteration number was set as 100 based on the fact that the IABC convergences after 100 iterations based on many experiments.

Numerical Results and Analysis
The original wind power time series of one month was decomposed using VMD.Next, the SE of every subseries was calculated and is shown in Figure 3.The solid line in Figure 3 represents the sample entropy of the original series.
Energies 2017, 10, 419 9 of 15 The size of the IABC population was set as 30 as per Reference [22], and the iteration number was set as 100 based on the fact that the IABC convergences after 100 iterations based on many experiments.

Numerical Results and Analysis
The original wind power time series of one month was decomposed using VMD.Next, the SE of every subseries was calculated and is shown in Figure 3.The solid line in Figure 3 represents the sample entropy of the original series.Next, the subseries were reconstructed according to the following rules: the subseries with SE values smaller than the original series were added (i.e., subseries 1 and 2 in Figure 3), the subseries with SE values bigger than the original series and close to each other with a distance smaller than 0.05 were combined (i.e., the third and ninth subseries were combined, the fourth to sixth subseries were combined, and the seventh and eighth subseries were combined).After processing, the reconstructed subseries are shown in Figure 4.For clarity, only a portion of the data (the first 300 points) are shown.It can be seen in Figure 4 that the reconstructed subseries have obvious regularity, where the frequency increases from top to bottom, and each subseries has periodicity to a certain extent.For each subseries, the method described in Section 3.3 was used for interval prediction and the Next, the subseries were reconstructed according to the following rules: the subseries with SE values smaller than the original series were added (i.e., subseries 1 and 2 in Figure 3), the subseries with SE values bigger than the original series and close to each other with a distance smaller than 0.05 were combined (i.e., the third and ninth subseries were combined, the fourth to sixth subseries were combined, and the seventh and eighth subseries were combined).After processing, the reconstructed subseries are shown in Figure 4.For clarity, only a portion of the data (the first 300 points) are shown.
Energies 2017, 10, 419 9 of 15 The size of the IABC population was set as 30 as per Reference [22], and the iteration number was set as 100 based on the fact that the IABC convergences after 100 iterations based on many experiments.

Numerical Results and Analysis
The original wind power time series of one month was decomposed using VMD.Next, the SE of every subseries was calculated and is shown in Figure 3.The solid line in Figure 3 represents the sample entropy of the original series.Next, the subseries were reconstructed according to the following rules: the subseries with SE values smaller than the original series were added (i.e., subseries 1 and 2 in Figure 3), the subseries with SE values bigger than the original series and close to each other with a distance smaller than 0.05 were combined (i.e., the third and ninth subseries were combined, the fourth to sixth subseries were combined, and the seventh and eighth subseries were combined).After processing, the reconstructed subseries are shown in Figure 4.For clarity, only a portion of the data (the first 300 points) are shown.It can be seen in Figure 4 that the reconstructed subseries have obvious regularity, where the frequency increases from top to bottom, and each subseries has periodicity to a certain extent.For each subseries, the method described in Section 3.3 was used for interval prediction and the It can be seen in Figure 4 that the reconstructed subseries have obvious regularity, where the frequency increases from top to bottom, and each subseries has periodicity to a certain extent.For each  It can be seen in Figure 5 that most of the real values are within the PIs, indicating that the proposed approach is effective.The detailed values of the indices can be seen in Table 1.

Comparison of KELM and ELM
To evaluate the performance of the KELM, the interval prediction methods using KELM and ELM are compared in this section.The simulations of these two approaches were conducted 10 times using the historical data in July from wind farm M, and the results are shown in Table 1.The mean value and standard deviation for each index are also provided in Table 1.Table 1 shows that the PICP of the KELM is closer to the PINC than that of the ELM and the PINAW of KELM is narrower, except the NAD is slightly bigger, indicating that the method using the KELM has a better performance.Another significant point deduced from Table 1 is that the method based on the KELM is more stable as the standard deviations of the KELM are much smaller than those of the ELM.

Analysis of Forecasting Results
For further validation of the superiority of the interval prediction method proposed in this paper, the results of the approach which used the multi-objective function based on interval scores It can be seen in Figure 5 that most of the real values are within the PIs, indicating that the proposed approach is effective.The detailed values of the indices can be seen in Table 1.

Comparison of KELM and ELM
To evaluate the performance of the KELM, the interval prediction methods using KELM and ELM are compared in this section.The simulations of these two approaches were conducted 10 times using the historical data in July from wind farm M, and the results are shown in Table 1.The mean value and standard deviation for each index are also provided in Table 1.
Table 1 shows that the PICP of the KELM is closer to the PINC than that of the ELM and the PINAW of KELM is narrower, except the NAD is slightly bigger, indicating that the method using the KELM has a better performance.Another significant point deduced from Table 1 is that the method based on the KELM is more stable as the standard deviations of the KELM are much smaller than those of the ELM.

Analysis of Forecasting Results
For further validation of the superiority of the interval prediction method proposed in this paper, the results of the approach which used the multi-objective function based on interval scores (abbreviated as M1) are given in Table 2. Furthermore, the results of the approach based on the multi-objective function (where only two indices (I PICP and I PINAW ) were used) and those of the approach using quantile regression (abbreviated as M2 and QR, respectively) are shown for comparison with the proposed approach.PIs with PINC 90% obtained by the proposed approach and the other methods in two different months are displayed in Figures 6 and 7.
Energies 2017, 10, 419 11 of 15 (abbreviated as M1) are given in Table 2. Furthermore, the results of the approach based on the multiobjective function (where only two indices (IPICP and IPINAW) were used) and those of the approach using quantile regression (abbreviated as M2 and QR, respectively) are shown for comparison with the proposed approach.PIs with PINC 90% obtained by the proposed approach and the other methods in two different months are displayed in Figures 6 and 7.It can be seen in Figures 6 and 7 that the real data of the wind power are mostly covered by the constructed PIs, which indicates the effectiveness of these methods.The detailed indices of these methods are shown in Table 2.The mean values (Mean) and average coverage probability errors (ACPE) are also given in Table 2. (abbreviated as M1) are given in Table 2. Furthermore, the results of the approach based on the multiobjective function (where only two indices (IPICP and IPINAW) were used) and those of the approach using quantile regression (abbreviated as M2 and QR, respectively) are shown for comparison with the proposed approach.PIs with PINC 90% obtained by the proposed approach and the other methods in two different months are displayed in Figures 6 and 7.It can be seen in Figures 6 and 7 that the real data of the wind power are mostly covered by the constructed PIs, which indicates the effectiveness of these methods.The detailed indices of these methods are shown in Table 2.The mean values (Mean) and average coverage probability errors (ACPE) are also given in Table 2.It can be seen in Figures 6 and 7 that the real data of the wind power are mostly covered by the constructed PIs, which indicates the effectiveness of these methods.The detailed indices of these methods are shown in Table 2.The mean values (Mean) and average coverage probability errors (ACPE) are also given in Table 2.The following conclusions can be drawn from As seen in Table 2, across all months, the proposed method outperforms the other approaches with the resultant ACPE closer to zero, which indicates the high reliability of the constructed PIs.In addition, at the same level of reliability, the proposed method outperforms other approaches with narrower PINAWs and smaller NADs.Thus, it can be deduced that the overall performance of the proposed multi-objective function performs better than other approaches in constructing PIs.
To verify the stability of the proposed method, another set of experiments based on the data in different seasons from wind farm J were conducted, and the PIs with three different forecasting horizons (i.e., one step ahead, two steps ahead and three steps ahead) were constructed.The results are shown in Tables 3-5, and the same conclusions can be drawn using the same analytical method.For intuitive observation, the statistical indices of the proposed method, which proved to be better than M1 and M2 based on Tables 2-5, are shown in Table 6, where P PM>M1 represents the proportion of the proposed method's results better than M1, P PM>M2 represents the proportion of the proposed method's results better than M2, and P PM>QR represents the proportion of the proposed method's results better than QR.From Table 6, it can be seen that the performance of the proposed method is worse than QR in terms of I PINAW and I NAD , but the reliability of the proposed method is much better than QR.Furthermore, most of the proposed method's indices are better than the other approaches, which indicates the stability and superiority of the proposed approach.

Figure 1 .
Figure 1.Structure of the approach for wind power interval prediction.

Figure 2 .
Figure 2. Flowchart of the prediction approach using KELM-IABC.Figure 2. Flowchart of the prediction approach using KELM-IABC.

Figure 2 .
Figure 2. Flowchart of the prediction approach using KELM-IABC.Figure 2. Flowchart of the prediction approach using KELM-IABC.

Figure 3 .
Figure 3. Distribution of sample entropy values of subseries.

Figure 4 .
Figure 4. Distribution of the reconstructed subseries.

Figure 3 .
Figure 3. Distribution of sample entropy values of subseries.

Figure 3 .
Figure 3. Distribution of sample entropy values of subseries.

Figure 4 .
Figure 4. Distribution of the reconstructed subseries.

Figure 4 .
Figure 4. Distribution of the reconstructed subseries.
method described in Section 3.3 was used for interval prediction and the forecasting results are shown in Figure 5.For clarity, only a portion of the data (the first 200 points) are shown.Energies 2017, 10, 419 10 of 15 forecasting results are shown in Figure 5.For clarity, only a portion of the data (the first 200 points) are shown.

Figure 6 .
Figure 6.Distribution of wind power PIs results with PINC 90% in March 2013 of wind farm M.

Figure 7 .
Figure 7. Distribution of wind power PIs results with PINC 90% in July 2013 of wind farm M.

Figure 6 .
Figure 6.Distribution of wind power PIs results with PINC 90% in March 2013 of wind farm M.

Figure 6 .
Figure 6.Distribution of wind power PIs results with PINC 90% in March 2013 of wind farm M.

Figure 7 .
Figure 7. Distribution of wind power PIs results with PINC 90% in July 2013 of wind farm M.

Figure 7 .
Figure 7. Distribution of wind power PIs results with PINC 90% in July 2013 of wind farm M.

Table 1 .
Prediction results of interval prediction method using ELM and KELM.

Table 1 .
Prediction results of interval prediction method using ELM and KELM.

Table 2 .
Prediction results of wind farm M in different months.

Table 2 : 1
In terms of the reliability (the closer the PICP is to the PINC, the better), almost all the predicted results of the proposed method, M1 and M2 are closer to 90% than the results of QR.The maximum values of the proposed method, M1, M2 and QR are 92.5%,92.54%, 92.65% and 100%, respectively, and the minimum values of the proposed method, M1, M2 and QR are 88.25%, 87.5%, 87.54% and 82.72%, respectively.The ACPE values of the proposed method, M1, M2 and QR are 1.514, 1.56, 1.531 and 6.209, respectively, indicating that the overall reliability of the proposed method is comparable to that of M1 or M2, and the performance of the proposed method is slightly better than the other two methods.It can also be deduced that the QR approach has the worst performance as its ACPE value is the biggest.

Table 3 .
Prediction results of wind farm J for one-step-ahead forecasting in different seasons.

Table 4 .
Prediction results of wind farm J for two-steps-ahead forecasting in different seasons.

Table 5 .
Prediction results of wind farm J for three-steps-ahead forecasting in different seasons.

Table 6 .
Statistical indices of the proposed approach.