Joint Point-Interval Prediction and Optimization of Wind Power Considering the Sequential Uncertainties of Stepwise Procedure

: To support high-level wind energy utilization, wind power prediction has become a more and more attractive topic. To improve prediction accuracy and ﬂexibility, joint point-interval prediction of wind power via a stepwise procedure is studied in this paper. Firstly, time-information-granularity (TIG) is deﬁned for ultra-short-term wind speed prediction. Hidden features of wind speed in TIGs are extracted via principal component analysis (PCA) and classiﬁed via adaptive a ﬃ nity propagation (ADAP) clustering. Then, Gaussian process regression (GPR) with joint point-interval estimation ability is adopted for stepwise prediction of the wind power, including wind speed prediction and wind turbine power curve (WTPC) modeling. Considering the sequential uncertainties of stepwise prediction, theoretical support for an uncertainty enlargement e ﬀ ect is deduced. Uncertainties’ transmission from single-step or receding multi-step wind speed prediction to wind power prediction is explained in detail. After that, normalized indexes for point-interval estimation performance are presented for GPR parameters’ optimization via a hybrid particle swarm optimization-di ﬀ erential evolution (PSO-DE) algorithm. K -fold cross validation ( K -CV) is used to test the model stability. Moreover, due to the timeliness of data-driven GPR models, an evolutionary prediction mechanism via sliding time window is proposed to guarantee the required accuracy. Finally, measured data from a wind farm in northern China are acquired for validation. From the simulation results, several conclusions can be drawn: the multi-model structure has insigniﬁcant advantages for wind speed prediction via GPR; joint point-interval prediction of wind power is realizable and very reasonable; uncertainty enlargement exists for stepwise prediction of wind power while it is more signiﬁcant after receding multi-step prediction of wind speed; a reasonable quantiﬁcation mechanism for uncertainty is revealed and validated.


Introduction
Nowadays, large-scale and distributed utilization of wind power has been widely developed. In the long run, wind power generation will play an important role in the future energy structure [1] while constantly improving its own shortcomings such as wind turbine noise impact [2][3][4][5] and wind energy volatility. With the rapid growth of the proportion of grid-connected wind power, whether for a large-grid or for a micro-grid, the volatility of primary wind energy is usually a concern for the safe and economic operation of the power grid. Against this background, wind power prediction is very studied interval prediction of wind power for a wind farm considering the uncertainty of wind speed prediction via ARMA-GARCH and the operation probability of wind turbines. It suggested the necessity of considering sequential uncertainties for wind power prediction. However, it is different from the ultra-short-term prediction of wind power for a wind turbine where both the uncertainties of wind speed prediction and WTPC modeling need to be considered.
Comparing different interval modeling methods, two classes can be obtained. One class is based on conditional probability distribution of output against inputs where regression values in view of conditional expectation and boundaries under certain confidence degree can be obtained. The methods such as conditional kernel density estimation, conditional copula, GPR and relevant vector machine belong to this class, where point and interval estimation can be jointly realized. The other class is based on conditional probability of prediction error against predicted values using normal point regression algorithms. Confidence boundaries of prediction error can be also yielded [26,27]. Under different application scenarios such as single-step or multi-step prediction of wind power, appropriate method should be selected to get reasonable description form of interval prediction.
Considering the feasibility of ultra-short-term wind power prediction via a stepwise procedure, GPR is chosen as the main algorithm for wind speed prediction and WTPC modeling. On this basis, the main contributions of this paper may be listed as follows: • The concept of TIG is defined to build an input-output matrix for ultra-short-term wind speed prediction. A PCA-clustering scheme is proposed to extract hidden wind patterns in TIGs. • Joint point-interval prediction of wind power is clearly claimed and studied via GPR. Theoretical support for uncertainty enlargement due to sequential uncertainties during stepwise procedure is deduced. • Normalized and comprehensive evaluation indexes for joint point-interval estimation performance are defined systematically. Hybrid PSO-DE algorithm and K-CV method are used to obtain better and more stable performance. • After single-step or receding multi-step wind speed prediction, uncertainty enlargement effects during stepwise wind power prediction is revealed and validated using parametric and non-parametric interval modeling methods.
The rest of the paper is organized as follows: Section 2 presents the establishment of a multi-model structure for wind speed prediction. Section 3 completely proposes the joint point-interval prediction of wind power via stepwise procedure. Section 4 defines normalized and comprehensive evaluation indexes for joint point-interval prediction of wind power while evolutionary updating mechanism is raised. Simulation and analysis are executed in Section 5. Section 6 concludes the paper.

Formation of Input-Ouput Matrix
To realize receding wind power prediction of a single wind turbine, a stepwise procedure is adopted in this paper, including receding wind speed prediction and WTPC modeling. We set the sampling period as T V and split the wind speed time series into many segments, where a segment is defined by , v i (k + 1 + s), . . . , v i (k + j + s), . . . , v i (k + n − 1 + s)] (1 ≤ s ≤ n). Different values of s mean different ways of sampling the data. V i is the i-th wind speed segment with n elements sequentially distributed along n time instants. V i is a TIG in time-domain space. Assume m segments can be obtained, making up the following matrix V.
For V, rows are seen as r repetitions. Columns from 1 to (n − 1) are seen as inputs, shown by V i,In = [v i1 , v i2 , . . . , v ij , . . . , v in−1 ]. The n-th column is seen as output, shown by V i,Out = v in (i = 1, 2, . . . , r; j = 1, 2, . . . , n). As a result, for receding wind speed prediction, V is the input-output matrix. Relatively, WTPC modeling is independent of receding prediction of wind speed. Only wind speed and power data are needed as the input and output of WTPC modeling.

Feature Extraction of Hidden Wind Speed Patterns
For a wind turbine at a specific site, the wind conditions may change with the seasons, wind directions and spatial terrains. It is intuitive that prediction accuracy may be raised under the same wind conditions. However, it is not certain for different prediction algorithms. In this subsection, wind speed is used to represent wind conditions and hidden wind speed patterns are extracted.
Using TIGs of wind speed, PCA [28] is an effective post-processing manner to extract features of TIGs and to form feature-information-granules. It is convenient for dimensionality reduction of high-dimensional data via a statistical procedure. Then, noise and unimportant features can be removed to enhance the calculation efficiency. It uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components where the first principal component has the largest possible variance and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. Each component vector is a linear combination of input variables and all component vectors form an uncorrelated orthogonal basis set.
Based on Equation (1), we remove the column mean values of V and it becomes a column-wise zero empirical mean. Then, PCA is executed and p new features can be obtained where x il = ω 1l ·v i1 + ω 2l ·v i2 + . . . + ω jl ·v ij + . . . + ω nl ·v in = ω l ·V i T (ω l = [ω 1l , ω 2l , . . . , ω jl , . . . , ω nl ]; l = 1, 2, . . . , h). As a result, feature matrix can be represented as follows: If n is chosen to be much bigger than h, the dimensionality reduction effect becomes obvious. This provides good convenience for the subsequent calculation.

Classification of Extracted Features via Clustering
Extracted features represent the main hidden information in TIGs. To classify them, an adaptive classification strategy via clustering is presented here. Firstly, unsupervised clustering algorithms such as K-medoids clustering, fuzzy C-means (FCM) clustering, Gaussian-mixture-model (GMM) clustering and ADAP clustering, are adopted and compared. Secondly, to evaluate clustering effects, the silhouette coefficient is used considering inter-cluster and intra-cluster effects which has better and more balanced evaluation performance than other evaluation indexes such as Davies-Bouldin index and Dunn-Validity index. The silhouette coefficient range is [−1,1] and it measures how similar an object is to its own cluster compared to other clusters. A high value suggests that the object is well suited to its own cluster and poorly suited to the neighboring clusters.
Silhouette coefficients can be calculated with any distance metric including Euclidean distance and Manhattan distance, etc. Define d intra (u i ) as average distance between point u i and all other points within the same cluster. Define d intel (u i ) as average distance between point u i and all points in any other cluster. Then, the silhouette coefficient [29] is calculated by: where the value close to 1 means that the point is appropriately clustered; the value close to −1 means that the point would be more appropriately clustered in its neighbouring cluster; a zero value means that the point is on the border of two clusters. Using the silhouette coefficient, the effects of different clustering algorithms can be evaluated and even optimized by the intelligent evolutionary algorithms such as genetic algorithm (GA), PSO and DE algorithms. Finally, wind speed TIGs can be classified via clustering. Besides, new input TIGs can also be classified into reasonable clusters via the silhouette coefficient. It provides a judgment for model switching under a multi-model structure.
Overall, as shown in Figure 1, the classification scheme of hidden wind speed patterns is wholly proposed to establish multi-model structure for receding wind speed prediction. the silhouette coefficient is used considering inter-cluster and intra-cluster effects which has better and more balanced evaluation performance than other evaluation indexes such as Davies-Bouldin index and Dunn-Validity index. The silhouette coefficient range is [-1,1] and it measures how similar an object is to its own cluster compared to other clusters. A high value suggests that the object is well suited to its own cluster and poorly suited to the neighboring clusters.
Silhouette coefficients can be calculated with any distance metric including Euclidean distance and Manhattan distance, etc. Define dintra(ui) as average distance between point ui and all other points within the same cluster. Define dintel(ui) as average distance between point ui and all points in any other cluster. Then, the silhouette coefficient [29] is calculated by: where the value close to 1 means that the point is appropriately clustered; the value close to −1 means that the point would be more appropriately clustered in its neighbouring cluster; a zero value means that the point is on the border of two clusters. Using the silhouette coefficient, the effects of different clustering algorithms can be evaluated and even optimized by the intelligent evolutionary algorithms such as genetic algorithm (GA), PSO and DE algorithms. Finally, wind speed TIGs can be classified via clustering. Besides, new input TIGs can also be classified into reasonable clusters via the silhouette coefficient. It provides a judgment for model switching under a multi-model structure.
Overall, as shown in Figure 1, the classification scheme of hidden wind speed patterns is wholly proposed to establish multi-model structure for receding wind speed prediction.

Joint Point-Interval Modeling
Stepwise prediction of wind power includes wind speed prediction and WTPC modeling. Herein, each step is taken as a joint point-interval modeling process. For regression algorithms such as artificial neural network (ANN), support vector machine (SVM), GPR and RVM, the point output is usually the conditional expectation of inputs. To realize joint point-interval modeling, not only conditional expectation but also conditional probability are needed where GPR is capable to fulfill this task.
GPR is a supervised method for regression and probabilistic prediction. Based on measured data, nonparametric kernel-based probabilistic models can be built. Computing empirical confidence intervals is an advantage of GPR. Assume the measured training data set {(ai, bi); i = 1, 2,…, nTrai} where ai∈R c and bi∈R. Estimation of bi can be represented by [26]:

Joint Point-Interval Modeling
Stepwise prediction of wind power includes wind speed prediction and WTPC modeling. Herein, each step is taken as a joint point-interval modeling process. For regression algorithms such as artificial neural network (ANN), support vector machine (SVM), GPR and RVM, the point output is usually the conditional expectation of inputs. To realize joint point-interval modeling, not only conditional expectation but also conditional probability are needed where GPR is capable to fulfill this task.
GPR is a supervised method for regression and probabilistic prediction. Based on measured data, nonparametric kernel-based probabilistic models can be built. Computing empirical confidence intervals is an advantage of GPR. Assume the measured training data set {(a i , b i ); i = 1, 2, . . . , n Trai } where a i ∈R c and b i ∈R. Estimation of b i can be represented by [26]: where ε is noise assumed to be independent normal distribution with variance σ 2 . f (a) can be seen as a feature space in form f (a) = φ(a) T α where φ(a) is basis function and α is coefficients. f (a) is also a Gaussian process defined as follows [30]: where a and a are any two points in R c ; K(•) is the chosen kernel function. From Equation (5), it suggests that each observation in a is normal distribution and joint probability distribution of observations in a is a multivariate normal distribution due to every finite linear combination of these observations are still normal distribution. Meanwhile, estimated output of Equation (4) is also a normal distribution which can be represented by [30]: Assume testing input data are a*. Then, testing output and training output form the following joint normal distribution [26]: As a result, regression values of testing inputs, conditional expectation ofb * under (a,b, a*), can be calculated by [30]:b * a,b, a * ∼ pro b * a,b, a * = N b * a, f (a), σ 2 I, a * = N m * , cov b * with m * = E b * a, f (a), σ 2 I, a * = K(a * , a) K(a, a ) + σ 2 I −1b cov b * = K(a * , a * ) − K(a * , a) K(a, a ) + σ 2 I −1 K(a, a * ) (8) where cov(•) is a covariance vector ofb * under (a,b, a*). In Equation (8), m* is regression output. Based on the covariance vector, conditional probability distribution of output can also be obtained. Thus, GPR can be directly used for joint point-interval modeling. The whole procedure is shown in Figure 2.
where ε is noise assumed to be independent normal distribution with variance σ 2 . f(a) can be seen as a feature space in form f(a) = ϕ(a) T α where ϕ(a) is basis function and α is coefficients. f(a) is also a Gaussian process defined as follows [30]: where a and a′ are any two points in R c ; K(•) is the chosen kernel function. From Equation (5), it suggests that each observation in a is normal distribution and joint probability distribution of observations in a is a multivariate normal distribution due to every finite linear combination of these observations are still normal distribution. Meanwhile, estimated output of Equation (4) is also a normal distribution which can be represented by [30]: Assume testing input data are a*. Then, testing output and training output form the following joint normal distribution [26]: As a result, regression values of testing inputs, conditional expectation of * b under (a,b ,a*), can be calculated by [30]: where cov(•) is a covariance vector of * b under (a,b ,a*). In Equation (8), m* is regression output. Based on the covariance vector, conditional probability distribution of output can also be obtained. Thus, GPR can be directly used for joint point-interval modeling. The whole procedure is shown in Figure 2.

Sequential Uncertainties of Stepwise Procedure Based on Single-Step Prediction of Wind Speed
Using stepwise procedure, wind speed prediction and WTPC modeling are carried out separately via GPR. From Equation (1), we assume the training and testing data are (V In , V Out ) and (V In *, V Out *) for GPR modeling of wind speed prediction. Then, testing output V Out * is represented by: where for details of the calculation one can refer to Equation (8). Uncertainties transmission during single-step wind speed prediction is shown in Figure 3a. Point and interval predictions of wind speed at sequential time points composes new inputs to WTPC model.

Sequential Uncertainties of Stepwise Procedure Based on Single-step Prediction of Wind Speed
Using stepwise procedure, wind speed prediction and WTPC modeling are carried out separately via GPR. From Equation (1), we assume the training and testing data are (VIn, VOut) and (VIn*, VOut*) for GPR modeling of wind speed prediction. Then, testing output VOut* is represented by: where for details of the calculation one can refer to Equation (8). Uncertainties transmission during single-step wind speed prediction is shown in Figure 3a. Point and interval predictions of wind speed at sequential time points composes new inputs to WTPC model.  For WTPC modeling, input and output are wind speed V and active power PAct. Assume training and testing data are (V, PAct) and (V*, PAct*) for GPR modeling of WTPC. Then, the testing output of PAct* is represented by: For stepwise prediction based on single-step prediction of wind speed, results of wind speed prediction are input of WTPC model where V = VOut. Due to the fact equations (9) and (10) are both normal distributions, the final output of PAct* can also be shown as: where V* = Vout*. From Equation (11), the final output reflects the enlargement effect of sequential uncertainties at each step. As a result, confidence intervals of WTPC output are also enlarged, as shown in Figure 3a. To obtain the required results, the uncertainty of each step can be adjusted separately. For WTPC modeling, input and output are wind speed V and active power P Act . Assume training and testing data are (V, P Act ) and (V*, P Act *) for GPR modeling of WTPC. Then, the testing output of P Act * is represented by:

Sequential Uncertainties of Stepwise Procedure Based on Receding Multi-Step Prediction of Wind Speed
For stepwise prediction based on single-step prediction of wind speed, results of wind speed prediction are input of WTPC model where V = V Out . Due to the fact Equations (9) and (10) are both normal distributions, the final output of P Act * can also be shown as: where V* = V out *. From Equation (11), the final output reflects the enlargement effect of sequential uncertainties at each step. As a result, confidence intervals of WTPC output are also enlarged, as shown in Figure 3a. To obtain the required results, the uncertainty of each step can be adjusted separately.

Sequential Uncertainties of Stepwise Procedure Based on Receding Multi-Step Prediction of Wind Speed
GPR is a single-step prediction method. If multi-step prediction is needed, receding wind speed prediction should be executed, where the predicted value v*(k + T + 1) at current time k + T + 1 is used as input for the prediction at next time k + T + 2. In this case, the uncertainties of point and interval prediction at the current time are also passed to the next time. Then, how to quantify the transferred uncertainties during receding multi-step prediction is a critical problem. Of course, if receding predictions are input to WTPC model, the predicted power also succeeds these uncertainties, shown in Figure 3b.
At each step of receding wind speed prediction, the relationship between input and output is variable. Thus, joint point-interval prediction at different times will yield different results. This suggests that only the statistical results at each prediction step will be meaningful. Instead of obtaining joint point-interval predictions by GPR, confidence intervals of prediction errors are used to describe uncertainties at each prediction step. Herein, KDE and Gaussian methods are compared to calculate confidence intervals of prediction errors at different receding steps. Using KDE, non-parametric probability distribution can be obtained according to prediction error distribution at each step. Using Gaussian method, parametric probability distribution can be obtained where Gaussian distribution is assumed at each step. They are both adopted to validate that their confidence intervals of prediction errors at each step have great similarity.

Prediction Performance Evaluation
Regarding joint point-interval prediction, how to evaluate or optimize prediction performance and how to guarantee timeliness of models, are still problems to be solved in this section.
For point prediction of wind speed or power, error-based indexes such as mean absolute error (MAE), root mean square error (RMSE) and their deformations, etc., are usually used for evaluation. For interval prediction, indexes such as prediction intervals coverage probability (PICP), coverage error (CE) and prediction intervals average width (PIAW), etc., are usually used [15,31].
In essence, confidence intervals, output by GPR, are again the conditional probability input. Thus, PICP and CE indexes need to be calculated for certain input. Through dividing range of input into bins, they can also be calculated for certain bin. Assume the required confidence against the l-th input bin is α, PICP in this bin is defined as follows: where N p (l) is number of points falling into confidence intervals with required confidence α against the l-th input bin. κ i (α) (l) is an indicative function for the i-th point (if this point falls into the confidence . Then, CE of confidence interval against the l-th input bin is: For wind speed prediction by GPR, partition of input bins may be roughly realized by clustering. For WTPC modeling by GPR, partition of input bins mainly refers to wind speed range. However, in actual execution, input data are non-uniform and calculation of Equations (12) and (13) is inaccurate and complex. When c j /g j →1 (c j , g j are arbitrary positive integers), the following equation exists: According to Equation (14), PICP and CE indexes under different input bins have the same limit with that directly calculated in the intervals under the whole input range. It is easy to be executed, defined by: where MPICP and MCE are the mixed statistics of PICP and MCE, respectively. N p is total number of predicted points. Especially, for a higher confidence degree approaching 1, limits of Equations (12) and (15), Equations (13) and (16) approach closer with each other according to Equation (14). In this paper, MPICP and MCE indexes are adopted in actual execution. Besides, PIAW is defined by: where U i (α) and L i (α) are upper and lower intervals for the i-th point.
In order to comprehensively evaluate the performance of joint point-interval prediction, weighted-index (WI) are proposed as follows: where λ* represents the weight. NRMSE, NMCE and NPIAW are normalization of RMSE, MCE and PIAW, defined as follows: where mean(*) represents average value. W set represents pre-setted interval width.

Optimal Modeling, Corss Validation and Sliding Updation of Prediction Models
Using WI as optimization target, the GPR model may be optimized to obtain a more balanced performance. Herein, a hybrid PSO-DE algorithm is used. Based on the standard PSO and DE algorithms, the steps of the hybrid PSO-DE algorithm [32] are shown in Figure 4. Assume the maximum iterations of PSO and DE algorithms are I pso and I de . In the early stage, PSO is executed until the iteration times exceed I pso /2. Then, new particles with the number of 20% of the initial population size are generated in the neighborhood of obtained optimal position. Following this step, the DE algorithm is executed for the newly generated particles until the iteration times exceed I de . The optimal solutions obtained by DE is then evaluated by the fitness function of PSO. Nested execution of DE can be repeated during execution of PSO until iteration times of PSO exceed I pso and then the PSO-DE algorithm terminates. This hybrid algorithm combines the fast global convergence of PSO and the high-precision search ability of DE. Thus, the optimizing efficiency and accuracy can be greatly improved.  For training and testing of GPR, K-CV is adopted to verify stability and robustness of GPR models. Average WI value of k times is used to evaluate the modeling performance. The optimal GPR modeling procedure is shown in Figure 5, which can be both used for wind speed prediction and WTPC modeling.  Figure 5. Diagram of GPR modeling procedure.
Due to the fact the GPR model is built using historical data from a past certain period, its timeliness also depends on these data. Thus, a non-negligible question is the timeliness of the databased model. It needs to be updated in time. Herein, a sliding-time-window is adopted where the database for modeling is incrementally updated with a certain time window. As a result, GPR models are also regularly updated in time. Trigger mechanism of model updating can be event-driven by monitoring of WI or time-driven by sliding-time-window. For training and testing of GPR, K-CV is adopted to verify stability and robustness of GPR models. Average WI value of k times is used to evaluate the modeling performance. The optimal GPR modeling procedure is shown in Figure 5, which can be both used for wind speed prediction and WTPC modeling.  For training and testing of GPR, K-CV is adopted to verify stability and robustness of GPR models. Average WI value of k times is used to evaluate the modeling performance. The optimal GPR modeling procedure is shown in Figure 5, which can be both used for wind speed prediction and WTPC modeling.  Figure 5. Diagram of GPR modeling procedure.
Due to the fact the GPR model is built using historical data from a past certain period, its timeliness also depends on these data. Thus, a non-negligible question is the timeliness of the databased model. It needs to be updated in time. Herein, a sliding-time-window is adopted where the database for modeling is incrementally updated with a certain time window. As a result, GPR models are also regularly updated in time. Trigger mechanism of model updating can be event-driven by monitoring of WI or time-driven by sliding-time-window. Due to the fact the GPR model is built using historical data from a past certain period, its timeliness also depends on these data. Thus, a non-negligible question is the timeliness of the data-based model. It needs to be updated in time. Herein, a sliding-time-window is adopted where the database for modeling is incrementally updated with a certain time window. As a result, GPR models are also regularly updated in time. Trigger mechanism of model updating can be event-driven by monitoring of WI or time-driven by sliding-time-window.

Simulations
Based on the data from the supervisory control and data acquisition (SCADA) system of a wind farm located in northern China, wind speed and power data are acquired and preprocessed for prediction. In the wind farm, 1.5 megawatt wind turbines with variable-speed variable-pitch ability are mainly used. The sampling period of the data is 15 min, which is also the time interval of single-step prediction.

Decision of TIG Length for Wind Speed Prediction
TIG length predetermines the input dimensions of GPR. Herein, single-step prediction of wind speed using different TIG lengths are tested to select an appropriate length. We arbitrarily choose wind speed data with a sampling period of 15 min. Two thousand (2000) data points are randomly selected for training (1600 points) and testing (400 points). Besides, for GPR modeling, type of basis-function is pure quadratic and the type of kernel-function is square exponential, defined as follows: where σ l is the characteristic length scale. σ f is the signal standard deviation. Mean square error (MSE) is used for optimization the parameters such as σ, σ l and σ f . Indexes such as NRMSE, NMACE, NPIAW and WI are used for evaluation where weights are all 1/3, as shown in Table 1. Two samples are randomly selected for testing under each TIG length. From Table 1, NMACE varies obviously while the indexes of NRMSE and NPIAW perform relatively stable. When the TIG length is greater than or equal to 6, their WIs become stable at the same bound level. To avoid unnecessary computational burden, 6 is selected as the TIG length in this paper.

Necessity of Comprehensive Optimization Based on WI
Normally, the parameters of GPR are optimized using MSE of point prediction as the objective. This doesn't consider interval performance. In extreme cases, excessively optimizing MSE may cause performance deterioration of the interval prediction. Thus, in this subsection, different objectives are tested for wind speed prediction and WTPC modeling using GPR where PSO-DE is adopted.
For wind speed prediction via GPR, testing results are shown in Table 2 and Figure 6. In Figure 6, 'Estimated 1', 'Upper 1' and 'Lower 1' are results using WI as objective. 'Estimated 2', 'Upper 2' and 'Lower 2' are results using MSE as objective. Just using MSE as optimization objective, its point prediction index, NRMSE, is very close with that of WI. However, their interval prediction indexes, NMACE and NPIAW are distinguished from each other greatly. As a result, their WIs are also different. These results reflect a truth that using WI as objective is necessary for parameter optimization of GPR when predicting wind speed. Of course, lower boundaries in Figure 6 are retained to show interval modeling effect of WTPC from a statistical view. In actual, physical boundaries of wind speed should be considered. Usually, zero line is the lower physical boundary of wind speed whereas upper boundary of GPR fulfills its physical meaning. For wind speed prediction via GPR, testing results are shown in Table 2 and Figure 6. In Figure  6, 'Estimated 1', 'Upper 1' and 'Lower 1' are results using WI as objective. 'Estimated 2', 'Upper 2' and 'Lower 2' are results using MSE as objective. Just using MSE as optimization objective, its point prediction index, NRMSE, is very close with that of WI. However, their interval prediction indexes, NMACE and NPIAW are distinguished from each other greatly. As a result, their WIs are also different. These results reflect a truth that using WI as objective is necessary for parameter optimization of GPR when predicting wind speed. Of course, lower boundaries in Figure 6 are retained to show interval modeling effect of WTPC from a statistical view. In actual, physical boundaries of wind speed should be considered. Usually, zero line is the lower physical boundary of wind speed whereas upper boundary of GPR fulfills its physical meaning.  For WTPC modeling via GPR, testing results are shown in Table 3, Figures 7 and 8. In Figures 7  and 8, 'Estimated 1', 'Upper 1' and 'Lower 1' are results using WI as objective. 'Estimated 2', 'Upper 2' and 'Lower 2' are results using MSE as objective. It can be found that taking MSE or WI as objective has very little influence on modeling performance. Moreover, the two objectives have very close modeling effects, including both the point and interval indexes. Thus, it is unnecessary to distinguish the usage of MSE and WI for parameter optimization of WTPC modeling via GPR. In Figure 7, wind power exceeding cut-in wind speed 3 m/s is shown. In Figure 8, physical boundaries of wind power should be also paid attention where zero line is the lower physical boundary whereas upper boundary of GPR fulfills its physical meaning.  For WTPC modeling via GPR, testing results are shown in Table 3 and 'Lower 2' are results using MSE as objective. It can be found that taking MSE or WI as objective has very little influence on modeling performance. Moreover, the two objectives have very close modeling effects, including both the point and interval indexes. Thus, it is unnecessary to distinguish the usage of MSE and WI for parameter optimization of WTPC modeling via GPR. In Figure 7, wind power exceeding cut-in wind speed 3 m/s is shown. In Figure 8, physical boundaries of wind power should be also paid attention where zero line is the lower physical boundary whereas upper boundary of GPR fulfills its physical meaning.

Establishment and Testing of Multi-Model Structure via PCA-Clustering Scheme
Intuitively, partition of wind speed patterns can raise prediction accuracy of wind speed. However, under the Bayesian framework of GPR, it will be seriously tested in this subsection. To identify the hidden wind speed patterns effectively, PCA-clustering strategy is presented. Herein, the total size of the data sample is (6300, 6) where the input is five dimensional and the output is one dimensional. After feature extraction, three dominant features are used for clustering, where K-

Establishment and Testing of Multi-Model Structure via PCA-Clustering Scheme
Intuitively, partition of wind speed patterns can raise prediction accuracy of wind speed. However, under the Bayesian framework of GPR, it will be seriously tested in this subsection. To identify the hidden wind speed patterns effectively, PCA-clustering strategy is presented. Herein, the total size of the data sample is (6300, 6) where the input is five dimensional and the output is one dimensional. After feature extraction, three dominant features are used for clustering, where K-

Establishment and Testing of Multi-Model Structure via PCA-Clustering Scheme
Intuitively, partition of wind speed patterns can raise prediction accuracy of wind speed. However, under the Bayesian framework of GPR, it will be seriously tested in this subsection. To identify the hidden wind speed patterns effectively, PCA-clustering strategy is presented. Herein, the total size of the data sample is (6300, 6) where the input is five dimensional and the output is one dimensional.
After feature extraction, three dominant features are used for clustering, where K-medoids, ADAP, FCM and GMM clustering methods are compared. The results are shown in Table 4 using the silhouette coefficient as evaluation index. Concerning consumed time, K-medoids, FCM and GMM performs well, whereas ADAP is very time consuming. When the cluster number equals 2, all methods have similar silhouette coefficients. However, the data uniformities of the clusters are different. For K-medoids, FCM and GMM methods, data points of the two clusters are 641|5659, 597|5703, 1082|5218 and 2066|4234, respectively. In contrast, though ADAP is time consuming, it has a better silhouette coefficient and data uniformity. Thus, the data by ADAP with two clusters is utilized for sequential research, shown in Figure 9. Lines of quartile for each cluster clearly divide the data samples. It provides good foundation for establishment of a multi-model structure.  Table  4 using the silhouette coefficient as evaluation index. Concerning consumed time, K-medoids, FCM and GMM performs well, whereas ADAP is very time consuming. When the cluster number equals 2, all methods have similar silhouette coefficients. However, the data uniformities of the clusters are different. For K-medoids, FCM and GMM methods, data points of the two clusters are 641|5659, 597|5703, 1082|5218 and 2066|4234, respectively. In contrast, though ADAP is time consuming, it has a better silhouette coefficient and data uniformity. Thus, the data by ADAP with two clusters is utilized for sequential research, shown in Figure 9. Lines of quartile for each cluster clearly divide the data samples. It provides good foundation for establishment of a multi-model structure.  Taking WI as objective, a comparison of different data samples is shown in Table 5 and Figure  10, where five-fold cross validation is adopted to show the effectiveness of the GPR models. Analyzing the input data patterns in Figure 9, the variation of the data of cluster-2 is stronger than that of the data of cluster-1. This explains, according to RMSE, NMACE and NPIAW, why he GPR model based on the data of cluster-2 performs worse than that based on the data of cluster-1. Due to the fact NRMSE is normalized by the mean value of output in Equation (19), it also explains that why the NRMSE values of cluster-2 are smaller than those of cluster-1. In contrast, the indexes of the GPR model based on total data samples are a compromise between that based on the data of cluster-1 and cluster-2.
The testing results suggest a truth, namely that that partition of wind speed patterns is uncertain to raise wind speed prediction accuracy for GPR modeling under a Bayesian framework. Regression and interval prediction by GPR is based on posterior probability of historical and new input data. It is highly dependent on distribution of the input data. In summary, for wind speed prediction by GPR, a multi-model structure doesn't offer a significant improvement and its establishment may be unnecessary. Taking WI as objective, a comparison of different data samples is shown in Table 5 and Figure 10, where five-fold cross validation is adopted to show the effectiveness of the GPR models. Analyzing the input data patterns in Figure 9, the variation of the data of cluster-2 is stronger than that of the data of cluster-1. This explains, according to RMSE, NMACE and NPIAW, why he GPR model based on the data of cluster-2 performs worse than that based on the data of cluster-1. Due to the fact NRMSE is normalized by the mean value of output in Equation (19), it also explains that why the NRMSE values of cluster-2 are smaller than those of cluster-1. In contrast, the indexes of the GPR model based on total data samples are a compromise between that based on the data of cluster-1 and cluster-2.
The testing results suggest a truth, namely that that partition of wind speed patterns is uncertain to raise wind speed prediction accuracy for GPR modeling under a Bayesian framework. Regression and interval prediction by GPR is based on posterior probability of historical and new input data. It is highly dependent on distribution of the input data. In summary, for wind speed prediction by GPR, a multi-model structure doesn't offer a significant improvement and its establishment may be unnecessary.

Wind Power Predcition Based on Single-Step Wind Speed Prediction Considering Sequential Uncertainties
Using the GPR model of single-step wind speed prediction in Figure 10c and the new WTPC model, their sequential uncertainties for wind power prediction is studied in this subsection. Herein, 6377 data pairs of wind speed-power are adopted for WTPC modeling. As shown in Figure 11, NRMSE, NMACE, NPIAW and WI of the final WTPC model via GPR are 0.2567, 0.007074, 0.2804 and 0.1813, respectively. Considering the sequential uncertainties of wind speed prediction and WTPC modeling, the output of a single-step wind power prediction is shown in Figure 12. Calculating the conditional probability distribution of the output error against the estimated power using the KDE and Gaussian methods, their confidence intervals under confidence 0.95 are also shown in Figure 12. Gaussian is a parametric method for estimating confidence interval while KDE is a non-parametric method for that. Evaluation indexes for interval prediction performances, such as NMACE, NPIAW and WI, of them are shown in Table 6. It can be found that the confidence interval of GPR is very similar with that of KDE and Gaussian. In Figure 12, the upper and lower boundaries of these methods are directly shown from a statistical viewpoint to display the modeling effects while physical boundaries of wind power should be considered in actuality.

Wind Power Predcition Based on Single-Step Wind Speed Prediction Considering Sequential Uncertainties
Using the GPR model of single-step wind speed prediction in Figure 10c and the new WTPC model, their sequential uncertainties for wind power prediction is studied in this subsection. Herein, 6377 data pairs of wind speed-power are adopted for WTPC modeling. As shown in Figure 11, NRMSE, NMACE, NPIAW and WI of the final WTPC model via GPR are 0.2567, 0.007074, 0.2804 and 0.1813, respectively.

Wind Power Predcition Based on Single-Step Wind Speed Prediction Considering Sequential Uncertainties
Using the GPR model of single-step wind speed prediction in Figure 10c and the new WTPC model, their sequential uncertainties for wind power prediction is studied in this subsection. Herein, 6377 data pairs of wind speed-power are adopted for WTPC modeling. As shown in Figure 11, NRMSE, NMACE, NPIAW and WI of the final WTPC model via GPR are 0.2567, 0.007074, 0.2804 and 0.1813, respectively. Considering the sequential uncertainties of wind speed prediction and WTPC modeling, the output of a single-step wind power prediction is shown in Figure 12. Calculating the conditional probability distribution of the output error against the estimated power using the KDE and Gaussian methods, their confidence intervals under confidence 0.95 are also shown in Figure 12. Gaussian is a parametric method for estimating confidence interval while KDE is a non-parametric method for that. Evaluation indexes for interval prediction performances, such as NMACE, NPIAW and WI, of them are shown in Table 6. It can be found that the confidence interval of GPR is very similar with that of KDE and Gaussian. In Figure 12, the upper and lower boundaries of these methods are directly shown from a statistical viewpoint to display the modeling effects while physical boundaries of wind power should be considered in actuality. Considering the sequential uncertainties of wind speed prediction and WTPC modeling, the output of a single-step wind power prediction is shown in Figure 12. Calculating the conditional probability distribution of the output error against the estimated power using the KDE and Gaussian methods, their confidence intervals under confidence 0.95 are also shown in Figure 12. Gaussian is a parametric method for estimating confidence interval while KDE is a non-parametric method for that. Evaluation indexes for interval prediction performances, such as NMACE, NPIAW and WI, of them are shown in Table 6. It can be found that the confidence interval of GPR is very similar with that of KDE and Gaussian. In Figure 12, the upper and lower boundaries of these methods are directly shown from a statistical viewpoint to display the modeling effects while physical boundaries of wind power should be considered in actuality.  In summary, the above simulation results reflects a truth that the sequential uncertainties of wind speed prediction and WTPC modeling is perfectly validated. In the future, if a stepwise procedure is adopted for the wind power prediction of wind turbines, the accuracy of each step can be monitored and improved to guarantee a good final accuracy.

Wind Power Predcition Based on Multi-Step Wind Speed Prediction Considering Sequential Uncertainties
Different from the single-step prediction, multi-step wind speed prediction is discussed in this subsection. In this case, the uncertainties during multi-step prediction need to be considered. As introduced in subsection 3.3, only the statistics of the output error at each step are meaningful. Firstly, uncertainties transmission during ten-step wind speed prediction are shown in Table 7 and Figure  13, where the KDE and Gaussian methods are adopted to calculate the mean value and confidence intervals (with confidence 0.95) of the prediction error. All the statistical values of the KDE and Gaussian methods show the enlargement effects of receding steps during the ten-step wind speed prediction.  Figure 12. Single-step wind power prediction considering sequential uncertainties. In summary, the above simulation results reflects a truth that the sequential uncertainties of wind speed prediction and WTPC modeling is perfectly validated. In the future, if a stepwise procedure is adopted for the wind power prediction of wind turbines, the accuracy of each step can be monitored and improved to guarantee a good final accuracy.

Wind Power Predcition Based on Multi-Step Wind Speed Prediction Considering Sequential Uncertainties
Different from the single-step prediction, multi-step wind speed prediction is discussed in this subsection. In this case, the uncertainties during multi-step prediction need to be considered. As introduced in Section 3.3, only the statistics of the output error at each step are meaningful. Firstly, uncertainties transmission during ten-step wind speed prediction are shown in Table 7 and Figure 13, where the KDE and Gaussian methods are adopted to calculate the mean value and confidence intervals (with confidence 0.95) of the prediction error. All the statistical values of the KDE and Gaussian methods show the enlargement effects of receding steps during the ten-step wind speed prediction.  Using the actual wind speed or receding estimated wind speed as input, uncertainties transmission during ten-step wind power prediction are shown in Tables 8 and 9, Figures 14 and 15. For receding wind power using the actual wind speed as input, the input matrix is formed by onestep iteration. When it is input to the WTPC model, the outputs at different step are massively repetitive. As a result, when using the actual wind speed as input, the statistical values of the wind power error at each step remain relatively stable for both the KDE and Gaussian methods. In contrast, when using the receding estimated wind speed as input, the enlargement effects of receding steps during ten-step wind power prediction are significant. Especially, the upper and lower boundaries under confidence 0.95 show this enlargement effect. Mean values are calculated using positive and negative errors.  Table 9. Error statistic of ten-step wind power prediction via Gaussian.  Using the actual wind speed or receding estimated wind speed as input, uncertainties transmission during ten-step wind power prediction are shown in Tables 8 and 9, Figures 14 and 15. For receding wind power using the actual wind speed as input, the input matrix is formed by one-step iteration. When it is input to the WTPC model, the outputs at different step are massively repetitive. As a result, when using the actual wind speed as input, the statistical values of the wind power error at each step remain relatively stable for both the KDE and Gaussian methods. In contrast, when using the receding estimated wind speed as input, the enlargement effects of receding steps during ten-step wind power prediction are significant. Especially, the upper and lower boundaries under confidence 0.95 show this enlargement effect. Mean values are calculated using positive and negative errors.  Table 9. Error statistic of ten-step wind power prediction via Gaussian.   In summary, our simulation results clearly show the uncertainties transmission of receding tenstep wind power prediction from receding ten-step wind speed prediction, where the enlargement effects of receding steps also exist. In particular, wind power values of receding ten-step prediction using receding estimated wind speed as input have greater uncertainties than that using actual wind speed as input. Besides, a new uncertainty evaluation mechanism based on output error statistics is revealed here for receding multi-step predictions.

Conclusions
Wind speed and power prediction is a feasible and economic way to raise our knowledge of wind source and the controllability of wind power generation. Moreover, ultra-short-term wind power prediction is helpful to the refined operation of wind turbines and wind farms. Nowadays, this becomes more and more important. Considering the uncertainties of random factors, joint pointinterval prediction of wind power via a Gaussian process regression (GPR) method is studied in this paper, where a stepwise procedure is adopted considering the sequential uncertainties of wind speed prediction and wind turbine power curve (WTPC) modeling. Testing via GPR, input-output matrix with five-dimensional input and one-dimensional output is determined for ultra-term wind speed prediction. On this basis, a principal component analysis (PCA)-adaptive affinity propagation (ADAP) scheme is validated to partition wind speed patterns with better silhouette coefficients and more uniform data clusters. Then, a multi-model structure for wind speed prediction can be built, but after validation it shows little improvement over single-model prediction via GPR. Subsequently, Prediction error of wind power/(kW) Prediction error of wind power/(kW) Figure 14. Error statistic of ten-step wind power prediction using actual and estimated wind speed via KDE method (Red line: Estimated data; Green dotted line: Actual data).  In summary, our simulation results clearly show the uncertainties transmission of receding tenstep wind power prediction from receding ten-step wind speed prediction, where the enlargement effects of receding steps also exist. In particular, wind power values of receding ten-step prediction using receding estimated wind speed as input have greater uncertainties than that using actual wind speed as input. Besides, a new uncertainty evaluation mechanism based on output error statistics is revealed here for receding multi-step predictions.

Conclusions
Wind speed and power prediction is a feasible and economic way to raise our knowledge of wind source and the controllability of wind power generation. Moreover, ultra-short-term wind power prediction is helpful to the refined operation of wind turbines and wind farms. Nowadays, this becomes more and more important. Considering the uncertainties of random factors, joint pointinterval prediction of wind power via a Gaussian process regression (GPR) method is studied in this paper, where a stepwise procedure is adopted considering the sequential uncertainties of wind speed prediction and wind turbine power curve (WTPC) modeling. Testing via GPR, input-output matrix with five-dimensional input and one-dimensional output is determined for ultra-term wind speed prediction. On this basis, a principal component analysis (PCA)-adaptive affinity propagation (ADAP) scheme is validated to partition wind speed patterns with better silhouette coefficients and more uniform data clusters. Then, a multi-model structure for wind speed prediction can be built, but after validation it shows little improvement over single-model prediction via GPR. Subsequently, Prediction error of wind power/(kW) Prediction error of wind power/(kW) Figure 15. Error statistic of ten-step wind power prediction using actual and estimated wind speed via Gaussian method (Red line: Estimated data; Green dotted line: Actual data).
In summary, our simulation results clearly show the uncertainties transmission of receding ten-step wind power prediction from receding ten-step wind speed prediction, where the enlargement effects of receding steps also exist. In particular, wind power values of receding ten-step prediction using receding estimated wind speed as input have greater uncertainties than that using actual wind speed as input. Besides, a new uncertainty evaluation mechanism based on output error statistics is revealed here for receding multi-step predictions.

Conclusions
Wind speed and power prediction is a feasible and economic way to raise our knowledge of wind source and the controllability of wind power generation. Moreover, ultra-short-term wind power prediction is helpful to the refined operation of wind turbines and wind farms. Nowadays, this becomes more and more important. Considering the uncertainties of random factors, joint point-interval prediction of wind power via a Gaussian process regression (GPR) method is studied in this paper, where a stepwise procedure is adopted considering the sequential uncertainties of wind speed prediction and wind turbine power curve (WTPC) modeling. Testing via GPR, input-output matrix with five-dimensional input and one-dimensional output is determined for ultra-term wind speed prediction. On this basis, a principal component analysis (PCA)-adaptive affinity propagation (ADAP) scheme is validated to partition wind speed patterns with better silhouette coefficients and more uniform data clusters. Then, a multi-model structure for wind speed prediction can be built, but after validation it shows little improvement over single-model prediction via GPR. Subsequently, normalized evaluation indexes for joint point-interval estimation performance are defined. Through testing, a weighted-index (WI) must be used as objective for wind speed prediction by GPR while it is unnecessary for WTPC modeling by GPR, where a particle swarm optimization-differential evolution (PSO-DE) is adopted to guarantee optimization efficiency. Thereafter, the theoretical principle for the sequential uncertainties of wind speed prediction and WTPC modeling is deduced. Using kernel density estimation (KDE) and Gaussian methods to calculate confidence intervals of output error against estimated wind power, they are similar with that of GPR which perfectly validates the effectiveness of the uncertainties transmission principle. Besides, uncertainties transmission by stepwise procedure and uncertainties enlargement by receding steps are also revealed where a new evaluation mechanism based on statistics to output error is revealed using KDE and Gaussian methods. Overall, the above research should prove meaningful for uncertainty prediction of wind power in the ultra-short-term and is very helpful for the future development of wind power generation.