A Hybrid Nonlinear Forecasting Strategy for Short-Term Wind Speed

Abstract: The ability to predict wind speeds is very important for the security and stability of wind farms and power system operations. Wind speeds typically vary slowly over time, which makes them difficult to forecast. In this study, a hybrid nonlinear estimation approach combining Gaussian process (GP) and unscented Kalman filter (UKF) is proposed to predict dynamic changes of wind speed and improve forecasting accuracy. The proposed approach can provide both point and interval predictions for wind speed. Firstly, the GP method is established as the nonlinear transition function of a state space model, and the covariance obtained from the GP predictive model is used as the process noise. Secondly, UKF is used to solve the state space model and update the initial prediction of short-term wind speed. The proposed hybrid approach can adjust dynamically in conjunction with the distribution changes. In order to evaluate the performance of the proposed hybrid approach, the persistence model, GP model, autoregressive (AR) model, and AR integrated with Kalman filter (KF) model are used to predict the results for comparison. Taking two wind farms in China and the National Renewable Energy Laboratory (NREL) database as the experimental data, the results show that the proposed hybrid approach is suitable for wind speed predictions, and that it can increase forecasting accuracy.


Introduction
The 13th annual report by the Global Wind Energy Council states that wind power is leading the charge in the transition away from fossil fuels, and is the most competitively priced technology in many markets [1]. Wind is an important power source with mature technology. Wind is formed by a mass of air moving horizontally from a high-pressure to a low-pressure area. The randomness of wind speed causes instability in wind power generation, and voltage fluctuations and off-grid events may occur after large-scale wind power integration [2][3][4]. Forecasts made 30 min to 6 h ahead constitute short-term predictions [5]. Accurate short-term wind speed predictions play an important role in the safety and management optimization of wind power systems, but fluctuations in wind speed make accurate predictions difficult [6,7]. Point predictions of wind speed cannot eliminate forecast errors. Hence, uncertainty information (interval prediction) is also important for wind speed predictions, as it indicates the expected upper and lower bounds. It can provide a more scientific reference for unit commitment decisions and grid scheduling [8][9][10].
Machine learning methods are a mainstream approach for short-term wind speed predictions, and have shown good performance [11][12][13][14]. The Gaussian process (GP), as a popular machine learning method, has been successfully applied in wind speed predictions. In [15], a Multi-Task Gaussian ...

Suppose that the wind speed data is $\{x_k \mid k = 1, 2, \cdots, M, \cdots, N\}$. The wind speed training set $S = \{X, y\} = \{(x_k, y_k) \mid k = 1, 2, \cdots, M\}$ consists of the original data. In one-step-ahead predictions, $y_k$ represents the output.

Let us define the GP with a mean function and covariance function as follows:

$$f(x) \sim \mathcal{GP}\big(m(x), \kappa(x, x')\big), \qquad (1)$$

where $m(x)$ is the mean function and $\kappa(x, x')$ is the covariance function (kernel function). Then, a joint Gaussian is defined:

$$p(f \mid X) = \mathcal{N}(f \mid \mu, \sigma), \qquad (2)$$

where $\mu = m(x)$ and $\sigma = \kappa(x, x')$. Given a test set $S = \{X_*, y_*\}$, the joint density of the test points is given by

$$\begin{pmatrix} y \\ y_* \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} K_y & K_* \\ K_*^T & K_{**} \end{pmatrix}\right), \qquad (3)$$

where $K_y = \mathrm{cov}[y \mid X] = \kappa(X, X) + \sigma^2 I_N$, $K_* = \kappa(X, X_*)$, and $K_{**} = \kappa(X_*, X_*)$. The posterior predictive density is

$$p(y_* \mid X_*, X, y) = \mathcal{N}(y_* \mid \mu_*, \sigma_*), \qquad (4)$$

where $\mu_* = K_*^T K_y^{-1} y$ and $\sigma_* = K_{**} - K_*^T K_y^{-1} K_*$. The GP defines a Gaussian predictive distribution over the output with wind speed mean $\mu_*$ and wind speed variance $\sigma_*$ [27].
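As an illustration of the posterior in (4), the following is a minimal Python sketch under stated assumptions, not the implementation used in this study (which was carried out in MATLAB). It assumes a zero-mean GP, a squared exponential kernel with a hypothetical length scale `length`, and a hypothetical fixed noise variance `noise_var`; the helper `make_lagged`, which builds inputs from the previous `p` wind speeds, is likewise an illustrative assumption about how the training pairs are formed.

```python
import numpy as np

def sq_exp_kernel(A, B, length=1.0):
    """Squared exponential kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length**2)

def gp_predict(X, y, X_star, noise_var=1e-2, length=1.0):
    """Posterior mean mu_* and covariance sigma_* of Eq. (4) for a zero-mean GP."""
    K_y = sq_exp_kernel(X, X, length) + noise_var * np.eye(len(X))
    K_s = sq_exp_kernel(X, X_star, length)           # K_*
    K_ss = sq_exp_kernel(X_star, X_star, length)     # K_**
    L = np.linalg.cholesky(K_y)                      # K_y = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K_y^{-1} y
    V = np.linalg.solve(L, K_s)                      # L^{-1} K_*
    mu_star = K_s.T @ alpha                          # K_*^T K_y^{-1} y
    sigma_star = K_ss - V.T @ V                      # K_** - K_*^T K_y^{-1} K_*
    return mu_star, sigma_star

def make_lagged(series, p):
    """Inputs [x_{k-p}, ..., x_{k-1}] paired with the one-step-ahead target x_k."""
    n = len(series)
    X = np.column_stack([series[i:n - p + i] for i in range(p)])
    return X, series[p:]

# Usage on a synthetic series (illustrative data only, not wind measurements).
series = np.sin(np.linspace(0.0, 6.0, 40))
X, y = make_lagged(series, p=2)
mu, cov = gp_predict(X[:30], y[:30], X[30:])
```

The Cholesky factorization is used here instead of an explicit inverse of $K_y$ purely for numerical stability; both give the same posterior.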
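The state-space representation derived from the GP model for wind speed prediction consists of a state equation, in which the GP predictive mean acts as the nonlinear transition function, and a measurement equation:

$$x_k = f_{\mathrm{GP}}(x_{k-1}) + w_k,$$

$$y_k = h(x_k) + v_k,$$

where $f_{\mathrm{GP}}$ denotes the GP predictive mean $\mu_*$ in (4), $h$ denotes the measurement function, $w_k \sim \mathcal{N}(0, Q_k)$ is the system noise with covariance $Q_k$, and $v_k \sim \mathcal{N}(0, R_k)$ is the observation noise with covariance $R_k$. The GP model has an associated global noise parameter, $\sigma_*$. The deviation of each predicted point is obtained by the GP model. We make the assumption that the error covariance matrices, $Q_k$ and $R_k$, are diagonal and equal to the variance calculated by the GP model.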
The UKF uses a deterministic sampling technique (see Figure 3) to pick a minimal set of sample points (called sigma points) around the mean [28]. Let us assume that the $(k-1)$th data $x_{k-1}$ have mean $\bar{x}_{k-1}$ and covariance $\sigma_{k-1}$. To calculate the statistics of the $k$th data, we form a matrix $\chi_{k-1}$ of $2p + 1$ sigma vectors $\chi^i_{k-1}$ (with corresponding weights $W_i$), where $p$ is the dimension of the input vector, as follows:

$$\chi^0_{k-1} = \bar{x}_{k-1},$$

$$\chi^i_{k-1} = \bar{x}_{k-1} + \left(\sqrt{\gamma\,\sigma_{k-1}}\right)_i, \quad i = 1, \cdots, p,$$

$$\chi^i_{k-1} = \bar{x}_{k-1} - \left(\sqrt{\gamma\,\sigma_{k-1}}\right)_{i-p}, \quad i = p + 1, \cdots, 2p,$$

$$W^{(m)}_0 = 1 - \frac{p}{\gamma}, \quad W^{(c)}_0 = 1 - \frac{p}{\gamma} + (1 - \alpha^2 + \beta), \quad W^{(m)}_i = W^{(c)}_i = \frac{1}{2\gamma}, \quad i = 1, \cdots, 2p,$$

where $\gamma = \alpha^2 (p + \lambda)$ is a scaling parameter and $(\sqrt{\gamma\,\sigma_{k-1}})_i$ is the $i$th column of the matrix square root. $\alpha$ determines the spread of the sigma points around the $(k-1)$th data, and is usually set to a small positive value (e.g., 1e-3). $\lambda$ is a secondary scaling parameter which is usually set to 0 or $3 - p$. $\beta$ is used to incorporate prior knowledge of the distribution of the $(k-1)$th data. These sigma vectors are propagated through the GP model, and the mean and covariance for the $k$th data are approximated using a weighted sample mean and covariance of the posterior sigma points:

$$\chi^i_{k|k-1} = f_{\mathrm{GP}}(\chi^i_{k-1}), \quad \bar{x}^-_k = \sum_{i=0}^{2p} W^{(m)}_i \chi^i_{k|k-1}, \quad \sigma^-_k = \sum_{i=0}^{2p} W^{(c)}_i \left(\chi^i_{k|k-1} - \bar{x}^-_k\right)\left(\chi^i_{k|k-1} - \bar{x}^-_k\right)^T + Q_k.$$

Then, the mean and covariance for the $k$th data are updated with the measurement $y_k$ through the Kalman gain $K$.

Energies 2020, 13, 1596
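The sigma-point construction and the weighted mean and covariance described above can be sketched in Python as follows. This is an illustrative sketch of the scaled unscented transform under stated assumptions, not the code used in this study: the scaling parameter follows the definition $\gamma = \alpha^2(p + \lambda)$ given in the text, the default `beta=2.0` is an assumed choice (commonly used for Gaussian priors), and the function `f` is a hypothetical stand-in for the GP transition function.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-3, lam=0.0, beta=2.0):
    """Return the 2p+1 sigma vectors (as rows) and their mean/covariance weights."""
    p = mean.size
    gamma = alpha**2 * (p + lam)            # scaling parameter as defined in the text
    S = np.linalg.cholesky(gamma * cov)     # S @ S.T == gamma * cov
    # Row 0 is the mean; rows 1..p add the columns of S; rows p+1..2p subtract them.
    chi = np.vstack([mean, mean + S.T, mean - S.T])
    w_m = np.full(2 * p + 1, 1.0 / (2.0 * gamma))
    w_c = w_m.copy()
    w_m[0] = 1.0 - p / gamma
    w_c[0] = w_m[0] + (1.0 - alpha**2 + beta)
    return chi, w_m, w_c

def unscented_transform(f, mean, cov, noise_cov, alpha=1e-3, lam=0.0, beta=2.0):
    """Propagate sigma points through f and return the weighted mean and covariance."""
    chi, w_m, w_c = sigma_points(mean, cov, alpha, lam, beta)
    Y = np.array([f(x) for x in chi])       # one propagated sigma vector per row
    y_mean = w_m @ Y
    D = Y - y_mean
    y_cov = (w_c[:, None] * D).T @ D + noise_cov   # additive process noise
    return y_mean, y_cov

# Usage with a hypothetical nonlinear transition (stand-in for the GP mean).
m = np.array([5.0, 6.0])
P = np.array([[0.5, 0.1], [0.1, 0.3]])
Q = 0.01 * np.eye(2)
pred_mean, pred_cov = unscented_transform(lambda x: np.array([x[1], 0.5 * x[0] + 0.4 * x[1]]), m, P, Q)
```

For a linear function the transform reproduces the exact propagated mean and covariance, which gives a convenient sanity check of the weights.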

Wind Speed Data Sets
The wind speed data were collected from two onshore wind farms in China, located in Jiangsu and Ningxia provinces. For the two wind farms, the wind speed at heights of 10 m, 50 m, and 70 m was measured by a meteorological mast with a 5-min sampling rate. In this study, hourly mean wind speed data at 70 m from March 1, 2012 to April 30, 2012 were used for the experiment. To compare the performance of the forecasting approach, the wind speed data were randomly divided into four groups. One of the four groups comprises data from 16 days, and the remaining three groups comprise 15 days each. Meanwhile, data for the entire year of 2016 (excluding abnormal measurements) from the National Renewable Energy Laboratory (NREL) database [29] were used to verify the validity of the proposed approach. These wind speed data were also randomly divided into four groups (comprising 91 days, 91 days, 91 days, and 92 days). In each experiment, three of the four groups were used for training and the remaining group for testing. The average results of the four experiments were used for analysis.
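The four-group procedure described above can be sketched as follows. This is a minimal Python illustration under the assumption that the random division is made over whole days; the function name `four_group_splits` and the `seed` argument are hypothetical and are not taken from the study.

```python
import numpy as np

def four_group_splits(n_days, n_groups=4, seed=0):
    """Randomly assign day indices to groups; yield (train_days, test_days) per group."""
    rng = np.random.default_rng(seed)
    days = rng.permutation(n_days)
    groups = np.array_split(days, n_groups)   # group sizes differ by at most one day
    for i in range(n_groups):
        test = np.sort(groups[i])
        train = np.sort(np.concatenate([g for j, g in enumerate(groups) if j != i]))
        yield train, test

# Usage: 61 days (March 1 to April 30) give one group of 16 days and three of 15 days.
for train_days, test_days in four_group_splits(61):
    print(len(train_days), len(test_days))
```

The error metrics would then be computed on each held-out group and averaged over the four runs.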
The descriptive statistics of the datasets, including the mean, the standard deviation, and the minimum and maximum velocities, are shown in Table 1. Overall, the wind speed at the Jiangsu wind farm is relatively low and gentle. The wind speed in Ningxia changes dramatically. The mean wind speed in NREL is the lowest. Figure 4 shows the hourly mean wind speed data for the three wind farms. It can be seen that the wind speed is fluctuating.

Model Identification
Model identification involves determining the input dimension of the model according to the characteristics of the data. The sample autocorrelation function (ACF), partial autocorrelation function (PACF), and Bayesian information criterion (BIC) were used to determine the input dimension of the model [30]; the results are shown in Figure 5. The model with the lowest BIC is preferred, and the corresponding input dimension is selected.
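A minimal Python sketch of BIC-based selection of the input dimension is given below. It is an illustration under stated assumptions rather than the procedure used in this study: a linear AR model without intercept is fitted by least squares for each candidate dimension, all candidates are scored on the same targets, and the Gaussian form $\mathrm{BIC} = n \ln \hat{\sigma}^2 + p \ln n$ (up to an additive constant) is assumed. The function names and the `max_p` bound are hypothetical.

```python
import numpy as np

def ar_design(series, p, start):
    """Lag matrix with columns x_{k-1}, ..., x_{k-p} for targets series[start:] (start >= p)."""
    n = len(series)
    X = np.column_stack([series[start - j:n - j] for j in range(1, p + 1)])
    return X, series[start:]

def select_order_bic(series, max_p=6):
    """Return the input dimension with the lowest BIC and the BIC of every candidate."""
    scores = {}
    for p in range(1, max_p + 1):
        X, y = ar_design(series, p, start=max_p)   # same targets for every p
        phi, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ phi
        n = len(y)
        scores[p] = n * np.log(resid @ resid / n) + p * np.log(n)
    return min(scores, key=scores.get), scores

# Usage on a synthetic AR(2) series (illustrative data only).
rng = np.random.default_rng(0)
series = np.zeros(2000)
for k in range(2, 2000):
    series[k] = 0.6 * series[k - 1] - 0.3 * series[k - 2] + rng.normal()
best_p, bic = select_order_bic(series)
```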


Forecasting Performance Evaluation
The models were evaluated comprehensively using the following evaluation criteria:

(1) root mean square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2}, \qquad (17)$$

(2) mean absolute error (MAE)

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right|, \qquad (18)$$

(3) mean absolute percentage error (MAPE)

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%, \qquad (19)$$

where $y_t$ and $\hat{y}_t$ are the measured and predicted wind speed at time $t$, and $N$ is the number of test data. The RMSE is used to measure the standard deviation between the predicted and the measured values. The MAE is a measure of the difference between two continuous variables. The MAPE not only considers the error between the predicted and the measured values, but also the ratio between the error and the measured value; the lower the RMSE, MAE, and MAPE values, the better the prediction.

The forecast skill $S$ is a criterion by which to assess the performance of forecasting models relative to the persistence model. It is a robust metric which describes the improvement over the benchmark model, and is defined as:

$$S = \frac{\mathrm{RMSE}_p - \mathrm{RMSE}_f}{\mathrm{RMSE}_p} \times 100\%, \qquad (20)$$

where $\mathrm{RMSE}_p$ is the RMSE of the persistence model and $\mathrm{RMSE}_f$ is the RMSE of a forecasting model.
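The four criteria can be computed with a few lines of Python. The sketch below is illustrative only; the function names are hypothetical, and the MAPE assumes that no measured value is zero.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error in percent (measured values must be nonzero)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))

def forecast_skill(rmse_p, rmse_f):
    """Improvement of a forecasting model over the persistence model, in percent."""
    return 100.0 * (rmse_p - rmse_f) / rmse_p

# Usage with made-up numbers (not results from this study).
measured = [2.0, 4.0, 4.0]
predicted = [1.0, 4.0, 6.0]
print(rmse(measured, predicted), mae(measured, predicted), mape(measured, predicted))
```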

Persistence Model
The persistence model is a good baseline for time series predictions, and is the simplest short-term model. It uses the last observed value as the prediction, i.e., at time $k$, the prediction is $\hat{y}_{k+i} = y_k$.
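As a small illustration, the persistence forecast for a horizon of $i$ steps can be written in Python as below. The function name and the returned pairing of predictions with their targets are illustrative choices, not part of the original study.

```python
import numpy as np

def persistence_forecast(series, horizon=1):
    """Return (predictions, targets): the prediction for time k + horizon is y_k."""
    series = np.asarray(series, dtype=float)
    return series[:-horizon], series[horizon:]

# Usage with made-up hourly wind speeds (m/s).
pred, target = persistence_forecast([3.0, 3.5, 4.0, 3.8], horizon=1)
# pred is [3.0, 3.5, 4.0] and target is [3.5, 4.0, 3.8]
```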

AR and AR-KF Approach
The time series AR model proposed by Box and Jenkins is a linear parametric model which has been widely used in wind speed predictions [31,32]. The regular AR model can be expressed as

$$x_k = \varphi_1 x_{k-1} + \varphi_2 x_{k-2} + \cdots + \varphi_p x_{k-p} + w_k,$$

where $\varphi_1, \varphi_2, \cdots, \varphi_p$ are regression coefficients, $p$ denotes the AR model input dimension, and $w_k$ is white noise. The input dimension is determined by BIC. After determining the input of the AR model, the least squares method is used to estimate the regression coefficients.

The AR-KF method has been proposed for wind speed prediction in the literature [33]. The key to utilizing the KF method is to correctly initialize the state equation and the measurement equation. In this study, the AR model is used to initialize the state equation of a state-space model, so that the AR-KF model for wind speed prediction is formulated as a linear state-space model whose state equation is given by the fitted AR model. The statistical characteristics of the process noise and measurement noise are $w_k \sim \mathcal{N}(0, Q_k)$ and $v_k \sim \mathcal{N}(0, R_k)$. Based on this linear state-space model, the KF can be employed to update the state estimate and recursively predict the wind speed data.
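One way to realize an AR-KF predictor is sketched below in Python. This is an illustration under stated assumptions, not the formulation of [33] or of this study: the AR coefficients are placed in a companion-form transition matrix, the state holds the last $p$ wind speeds, only the first state component is measured, and the constants `q` and `r` are hypothetical fixed noise variances standing in for $Q_k$ and $R_k$.

```python
import numpy as np

def lag_matrix(series, p):
    """Rows [x_{k-1}, ..., x_{k-p}] paired with targets x_k."""
    n = len(series)
    X = np.column_stack([series[p - j:n - j] for j in range(1, p + 1)])
    return X, series[p:]

def fit_ar(series, p):
    """Least squares estimate of the AR regression coefficients."""
    X, y = lag_matrix(series, p)
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi

def ar_kf_predict(series, phi, q=1.0, r=0.1):
    """One-step-ahead Kalman filter predictions for series[p:] with an AR state model."""
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi                        # first row applies the AR equation
    A[1:, :-1] = np.eye(p - 1)           # remaining rows shift the lagged values
    H = np.zeros((1, p)); H[0, 0] = 1.0  # only the current wind speed is measured
    Q = np.zeros((p, p)); Q[0, 0] = q
    x = np.array(series[:p][::-1], dtype=float)   # state [x_{p-1}, ..., x_0]
    P = np.eye(p)
    preds = []
    for z in series[p:]:
        x = A @ x                        # time update (prediction)
        P = A @ P @ A.T + Q
        preds.append(x[0])
        S = (H @ P @ H.T).item() + r     # innovation variance
        K = (P @ H.T).ravel() / S        # Kalman gain
        x = x + K * (z - x[0])           # measurement update
        P = P - np.outer(K, H @ P)
    return np.array(preds)

# Usage on a synthetic AR(2) series (illustrative data only).
rng = np.random.default_rng(1)
series = np.zeros(300)
for k in range(2, 300):
    series[k] = 0.6 * series[k - 1] - 0.3 * series[k - 2] + rng.normal()
phi = fit_ar(series, 2)
kf_pred = ar_kf_predict(series, phi)
```

When the measurement noise `r` is made very small, the filter trusts each measurement almost completely and its predictions approach those of the plain AR model.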

GP-EKF Approach
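The state-space representation of GP-EKF is the same as that of GP-UKF. The first-order Taylor series expansion used by GP-EKF requires the derivative of the GP mean function (4), which is as follows:

$$\frac{\partial \mu_*}{\partial X_*} = \left(\frac{\partial K_*}{\partial X_*}\right)^T K_y^{-1} y,$$

where $K_* = \kappa(X, X_*)$ is the vector of kernel values. Taking the Gaussian kernel function in the form $\kappa(x, x_*) = \exp\left(-\|x - x_*\|^2 / (2\sigma^2)\right)$, its partial derivative is:

$$\frac{\partial \kappa(x, x_*)}{\partial x_*} = \frac{x - x_*}{\sigma^2}\, \kappa(x, x_*),$$

where $\sigma$ is the width parameter of the function. The other steps are the same as in the standard EKF algorithm [34].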
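The derivative of the GP mean with respect to the test input can be checked numerically. The following Python sketch is illustrative and rests on assumptions: the Gaussian kernel is taken as $\exp(-\|x - x_*\|^2 / (2\sigma^2))$, the vector `weights` stands for $K_y^{-1} y$, and the noise variance of 0.1 in the usage example is a hypothetical value.

```python
import numpy as np

def gauss_kernel(X, x_star, sigma):
    """Vector of kernel values kappa(x_i, x_*) for every training input x_i."""
    d2 = np.sum((X - x_star) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma**2))

def gp_mean(X, weights, x_star, sigma):
    """GP predictive mean K_*^T K_y^{-1} y, with weights = K_y^{-1} y."""
    return gauss_kernel(X, x_star, sigma) @ weights

def gp_mean_jacobian(X, weights, x_star, sigma):
    """Gradient of the GP predictive mean with respect to x_*."""
    k = gauss_kernel(X, x_star, sigma)
    dk = k[:, None] * (X - x_star) / sigma**2   # row i: d kappa(x_i, x_*) / d x_*
    return dk.T @ weights

# Usage: compare the analytic gradient with central finite differences.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = rng.normal(size=20)
sigma = 1.0
K_y = np.array([gauss_kernel(X, xi, sigma) for xi in X]) + 0.1 * np.eye(20)
weights = np.linalg.solve(K_y, y)
x0 = np.array([0.3, -0.2])
analytic = gp_mean_jacobian(X, weights, x0, sigma)
eps = 1e-6
numeric = np.array([
    (gp_mean(X, weights, x0 + eps * e, sigma) - gp_mean(X, weights, x0 - eps * e, sigma)) / (2 * eps)
    for e in np.eye(2)
])
```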

Results and Discussions
At different locations, the distribution of wind speed is different. In order to verify the validity of the proposed approach, three wind speed datasets were used and five forecasting models were constructed for comparison: the persistence model, AR, AR-KF, GP, and GP-EKF. For the last four models, the dimension $p$ of the inputs was determined by BIC. As shown in Figure 5, the input dimension for the data from the two wind farms in China was selected as 2, and the input dimension for the NREL data was selected as 3. All the algorithms were implemented in MATLAB 2015a. The squared exponential kernel was used as the covariance function of the GP model, and the hyperparameters were estimated by maximum likelihood.

Table 2 shows the one-step-ahead wind speed predictions of the different models on the Jiangsu dataset. The RMSE, MAE, and MAPE were calculated to compare the performance of each model. For the Jiangsu dataset, the proposed GP-UKF approach has the lowest RMSE and MAE. The nonlinear GP model is superior to the linear AR model on RMSE and MAE. At low wind speeds, the predictions of the GP model are slightly worse, so the MAPE of the GP model is worse than that of the AR model. After updating by filtering, the results of AR-KF, GP-EKF, and GP-UKF were improved. Compared to GP-EKF, GP-UKF applies a more accurate approximation and yields better predictions.

The persistence model and the AR model can only provide point estimates of wind speed. However, the GP model can offer the predictive distribution (the covariance) of the wind speed data. After filtering, the covariance of the wind speed prediction by the GP model was updated. Both the point forecasts and the corresponding 95% confidence interval forecasts by GP-UKF on April 14, 2012 are shown in Figure 6. All the measured data fall within the confidence interval. Furthermore, the R-value and residuals of GP-UKF are shown in Figure 7. The prediction results of GP-UKF have high correlation coefficients with the measured values, and the prediction errors of GP-UKF are close to zero.
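Given a predictive mean and variance for each hour, a 95% interval and its empirical coverage can be obtained as in the Python sketch below. It assumes a Gaussian predictive distribution, for which the two-sided 95% quantile is approximately 1.96; the function names and the example numbers are hypothetical and are not results from this study.

```python
import numpy as np

def gaussian_interval(mean, var, z=1.96):
    """Symmetric interval mean +/- z * sqrt(var); z = 1.96 gives about 95% coverage."""
    mean = np.asarray(mean, dtype=float)
    half = z * np.sqrt(np.asarray(var, dtype=float))
    return mean - half, mean + half

def coverage(y, lower, upper):
    """Fraction of measured values that fall inside the interval."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y >= lower) & (y <= upper)))

# Usage with made-up predictive means and variances (m/s and (m/s)^2).
lower, upper = gaussian_interval([5.0, 6.0], [1.0, 4.0])
print(coverage([5.0, 10.0], lower, upper))
```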
Table 3 shows the forecasting results of the different forecasting models in Ningxia. Due to the dramatic changes in the wind speed at the Ningxia wind farm, the RMSE and MAE values were higher than those at the Jiangsu wind farm. The average wind speed in Ningxia is higher than that in Jiangsu, and the value of MAPE is correspondingly smaller. It can be seen that the GP-UKF model yielded the best predictions, with the lowest RMSE, MAE, and MAPE values. The proposed approach can reduce errors caused by variations in distribution. The performance of the GP-UKF model improves significantly.

Table 3. One-hour-ahead wind speed forecasting results in Ningxia. Columns: RMSE (m/s), MAE (m/s), MAPE (%), S (%).

Figure 8 shows the point predictions and the corresponding 95% confidence interval predictions on March 17, 2012 using the GP-UKF model. Even when the wind speed changes dramatically, the interval encloses almost all of the measured wind speeds. Figure 9 shows the R-value and residuals of the GP-UKF model for the Ningxia dataset.
Table 4 provides the forecasting results of six different predictive models in NREL. Overall, the GP-UKF model outperforms the other models on RMSE, MAE, and MAPE. The MAPE is the ratio between the error and the measured value. The measured wind speed in the NREL dataset is generally small. Therefore, the MAPE is higher than those of the other two datasets. For the NREL data, the nonlinear GP model significantly outperforms the linear AR model. In the NREL dataset, the training data cover a wide range with large samples. Therefore, the predictions of the GP model are better. The improvement of GP-UKF is also significant.

Figure 10 shows the point predictions and the corresponding 95% confidence interval predictions of wind speed in NREL on May 18, 2016 based on the GP-UKF model. Figure 11 shows the R-value and residuals of the GP-UKF model for the NREL dataset.

In order to intuitively compare the performance, the forecast skill (Equation (20)) of each model at the three wind farms is shown in Figure 12. It is clear that, across the forecasting models, nonlinear methods are better than linear ones; for the same wind farm, the forecasting results with GP-UKF are the best. Compared to the persistence model, improvements in the RMSEs of the three wind farms ranged from 12.61% to 19.71%. By quantitative comparison, the proposed approach is more effective.

Figure 12. Forecast abilities of different forecasting models in three wind farms for 1-h-ahead predictions.

In [16], a hybrid GP model was used for 1-h-ahead wind speed predictions. The improvement in the RMSEs of the three wind farms was about 10%. In [35], a deep learning model was used for 10-min- and 1-h-ahead wind speed predictions. Compared with the time series model, the improvement in the RMSEs of the two datasets ranged from 11.02% to 13.54%. In [36], a hybrid model based on decomposition and deep learning was used for 15-min- and 1-h-ahead wind speed predictions. The deep learning model produced smaller errors than the other single models (persistence, MLR, and SVR), and the improvement of the RMSE compared to the persistence model was about 8%. Combined with a decomposition strategy, the errors were significantly reduced. Compared with prior works, it can be seen that the proposed GP-UKF approach is competitive.

Figure 13 shows the RMSEs of persistence, GP, and the proposed approach at different forecast horizons at the three wind farms. The proposed hybrid approach always performed better than the persistence and GP models; the improvement of the GP-UKF model over the GP and persistence models was between 4.75% and 23.93% for 4- to 6-h forecast horizons.

Conclusions
This study proposed a hybrid approach GP-UKF for short-term wind speed prediction. The GP model was first used for short-term wind speed predictions. Due to the change of wind speed distribution, the initial predictions needed to be adjusted dynamically. Then, the nonlinear state-space model solved by UKF was used to update the initial prediction. In order to verify the proposed GP-UKF model, three wind farm datasets with different wind speed distributions were used. Compared to the persistence model, AR, AR-KF, GP, and GP-EKF, the results showed that the GP-UKF model is capable of predicting short-term wind speeds with a high degree of accuracy. Moreover, it provided the intervals of wind speed data, which was also conducive to reducing the risk of formulating wind farm operation strategies.
Author Contributions: Methodology, X.Z. and C.L.; validation, X.Z.; writing-original draft preparation, X.Z.; writing-review and editing, H.W.; funding acquisition, H.W. and K.Z. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.