Deterministic and Interval Wind Speed Prediction Method in O ﬀ shore Wind Farm Considering the Randomness of Wind

: In order to improve the prediction accuracy of wind speed, this paper proposes a hybrid wind speed prediction (WSP) method considering the ﬂuctuation, randomness and nonlinear of wind, which can be applied to short-term deterministic and interval prediction. Variational mode decomposition (VMD) decomposes wind speed time series into nonlinear series Intrinsic mode function 1 (IMF1), stationary time series IMF2 and error sreies (ER). Principal component analysis-Radial basis function (PCA-RBF) model is used to model the nonlinear series IMF1, where PCA is applied to reduce the redundant information. Long short-term memory (LSTM) is used to establish a stationary time series model for IMF2, which can better describe the ﬂuctuation trend of wind speed; mixture Gaussian process regression (MGPR) is used to predict ER to obtain deterministic and interval prediction results simultaneously. Finally, above methods are reconstructed to form VMD-PRBF-LSTM-MGPR which is the abbreviation of hybrid model to obtain the ﬁnal results of WSP, which can better reﬂect the volatility of wind speed. Nine comparison models are built to verify the availability of the hybrid model. The mean absolute percentage error (MAE) and mean square error (MSE) of deterministic WSP of the proposed model are only 0.0713 and 0.3158 respectively, which are signiﬁcantly smaller than the prediction results of comparison models. In addition, conﬁdence intervals (CIs) and prediction interval (PIs) are compared in this paper. The experimental results show that both of them can quantify and represent forecast uncertainty and the PIs is wider than the corresponding CIs.


Introduction
Renewable sources of energy are gradually replacing tradition sources in many countries, not only because of the limited of fossil and nuclear fuel energy sources but unprecedented and irreversible damage to the environment caused by them, as well as the continuing reduction in the cost of renewable technologies. Wind energy has been growing rapidly in recent years. In 2018, the global installed capacity of offshore wind power increased by 4.3 GW with a total installed capacity of 23 GW. It has grown by nearly 30% a year since 2010 [1]. According to a new report by Ember, a European think tank on climate and energy, wind and solar power supply 10% of the global electricity demand in the first half of 2020, double the level of 2015.
Compared with other major power generation, wind has nonlinear characteristics [2], and reliable wind speed prediction (WSP) can reduce the impact of these characteristics on the power system [3,4]. Reference [5] study the existence of low-dimensional deterministic chaos in wind times series through

•
Physical modes, which mainly use numerical weather prediction (NWP) data to complete wind speed prediction by establishing variable ratio expressions of wind speed and air pressure, air density, air humidity, etc. [13]. • Statistical models, which mainly use time series modeling, including autoregressive integrated moving average model (ARIMA) and Kalman filtering, etc. [14]. • Temporal-spatial correlation models, which mainly use data from different measuring points to predict wind speed [15]. • Artificial intelligence models, which are hot spots for WSP, such as recurrent neural network (RNN), support vector machine (SVM), and fuzzy logic method, etc. [16].
Most scholars choose one or more of above methods for prediction, but studies show that the single statistical method is not accuracy enough in wind speed prediction. Therefore, the academia proposes a hybrid model based on statistical methods of which prediction accuracy is better than the single model [17]. In reference [18], the effective information is extracted by CNN sequence feature extraction ability, and the data is input to LSTM network after other information is removed. Reference [19] proposes using momentum item to speed up the convergence rate of BP, and taking the advantage of the genetic algorithm (GA) to optimize the structure, weights and bias of BP. Reference [20] performs grey correlation analysis on other meteorological factors such as temperature to achieve dimension reduction, then build LSTM model based on that. Reference [21] uses EMD to depose wind speed series into different IMFs, and build LSTM model, which has high prediction accuracy. In reference [22], EMD and ensemble empirical mode distribution (EEMD) are used for decomposition of wind speed data into IMFs, and an ANN is proposed to predict the wind speed with IMFs given as the inputs. Reference [23] proposes a WSP method based on deep learning combined by CNN and RNN using the big data of a certain wind form from 2014 to 2015.
Considering the characteristics of randomness, volatility, low energy density and non-linearity and above analysis, wind speed should be decomposed into several components and studied separately. At present, wavelet transform (WT), EMD methods and their modified versions such as complementary EEMD, empirical wavelet transform (EWT), EEMD, and other signal decomposition methods have been widely used [24]. Variational mode decomposition (VMD) was developed by Dragomiretskiy and Zosso, which is a novel type of complex signal decomposition on the basis of EMD [25].
Energies 2020, 13, 5595 3 of 23 VMD is a completely intrinsic, self-adaptive and non-recursive decomposition technique, by which a signal can decomposed into a finite broadband mode set and show the partial feature of the signal. It has significant advantages in solving the problem of signal noise and avoiding mode overlap [25]. Wind speed series can be decomposed by VMD algorithm into intrinsic mode functions (IMFs), which can effectively reflect the characteristics of randomness and fluctuation. In addition, wind speed is affected by meteorological factors, such as wind direction, temperature, humidity, air pressure and other parameters. Principal component analysis algorithm (PCA) can effectively remove redundant data and realize data dimension reduction. In order to ensure the reliability of the data, the number of data samples involved in this paper is 350, which belongs to small sample data. It is more suitable to solve the problem by means of mechanical learning, such as LSTM. On the contrary, deep learning is more suitable for the case of large quantity, because more parameters and data quantity are needed to fully train the deep learning model. If the data quantity is small, the parameters cannot be fully trained, and the accuracy of the deep learning model is not as good as machine learning. This paper mainly includes two parts, deterministic prediction and interval prediction. For deterministic prediction part, RBF neural network model is used to model IMF with obvious nonlinear characteristic in this paper, because RBF has the advantages of infinite approximation to nonlinear functions and overcoming local optimal solution [26]. LSTM is used to model and fit the mode with small fluctuation and non-obvious nonlinear characteristics. Long short term memory network (LSTM) extended from RNN network is a special algorithm for processing time-series problem, which solves the problem of gradient explosion or disappearance of RNN and can learn long-term dependent information and is suitable for time series classification and prediction [17]. The noise sequence reflects the fluctuation and randomness of wind speed. In this paper, mixture Gaussian process regression (MGPR) model is used to model the noise sequence and the prediction results of error sequence are obtained. Finally, the final wind speed prediction results are obtained by summing and reconstructing the initial prediction results of each mode and the error sequence. As for interval prediction, confidence interval (CIs) and prediction interval (PIs) are two well-know methods for quantifying and representing the uncertainty of predictions [27]. A quantitative comparison is made between CIs and PIs of wind speed under a certain confidence degree, where CIs are obtained by a single model MGPR and the hybrid model proposed in this paper respectively and PIs are obtained by the hybrid model combining the lower and upper bound estimation method (LUBE).
In this paper, a new hybrid model is constructed by combining a variety of methods to predict wind speed, and the proposed model in this paper can be applied for short-term deterministic and interval WSP. Deterministic prediction can help optimize power system scheduling and improve the reliability of wind power grid connection. Interval prediction can quantify the volatility and intermittent of wind power, and improve the security and economy of wind grid. The primary innovations are as follows.
Firstly, VMD is used to decompose the original wind speed sequence into different IMFs, so as to achieve de-noising while retaining the original information and enhance the predictability of wind speed. Additionally, the comparison between EMD and VMD shows VMD is more suitable for this paper. Secondly, as for prediction model with multiple input variables, PCA is applied to data dimension reduction to remove redundant information. The experimental results show that the prediction results with PCA is superior to those without PCA. Thirdly, corresponding reasonable model is established for each IMF obtained from VMD algorithm to improve prediction accuracy. Fourthly, the hybrid model VMD-PRBF-LSTM-MGPR which is obtained by reconstructing models for each mode can be applied to deterministic prediction and interval prediction simultaneously, where deterministic prediction has high accuracy and can represent the fluctuation of wind speed. The mean absolute percentage error (MAE), relative mean square error (RMSE) and mean square error (MSE) of deterministic prediction of the hybrid model proposed in this paper are only 0.0713 and 0.3158 respectively, which are more accurate than the prediction results of comparison models. In addition, CIs and PIs are compared in this paper. The results show that the prediction interval is wider than the corresponding confidence interval. This paper is divided into the five parts. The second section describes the basic introduction of methods used in this paper. The third section mainly introduces the process of WSP including deterministic and interval prediction, which consists of three parts, data processing, establishment of comparison models and hybrid model, and validation of the proposed model. The fourth section shows the WSP results and discussion through a numeral case, followed by conclusion.

Variational Mode Decomposition
Variational modal decomposition (VMD) is proposed by Desomimicskiy and Zosso on the basis of EMD, and is an adaptive, quasi-orthogonal and completely non-recursive decomposition method based on Hilbert transform, classical Weiner filtering and mixed frequency variational problem. EMD is a classical mode decomposition method proposed by Huang et al., which is lack of mathematical theory, ability to properly cope with noise and the hard band-limits of wavelet approaches [25]. Compared with EMD, VMD decomposes the signal into finite bandwidth with different center frequency according to the preset model number K. VMD uses alternating direction multiplier method (ADMM) to constantly update each mode and its center frequency, and gradually adjusts each mode to the corresponding base band, then exacts each mode and its corresponding center frequency, and eventually obtains each component with no central frequency.
The essence of the VMD is the variational problem, which is divided into the construction and solution of the variational problems, and the process is as follows.
Hilbert transform was used to calculate the correlation analysis signal of each mode u k , and the unilateral frequency spectrum was obtained.
For each mode u k , the frequency spectrum of the mode is shifted to base-band by tuning to the exponential mixing of the respective estimated center frequency.
The bandwidth is estimated by the H 1 Gaussian smoothness of demodulation signal. The resulting constrained variational problem is the following: where, * denotes convolution, u k represents the k-th mode,w k represents the center frequency, K represents the total number of modes, ∂ t represents the Dirac distribution, f (t) is the original signal.
In order to eliminate the constraints of the above problems, quadratic penalty term and Lagrangian multiplier are introduced to transform the constrained optimization into the unconstrained optimization problem: where, α is the penalty parameter. λ is the Lagrange multiplier and ⊗ represents the convolution operator. The solution of the original minimization problem (1) is transformed into an optional model, which can be solved by multipliers alternating direction method. Update u k and w k according to ADMM. The expression is as follows: Minimization u k and w k : Energies 2020, 13, 5595 where, N represents the number of iterations.

PCA-RBF Model
Principal component analysis is to reflect data characteristics and rules with as little information as possible. It not only retains the information of the original data, but also achieves data dimensional-reduction and reduces the complexity of the data itself [28,29]. The input variables of RBF include six variables, namely wind speed (V), wind direction (WD), temperature (T), humidity (H), air pressure (P) and battery energy (B) respectively in this paper. In order to achieve dimensional-reduction and reduce the redundant data, PCA is selected in this paper.
Firstly, data are collected to determine the original data input matrix X and output variables Y = (y 1 , y 2 , . . . , y n ) of RBF neural network.
where, n represents the number of samples, h represents the number of input data. Calculate the mean value and standard deviation of each variable, which can be expressed as follows: x ij )/n, ( j = 1, 2, · · · , h) (6) The mean value and standard deviation are used to standardize the original data to eliminate the dimensional influence, which can be expressed as: The co-variance matrix R is obtained from the data after standardized processing, which can be expressed as: Secondly, calculate the cumulative variance contribution rate.
where, λ j represents the j-th eigenvalue. a(m) represents the cumulative contribution rate of the first m principal components, and h represents the number of original variables. Extract principal components information, which can be expressed as: where, Y is the principal component part of the original variable matrix X. l represents the eigenvectors corresponding to the eigenvalues. The first m principal components whose eigenvalue is greater than 1 and whose cumulative contribution rate is greater than 90% are selected, while the rest components are discarded. Finally, the PCA-RBF model is established, and the principal components matrix Y is used as the input matrix to build a new Radial Basis Function (RBF) model. Because RBF Network has strong nonlinear fitting ability and can approximate arbitrary nonlinear functions. It has good generalization ability and can map any complex nonlinear relationship. The structure of RBF neural network is shown in Figure 1. RBF neural network is not repeated in this paper, detailed references [22].
Energies 2020, 13, x FOR PEER REVIEW 6 of 24 where, j λ represents the j-th eigenvalue. a(m) represents the cumulative contribution rate of the first m principal components, and h represents the number of original variables. Extract principal components information, which can be expressed as: where, Y is the principal component part of the original variable matrix X. l represents the eigenvectors corresponding to the eigenvalues. The first m principal components whose eigenvalue is greater than 1 and whose cumulative contribution rate is greater than 90% are selected, while the rest components are discarded. Finally, the PCA-RBF model is established, and the principal components matrix Y is used as the input matrix to build a new Radial Basis Function (RBF) model. Because RBF Network has strong nonlinear fitting ability and can approximate arbitrary nonlinear functions. It has good generalization ability and can map any complex nonlinear relationship. The structure of RBF neural network is shown in Figure 1. RBF neural network is not repeated in this paper, detailed references [22].

LSTM Model
Long-short term memory (LSTM) network is a special recurrent structure which can improve the ability of RNN network to handle long-term dependent tasks and resists the problem of vanishing gradient disappearance [17]. Compared with the traditional neural network, a feedback connection is added in the hidden layer of RNN. The interconnection of nodes between hidden layers in RNN makes the output of the hidden layer enter into the hidden layer of output and the next time step simultaneously, as shown in Figure 2. The network can generate the memory state of the past data, and establish the dependency relationship between the data of different time periods through this structure, so it can better deal with the time series problems [30]. LSTM solves the problem that the back propagation across time steps may lead to vanishing gradient problem of RNN when handle long-term dependent tasks by introducing a memory unit instead of the implicit node of the traditional RNN. The memory units of LSTM in hidden layer are shown in Figure 3. The repetition module of LSTM consists of three sigmoid functions and one tanh function, and introduces threshold structure, inputting gate, forgetting gate and outputting gate, so that the LSTM can selectively remember new information or delete information.

LSTM Model
Long-short term memory (LSTM) network is a special recurrent structure which can improve the ability of RNN network to handle long-term dependent tasks and resists the problem of vanishing gradient disappearance [17]. Compared with the traditional neural network, a feedback connection is added in the hidden layer of RNN. The interconnection of nodes between hidden layers in RNN makes the output of the hidden layer enter into the hidden layer of output and the next time step simultaneously, as shown in Figure 2. The network can generate the memory state of the past data, and establish the dependency relationship between the data of different time periods through this structure, so it can better deal with the time series problems [30]. LSTM solves the problem that the back propagation across time steps may lead to vanishing gradient problem of RNN when handle long-term dependent tasks by introducing a memory unit instead of the implicit node of the traditional RNN. The memory units of LSTM in hidden layer are shown in Figure 3. The repetition module of LSTM consists of three sigmoid functions and one tanh function, and introduces threshold structure, inputting gate, forgetting gate and outputting gate, so that the LSTM can selectively remember new information or delete information.  The process of information processing about LSTM model can be expressed by the following equation respectively: where, t i is the input gate, which determines the updated information within the unit. t f is the forget gate, that determines the deleted information in the unit. t o is the output gate, that determines how much information is output. t c  is the candidate value of the element state at time t. t c is the ⋯ Figure 3. The hidden layer of long short-term-memory (LSTM) cell structure.
The process of information processing about LSTM model can be expressed by the following equation respectively: where, i t is the input gate, which determines the updated information within the unit. f t is the forget gate, that determines the deleted information in the unit. o t is the output gate, that determines how much information is output. ∼ c t is the candidate value of the element state at time t. c t is the state of the unit at time t, calculated by the combination of parameters i t , ∼ c t , f t , c t−1 through Element-wise multiplication (*). h t is the output value filtered by the output gate. σ represents the Sigmoid function in the range of 0 to 1, and uses the tanh function to trust the value between −1 and 1. x t represents the input of memory unit at time t.
The flow chart of LSTM model for wind speed prediction is shown in Figure 4.

Mixture Gaussian Process Regression (MGPR)
Gaussian process regression model (GPR) is a non-parametric probability model based on nuclear [31], and MGPR is based on the mixed Gaussian process expert support GP approaching the temptation of formalism, as shown in Figure 5. Each expert uses a set of guidance to supplement, and according to the proximity of data points to the experts, and the data point are defined and assigned to the experts probability. i c f c  through Elementwise multiplication (*). t h is the output value filtered by the output gate. σ represents the Sigmoid function in the range of 0 to 1, and uses the tanh function to trust the value between −1 and 1. t x represents the input of memory unit at time t. , , , , , , , , , , The flow chart of LSTM model for wind speed prediction is shown in Figure 4.

Mixture Gaussian Process Regression (MGPR)
Gaussian process regression model (GPR) is a non-parametric probability model based on nuclear [31], and MGPR is based on the mixed Gaussian process expert support GP approaching the temptation of formalism, as shown in Figure 5. Each expert uses a set of guidance to supplement, and according to the proximity of data points to the experts, and the data point are defined and assigned to the experts probability. Set the input eigenvector be x   and the actual output signal be Defined function space ( ) k f x conforms to Gaussian distribution, and each independent expert model contains a potential function, which is expressed as:

Mixture Gaussian Process Regression (MGPR)
Gaussian process regression model (GPR) is a non-parametric probability model based on nuclear [31], and MGPR is based on the mixed Gaussian process expert support GP approaching the temptation of formalism, as shown in Figure 5. Each expert uses a set of guidance to supplement, and according to the proximity of data points to the experts, and the data point are defined and assigned to the experts probability. Set the input eigenvector be x   and the actual output signal be Defined function space ( ) k f x conforms to Gaussian distribution, and each independent expert model contains a potential function, which is expressed as: Training data set  Set the input eigenvector be X = {x n } N n−1 and the actual output signal be Y = y n N n=1 . Defined function space f k (x) conforms to Gaussian distribution, and each independent expert model contains a potential function, which is expressed as: where, GP(·) is gaussian distribution function, k(x, x ; θ k ) is the kernel function of expert K and θ k is hyper parameter. Each expert is supplemented by a set of M induction inputs U k = {u k1 , · · · , u kM }, whose inducement points are expressed as: Suppose that each observation x n , y n are associated with only one expert identified by z n , denoted by z, g, f , U, θ, σ for z k , g k , f k , U k , θ k , σ k . The total joint distribution of the model is expressed as: where, Energies 2020, 13, 5595 where, where, N(; , ) represents Gaussian distribution. K(X k , U k ) is the co-variance matrix evaluated at all points in X k and U k . And the results of interval WSP in the sense of probability distribution can be obtained. In addition, the subscript k of the co-variance matrix is specified by the super-parameter of expert K. Each mean m k is called the center of expert K corresponding to the output of point prediction.

Data Preprocessing
The research data of this paper are from the offshore small wind turbine test station of Shantou University located on Nan'ao Island, Guangdong Province. Nan'ao Island is the largest island wind farm traversed by the Tropic of Cancer with obvious marine climate characteristics in Asia and abound in wind resources with average wind speed of 8.54 m/s. We selected samples collected for three consecutive days in September 2017 (1 September 2017-3 September 2017), and the number of sample is 350, and the sampling time interval is 10 min. The first 300 samples of all samples is taken as the training set, and the remaining 50 samples is taken as the test set. The method of this paper is applicable to short term WSP. The sampled variables include six variables, namely wind speed (V), wind direction (WD), temperature (T), humidity (H), air pressure (P) and battery energy (B) respectively. Figure 6. shows the wind speed series distribution. As we can see from Figure 6, the strong stochastic characteristic of wind speed is observed, with daily average wind speeds values changing from 1 to 15 m/s, where Y axis represents the wind speed and X axis represents the sample with an interval of 10 min.
There are correlations between sampled variables, but multidimensional data may lead to information redundancy and increase the computational complexity. The main objective is to find how the additional sampled variables can help improving wind speed prediction. Therefore, PCA is used to eliminate redundant information between input parameters, and replace the original input parameters with new uncorrelated input parameters, so as to reduce the dimension of the original data on the premise of retaining the original information. PCA algorithm is used to obtain the principal sample is 350, and the sampling time interval is 10 min. The first 300 samples of all samples is taken as the training set, and the remaining 50 samples is taken as the test set. The method of this paper is applicable to short term WSP. The sampled variables include six variables, namely wind speed (V), wind direction (WD), temperature (T), humidity (H), air pressure (P) and battery energy (B) respectively. Figure 6. shows the wind speed series distribution. As we can see from Figure 6, the strong stochastic characteristic of wind speed is observed, with daily average wind speeds values changing from 1 to 15 m/s, where Y axis represents the wind speed and X axis represents the sample with an interval of 10 min. There are correlations between sampled variables, but multidimensional data may lead to information redundancy and increase the computational complexity. The main objective is to find how the additional sampled variables can help improving wind speed prediction. Therefore, PCA is used to eliminate redundant information between input parameters, and replace the original input parameters with new uncorrelated input parameters, so as to reduce the dimension of the original data on the premise of retaining the original information. PCA algorithm is used to obtain the principal component scores of each variable, and the standard orthogonal principal component score coefficient of each variable is visualized, as shown in the Figure 7. The eigenvalue of each principal component and the cumulative contribution rate of the sample data are shown in Table 1. As we can see from Table 1, the eigenvalues of the first two principal components of the sample data are both greater than 1, but the cumulative contribution rate is only 68%. The eigenvalues of the third and fourth principal components are close to 1, and the cumulative contribution rate of the first four principal components reaches 93.3%, therefore the first four principal components are selected as the model input data.  The eigenvalue of each principal component and the cumulative contribution rate of the sample data are shown in Table 1. As we can see from Table 1, the eigenvalues of the first two principal components of the sample data are both greater than 1, but the cumulative contribution rate is only 68%. The eigenvalues of the third and fourth principal components are close to 1, and the cumulative contribution rate of the first four principal components reaches 93.3%, therefore the first four principal components are selected as the model input data.

Decomposition of Wind Speed Data
Variational mode decomposition (VMD) is used to extract the wind speed series into different intrinsic mode functions. As shown in Figure 8, it is the original wind speed series results decomposed by Empirical mode decomposition (EMD) and VMD respectively, which include decomposition modes and corresponding spectrum. If the number of modes K is too small, it will lead to undersegmentation of the sample data, and some components are contained in other modes or discarded as "noise". On the contrary, if K is too big, it will lead to capture extra noise or mode to mode duplication. As shown in Figure 8b, EMD can not preset K, which is not convenient to observe the fluctuation of wind speed. Optimization of VMD is on the basis of the EMD algorithm, which can preset K, as well as can obverse the fluctuation and randomness of wind speed. K is 3 in this paper. Therefore, the mode is extracted by denoising VMD algorithm in this paper.
Energies 2020, 13, x FOR PEER REVIEW 12 of 24 considered as a noise series. The ADF in Matlab is used to test whether the time series is stable. The return value of IMF1 is 0 but IMF2 and ER series return a value of 1. Thus, according to the ADF test and the characteristic analysis of the decomposed series, as can be seen that, IMF1 is non-stationary series, IMF2 and ER are all stationary series. VMD algorithm decomposes the original wind speed series into nonlinear part, stationary time series part and noise part, and corresponding reasonable models are built for each part, which can reduce the impact of randomness and fluctuation.

The Process of Modeling
The WSP modeling process mainly consists of three parts: data pre-processing, establishment of hybrid models and comparison models including deterministic prediction and interval prediction, and validation of the proposed model. Finally, a numeral case is given to introduce these three parts in details respectively. The specific modeling process is shown in Figure 10, where the green part  Figure 9 shows the 3D visualization of VMD decomposition modes. As we can see from Figure 9, IMF1 has strong volatility and obvious nonlinear characteristics. IMF2 has a weak fluctuation range from −3 to 3, and is randomly distributed around 0 without obvious nonlinear characteristic and the fluctuation range is small. IMF3 is randomly distributed around [−1.5, 1.5] and scatter points fluctuate around 0 randomly, and they do not have distinct trend and fluctuation. Therefore, IMF3 can be considered as a noise series. The ADF in Matlab is used to test whether the time series is stable. The return value of IMF1 is 0 but IMF2 and ER series return a value of 1. Thus, according to the ADF test and the characteristic analysis of the decomposed series, as can be seen that, IMF1 is non-stationary series, IMF2 and ER are all stationary series. VMD algorithm decomposes the original wind speed series into nonlinear part, stationary time series part and noise part, and corresponding reasonable models are built for each part, which can reduce the impact of randomness and fluctuation.

The Process of Modeling
The WSP modeling process mainly consists of three parts: data pre-processing, establishment of hybrid models and comparison models including deterministic prediction and interval prediction, and validation of the proposed model. Finally, a numeral case is given to introduce these three parts in details respectively. The specific modeling process is shown in Figure 10, where the green part shows the entire process of the hybrid model proposed in this paper and the dotted lines are connected to the comparison models flow.

The Process of Modeling
The WSP modeling process mainly consists of three parts: data pre-processing, establishment of hybrid models and comparison models including deterministic prediction and interval prediction, and validation of the proposed model. Finally, a numeral case is given to introduce these three parts in details respectively. The specific modeling process is shown in Figure 10, where the green part shows the entire process of the hybrid model proposed in this paper and the dotted lines are connected to the comparison models flow. 1. Data processing section.
First, PCA is used to extract the principal components of multidimensional variables (like wind speed (V), wind direction (WD), temperature (T), humidity (H) and air pressure (P)), which the cumulative contribution rate is greater than 90%. The partial auto-correlation function is used in time series to determine the state variables and construct the appropriate training set. Figure 11 shows the auto-correlation function and partial correlation function of the original time series. As we can see

Data processing section.
First, PCA is used to extract the principal components of multidimensional variables (like wind speed (V), wind direction (WD), temperature (T), humidity (H) and air pressure (P)), which the cumulative Energies 2020, 13, 5595 13 of 23 contribution rate is greater than 90%. The partial auto-correlation function is used in time series to determine the state variables and construct the appropriate training set. Figure 11 shows the auto-correlation function and partial correlation function of the original time series. As we can see from Figure 11, the auto-correlation function has the trailing feature, while the partial auto-correlation function truncates the tail, so that the wind speed series satisfies the autoregression (AR) model. 1. Data processing section.
First, PCA is used to extract the principal components of multidimensional variables (like wind speed (V), wind direction (WD), temperature (T), humidity (H) and air pressure (P)), which the cumulative contribution rate is greater than 90%. The partial auto-correlation function is used in time series to determine the state variables and construct the appropriate training set. Figure 11 shows the auto-correlation function and partial correlation function of the original time series. As we can see from Figure 11, the auto-correlation function has the trailing feature, while the partial autocorrelation function truncates the tail, so that the wind speed series satisfies the autoregression (AR) model. Figure 11. The auto-correlation function and partial correlation function of wind speed series. Figure 11. The auto-correlation function and partial correlation function of wind speed series.
Secondly, VMD decomposition is used to extract the original wind speed series into IMFs and error series. A nonlinear model is established for IMF1 with strong volatility and obvious nonlinear characteristics. IMF2 is a stationary time series, and a linear time series prediction model is established for IMF2. MGPR is used to predict ER series.
First of all, according to the previous analysis, the RBF nonlinear model is established for IMF1 obtained by VMD with obvious nonlinear characteristic, where the input of RBF model is the principal component matrix obtained by PCA. Combined with PCA, this model is referred to as VMD-PCA-RBF, and the prediction result is v pi1 .
As for IMF2, a stationary time series model is established by LSTM, which is named as VMD-LSTM and the prediction result is v pi2 . So, we can obtain the deterministic prediction result of hybrid model VMD-PRBF-LSTM v pi , which is expressed as follows: The ER series satisfies Gaussian distribution according to the JARque-Bera test in Matlab, and the error distribution is shown in Figure 12 where, ∧ v pi is the CIs under the confidence degree 100(1 − α)%.
Energies 2020, 13, 5595 14 of 23 where, pi v  is the CIs under the confidence degree 100 1 ( % ) α  . Based on the deterministic prediction of VMD-PCA-RBF-LSTM-MGPR model, an interval prediction model is made by using the lower upper estimation method [16]. The wind speed PIs is obtained under the confidence degree 100(1 − α)%, where α stands for significance level. Which can be expressed as: where L α t (v pi ) means the lower limit of PIs and CIs, U α t (v pi ) means the upper limit. Secondly, establish comparison models. In this paper, a total of nine comparison models are established. RBF and BP neural network models are built respectively with taking wind speed (V) and other variables as input data. Only use PCA in data processing to obtain the input matrix of RBF, so as to establish PCA-RBF model. As for each mode obtained by VMD, establish RBF model and then build VMD-RBF model by reconstituting three RBF. Similarly, LSTM, ESN and VMD-LSTM comparison models are established respectively, where all three models have only one input, wind speed. Meanwhile we build two other comparison models, EMD-RBF and EMD-LSTM. In addition, a single model MGPR is used for deterministic WSP and interval WSP, so the MGPR comparison model is obtained.

3.
The validity of the model is illustrated in detail in Section 4.

Evaluation of Prediction Results
For deterministic prediction, this paper takes mean absolute error (MAE), mean square error (MSE), relative mean square error (RMSE) and mean absolute percent error (MAPE) as evaluation indexes of the prediction model. The larger the evaluation indexes value are, the larger the prediction error is. They are expressed as follows respectively. where, n represents the number of samples. v ri represents the actual wind speed. v pi represents the prediction wind speed. The PIs and CIs indicators include mean absolute percentage error I MAPE , prediction interval coverage percentage I PICP , mean prediction interval width I MPIW . The smaller the I MAPE and I MPIW values, the higher the forecasting accuracy. The larger I PICP value, the more actual value falling into the interval, that represents the reliability of interval prediction is higher.
Mean absolute percentage error I MAPE is used to evaluate the error between true and predicted value. Its expression is as follows.
Mean prediction interval width I MPIW is used to test the suitability of the width of CIs and PIs, which used to prevent excessive interval width. Its expression is as follows.
where, U(v pi ) represents the upper limit of the prediction interval. L(v pi ) denotes the lower limit of the prediction interval. Prediction interval coverage percentage I PICP is used to express the probability of the actual value in the prediction interval. Its expression is as follows:

Analysis of Deterministic Prediction Results
In order to verify the availability of the model, a total of nine comparison models are proposed in this paper. Figures 13-16 respectively show the deterministic prediction results of different WSP models, where Y axis represents the wind speed and X axis represents prediction samples with an interval of 10 min. The error evaluation indexes are listed in Table 2. Figure 13 shows the deterministic prediction results of Back Propagation model (BP), Radial Basis Function moedl (RBF), Principal component analysis-Radial Basis Function model (PCA-RBF) and Variational mode decomposition-Radial Basis Function moedl (VMD-RBF) respectively. According to Table 2 and Figure 13, the prediction performance of RBF model is better compared with the single BP model, where all of the evaluation index of RBF is lower than BP. The MAE of VMD-RBF is 2.6233, which is 10.7243 lower than BP model. The prediction results of VMD-RBF and PCA-RBF have some improvement effect but compared with the single RBF model. It proves PCA and VMD can help improve the accuracy of WSP. The prediction results of RBF, VMD-RBF and PCA-RBF can only reflect the volatility trend of some actual values, however, there is a big difference between the prediction values and the actual values. In addition, VMD-RBF and PCA-RBF both have stronger volatility. The VMD-RBF model improves the predicted value of RBF model to some extent, but it does not improve the volatility tendency of the predicted values significantly. MAE, MSE and RMSE of VMD-RBF model are the lowest among all the models in Figure 13, and MAPE of VMD-RBF model is only 0.0263 different from that of PCA-RBF model but the prediction effect is not evident.   Figure 15 shows the deterministic prediction results of Empirical mode decomposition-Radial Basis Function model (EMD-RBF) and Empirical mode decomposition-Long-Short Term Memory model (EMD-LSTM). As we can see from Figure 15 and Table 2, the prediction result of EMD-RBF is far from the actual wind speed and MSE of EMD-RBF is 8.9093, which is the second among all models. Even though the prediction results of EMD-LSTM is closer to the actual wind speed, but MAE of EMD-LSTM is 75.3289, and there is a difference between prediction value and actual value. The prediction accuracy of both models is not ideal. In addition, the RMSE of VMD-LSTM is only 0.8225 which is lower than all model mentioned in Figures 14 and 15. The modes obtained by EMD is 7 which cannot be preset, it is complex to build prediction model for each modes and too modes will lead to capture extra noise or mode to mode duplication, so as to cause poor prediction. Thus, VMD is more suitable to build prediction model in this paper. The deterministic prediction results of VMD-PCA-RBF-LSTM and the hybrid model VMD-PCA-RBF-LSTM-MGPR proposed in this paper are shown in Figure 16. As we can see from Figure 16, the    Figure 15 shows the deterministic prediction results of Empirical mode decomposition-Radial Basis Function model (EMD-RBF) and Empirical mode decomposition-Long-Short Term Memory model (EMD-LSTM). As we can see from Figure 15 and Table 2, the prediction result of EMD-RBF is far from the actual wind speed and MSE of EMD-RBF is 8.9093, which is the second among all models. Even though the prediction results of EMD-LSTM is closer to the actual wind speed, but MAE of EMD-LSTM is 75.3289, and there is a difference between prediction value and actual value. The prediction accuracy of both models is not ideal. In addition, the RMSE of VMD-LSTM is only 0.8225 which is lower than all model mentioned in Figures 14 and 15. The modes obtained by EMD is 7 which cannot be preset, it is complex to build prediction model for each modes and too modes will lead to capture extra noise or mode to mode duplication, so as to cause poor prediction. Thus, VMD is more suitable to build prediction model in this paper. The deterministic prediction results of VMD-PCA-RBF-LSTM and the hybrid model VMD-PCA-RBF-LSTM-MGPR proposed in this paper are shown in Figure 16. As we can see from Figure 16 Figures 14 and 17, VMD-LSTM improve the prediction value with lower RMSE 0.6765, which reduces 3.9074 than ESN model. MAE of LSTM is 33.594, which shows the prediction value is far from actual wind speed even though it is lower than ESN model with 49.9153. On the contrary, MAE of MGPR is just 2.2739, but the MSE 7.0604 is second only to BP neural network model with 13.3476, and there is a large gap between the prediction value and the actual value as we can see from Figure 16.   Figure 15 shows the deterministic prediction results of Empirical mode decomposition-Radial Basis Function model (EMD-RBF) and Empirical mode decomposition-Long-Short Term Memory model (EMD-LSTM). As we can see from Figure 15 and Table 2, the prediction result of EMD-RBF is far from the actual wind speed and MSE of EMD-RBF is 8.9093, which is the second among all models. Even though the prediction results of EMD-LSTM is closer to the actual wind speed, but MAE of EMD-LSTM is 75.3289, and there is a difference between prediction value and actual value. The prediction accuracy of both models is not ideal. In addition, the RMSE of VMD-LSTM is only 0.8225 which is lower than all model mentioned in Figures 14 and 15. The modes obtained by EMD is 7 which cannot be preset, it is complex to build prediction model for each modes and too modes will lead to capture extra noise or mode to mode duplication, so as to cause poor prediction. Thus, VMD is more suitable to build prediction model in this paper. The deterministic prediction results of VMD-PCA-RBF-LSTM and the hybrid model VMD-PCA-RBF-LSTM-MGPR proposed in this paper are shown in Figure 16. As we can see from Figure 16, the prediction results of both models are observably closer to the actual values than other comparison models. From Table 2 combined with Figure 16, MSE, RMSE and MAE of VMD-PRBF-LSTM-MGPR is 0.3158, 0.5619 and 0.4796 respectively, and both of them are the lowest among all the models, of which MAE of hybrid model is lower than ESN with 83.0297. Therefore, the prediction accuracy of VMD-PRBF-LSTM-MGPR is greater than other comparison models. The prediction results show that the proposed hybrid model has higher accuracy, and the distribution characteristics of prediction results are closer to the actual series through combining the advantages of nonlinear model and stationary time series model. Thus, the proposed model is more suitable for WSP than other nine comparison models.

Analysis of Interval Prediction Results
According to the previous analysis, CIs of wind speed are obtained by MGPR and VMD-PRBF-LSTM-MGPR-MGPR showed in Figures 17 and 18 respectively, and PIs is obtained by VMD-PRBF-LSTM-MGPR-LUBE showed in Figure 19, where Y axis represents the wind speed and X axis represents prediction samples with an interval of 10 min. As it can be seen from those figures, wind speed CIs and PIs of 95% are greater than that of 90% which are greater than that of 85%, which accords with the general statistical law, so the prediction results are correct and has statistical significance. CIs are obtained based on deterministic prediction results, thus the results of MGPR cannot reflect the fluctuation and randomness clearly, while CIs of the proposed method can. The interval width of the hybrid model is obviously narrower than that of single model MGPR from above pictures. Figure 17 shows the deterministic prediction and interval prediction results of wind speed obtained by the single model MGPR, where the interval prediction contains the results with confidence of 95%, 90% and 85% respectively. As can be seen from Table 3 and Figure 18, when the confidence is 85%, the I PICP of MGPR is 86%, and the I MPIW is 8.1148. Compared with the 90% confidence level, the I PICP increased by 10%, but the I MPIW also increased by 1.1575. When the confidence is 95%, the interval coverage rate is 98%, but the I MPIW is 11.0486, therefore it can be seen that the prediction results of the single model MGPR are poor.

Conclusions
A new WSP model is proposed considering volatility and randomness of wind speed in this   Figure 18 shows the interval prediction results obtained by the hybrid model VMD-PRBF-LSTM-MGPR proposed in this paper, which includes the deterministic prediction results and the confidence intervals under different confidence levels. Obviously, CIs of this method can significantly reflect the stochastic volatility of wind speed and the interval width is narrower than CIs of MGPR. Combined with Table 3, when the confidence level is 85%, the mean width of the confidence interval I MPIW is only 1.5865, but the interval coverage rate is only 86%, which is the lowest I PICP among the three confidence levels. Compared with 90% confidence, the interval coverage rate is 94% with an increase of 8%, and the I MPIW is 1.8127. However, when the confidence is 95%, the interval coverage rate is not improved and the I MPIW increases by 0.3473. Figure 19 shows the PIs obtained on the basis of the deterministic prediction results of VMD-PRBF-LSTM-MGPR combining LUBE. Combined with Table 3, when the confidence level is 85%, the I PICP reaches 94% and the I MPIW is 2.1240. When the confidence is 90%, the I PICP only increases by 4% and the I MPIW increases to 2.3996. When the confidence is 95%, the interval coverage rate remains the same. The I MPIW increases to 2.8079. The interval prediction results show that PIs of 95% and 90% contain most of the actual wind speed values. In addition, CIs and PIs can quantify and represent forecast uncertainty. CIs predict uncertainty of unknown but fixed value, PIs express the uncertainty in the prediction of a future realization of a random variable, and prediction width is greater than corresponding confidence interval.

Conclusions
A new WSP model is proposed considering volatility and randomness of wind speed in this paper, which can be applied to short-term deterministic and interval prediction with high accuracy and strong stability. Based on the deterministic WSP, the randomness, volatility and low energy density characteristics of wind speed can be expressed clearly and accurately, which is beneficial to the grid scheduling plan optimization and can increase the reliability of wind power grid. Based on the interval WSP, the volatility range of wind speed can be significantly predicted, which is beneficial to quantify the intermittent risk of wind turbine. The prediction results show that this method is valid and feasible.
In this paper, the wind speed series is exacted into different IMFs by VMD decomposition, and the corresponding models are built for each part. PCA analysis is used to realize multidimensional reduction of the original input parameters, and the PCA-RBF model is established to reduce the computational complexity of the model. LSTM model is established for IMF2 with low fluctuation and a stationary time series model is carried out. The MGPR model is used to predict the error series, and the results of both deterministic prediction and interval prediction are obtained meanwhile. The prediction results obtained from the above models are reconstructed to obtain the final results of WSP and complete the construction of VMD-PRBF-LSTM-MGPR model.
The experimental results show that the proposed hybrid model can effectively reduce the prediction error caused by randomness and fluctuation of wind speed. MAPE, MSE and RMSE of deterministic prediction of the hybrid model proposed in this paper are only 0.0713, 0.3158 and 0.5619 respectively, which are more accurate than the prediction results of comparison models. The average absolute error is 0.4796, which is 83.0297 lower than the single model ESN, proving the validity of the model. In addition, the interval prediction results obtained by the model presented in this paper show that, compared with CTs obtained by the single model MGPR, when the confidence degree is 85%, the average interval width decreases by 80.45%, and the interval coverage is about 95% with the confidence of 90% and 95% respectively. However, the average interval width of the model proposed in this paper is narrower, indicating that the prediction accuracy is higher. In this paper, we make a comparison between CIs and PIs, and it shows that both of them can quantify and represent forecast uncertainty and the width of PIs is wider than CIs under the same confidence degree.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.