ν-Support Vector Regression Model Based on Gauss-Laplace Mixture Noise Characteristic for Wind Speed Prediction

Most regression techniques assume that the noise follows a single distribution, whereas wind speed prediction is difficult to model with a single noise distribution because the noise of wind speed is complicated due to its intermittency and random fluctuations. Therefore, we present ν-support vector regression models with Gauss-Laplace mixture homoscedastic noise (GLM-SVR) and Gauss-Laplace mixture heteroscedastic noise (GLMH-SVR) for complex noise. The augmented Lagrange multiplier method is introduced to solve models GLM-SVR and GLMH-SVR. The proposed model is applied to short-term wind speed forecasting, using historical data to predict future wind speed at a certain time. The experimental results show that the proposed technique outperforms the single-noise techniques and obtains good performance.


Introduction
Wind speed and wind power prediction are becoming increasingly important, and wind speed prediction is crucial for the control, scheduling, maintenance, and resource planning of wind energy conversion systems [1,2]. However, the volatility and uncertainty of wind speed pose a fundamental challenge to power system operations. Because the basic characteristics of wind are its intermittency and random fluctuations [3,4], the integration of wind power into power systems raises a series of challenges. The most effective way to address these challenges is to improve the accuracy of wind speed and power forecasting [5-7].
In general, there are three important components in building a regression algorithm: the model structure, the objective function, and the optimization strategy. Model structures include linear or nonlinear functions, neural networks [8-10], etc. As for objective functions, the empirical risk loss has a great effect on the performance of regression models, and its selection mostly depends on the type of noise [11,12]. For example, squared loss is suitable for Gaussian noise [13-15], least absolute deviation loss for Laplacian noise [16], and Beta loss for Beta noise [17-19]. From the formulation of the optimization method, a series of optimization algorithms have been developed [20]. This work mainly studies how the support vector regression (SVR) model should be designed under complex or unknown noise.
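As a minimal illustration of this correspondence between noise type and loss (a sketch, not the paper's code; the function names are ours), the squared loss matched to Gaussian noise and the absolute-deviation loss matched to Laplacian noise can be written as:

```python
import numpy as np

def squared_loss(eps, sigma=1.0):
    # Bayes-optimal (up to additive constants) for zero-mean Gaussian noise
    # with standard deviation sigma: l(eps) = eps^2 / (2 sigma^2).
    return eps**2 / (2.0 * sigma**2)

def abs_loss(eps):
    # Bayes-optimal for Laplacian noise with density (1/2) exp(-|eps|).
    return np.abs(eps)

errors = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(squared_loss(errors))  # grows quadratically with the error
print(abs_loss(errors))      # grows linearly, more robust to outliers
```

The absolute loss penalizes large errors less severely, which is why it is preferred under heavy-tailed (Laplacian) noise.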
Recently, SVR has become an increasingly important technology. In 2000, ν-SVR was introduced by Schölkopf et al. [21]; it computes the tube width ε automatically. Suykens et al. [22,23] constructed least squares SVR with Gaussian noise (LS-SVR). Wu [13] and Pontil et al. [24] constructed ν-SVR with Gaussian noise (GN-SVR). In 2002, Bofinger et al. [25] discovered that the output of a wind turbine system is limited between zero and maximum power and that the error statistics do not follow a normal distribution. In 2007, Zhang et al. [26] and Randazzo et al. [27] studied the estimation of the direction of arrival of coherent electromagnetic waves under a Laplace noise environment. Bludszuweit et al. [28] explained the advantages of using the Beta probability density function (PDF) instead of the Gauss PDF to approximate the error distribution of wind power forecasting. According to the Bayesian principle, squared loss, Beta loss, and Laplacian loss are optimal when the noise is Gaussian, Beta, or Laplacian, respectively [17,18]. However, in some real-world applications, the noise distribution is complex and unknown when the data are collected in multi-source environments; a single distribution is then almost never able to describe the real noise accurately [29,30]. Generally speaking, mixture distributions have good approximation capability for any continuous distribution and can adapt well to unknown or complex noise when we have no prior knowledge of the real noise. In 2017, a hybrid forecasting model based on multi-objective optimization [29,31] and a hybrid method based on singular spectrum analysis, the firefly algorithm, and a BP neural network [32] were used to forecast wind speed under complex noise; this shows that hybrid methods have strong prediction ability. A hybrid least squares support vector machine [33] has been applied to predict wind speed under unknown noise, improving forecasting performance.
Two novel nonlinear regression models in which the noise is fitted by a mixture of Gaussians were developed in Reference [34]; they produced good performance compared with current regression algorithms and provided superior robustness.
To address the above problem, we study the ν-SVR model with Gauss-Laplace mixture noise characteristics for complex or unknown noise distributions. In this case, we must design a method to find the optimal solution of the corresponding regression task. Although there have been a large number of SVR algorithm implementations in the past few years, we introduce the augmented Lagrange multiplier (ALM) method, described in Section 4. The sub-gradient descent method can be used if the objective is non-differentiable or discontinuous [17], and the sequential minimal optimization (SMO) algorithm can be used if the sample size is large [35].
This work offers the following four contributions: (1) the optimal empirical risk loss for general mixture noise characteristic and Gauss-Laplace mixture noise by the use of Bayesian principle is obtained; (2) the ν-SVR model of mixture noise, Gauss-Laplace mixture homoscedastic noise (GLM-SVR), and Gauss-Laplace mixture heteroscedastic noise (GLMH-SVR) for complex or unknown noise is constructed; (3) the augmented Lagrange multiplier method is applied to solve GLM-SVR, which guarantees the stability and validity of the solution; and (4) GLM-SVR is applied to short-term wind speed forecasting using historical data to predict future wind speed at a certain time and to verify the validity of the proposed technique.
The rest of this article is organized as follows. Section 2 derives the optimal empirical risk loss using the Bayesian principle; Section 3 constructs the ν-SVR model of Gauss-Laplace mixture noise characteristics; Section 4 gives the solution and algorithm design of GLM-SVR; numerical experiments on short-term wind speed prediction are carried out in Section 5; and Section 6 summarizes this article.

Bayesian Principle to Empirical Risk Loss of Mixture Noise
In this section, using the theory of Bayesian principle, we obtain the optimal empirical risk loss of mixture noise characteristics.
Given the following dataset

D_L = {(X_1, Y_1), (X_2, Y_2), ..., (X_L, Y_L)},   (1)

where X_i = (x_i1, x_i2, ..., x_in)^T ∈ R^n and Y_i ∈ R (i = 1, 2, ..., L), R represents the real number set, R^n is the n-dimensional Euclidean space, L is the number of sample points, and the superscript T denotes the matrix transpose. Suppose the samples of dataset D_L are generated by the additive noise ε; the relationship between the measured values Y_i and the predicted values f(X_i) is as follows:

Y_i = f(X_i) + ε_i, i = 1, 2, ..., L,   (2)

where the ε_i are random and i.i.d. (independent and identically distributed) with density P(ε_i) of mean µ and standard deviation σ. In engineering practice, the noise density P(ε) = P(Y − f(X)) is unknown. We want to learn the unknown decision function f(X) from the training samples D_f ⊆ D_L. Following References [24,36], by the use of the Bayesian principle, the optimal empirical risk loss in the maximum likelihood sense is as follows:

l(ε) = −log P(ε),   (3)

i.e., the optimal empirical risk loss l(ε) is the negative log-likelihood of the noise model. The probability density function (PDF) of each single distribution model and the parameter estimation formulas under the Bayesian principle are summarized in Reference [16]. In particular, if the noise ε in Equation (2) is Laplacian, with PDF P(ε) = (1/2)·e^(−|ε|), then by Equation (3) the optimal empirical risk loss in the maximum likelihood sense is l(ε) = |ε| (up to an additive constant). If the noise in Equation (2) is Gaussian, with zero mean and homoscedastic standard deviation σ, then by Equation (3) the empirical risk loss for Gaussian noise is l(ε) = ε²/(2σ²). If the noise ε in Equation (2) is Gaussian, with zero mean and heteroscedastic standard deviations σ_i (i = 1, 2, ..., L), then by Equation (3) the loss is l(ε_i) = ε_i²/(2σ_i²) (i = 1, ..., L). Finally, assume that the noise ε in Equation (2) is a mixture of two noise distributions with probability density functions P_1(ε) and P_2(ε), respectively.
Suppose that P(ε) = λ_1·P_1(ε) + λ_2·P_2(ε). By Equation (3), the optimal empirical risk loss for the mixture noise distribution is as follows:

l(ε) = λ_1·l_1(ε) + λ_2·l_2(ε),   (4)

where l_1(ε) > 0 and l_2(ε) > 0 are the convex empirical risk losses of the two noise distributions, respectively, and the weight factors satisfy λ_1, λ_2 ≥ 0 and λ_1 + λ_2 = 1. The Gauss-Laplace empirical risk losses for different parameters are shown in Figure 1.
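The Gauss-Laplace instance of the mixture loss, combining the Laplacian loss |ε| and the Gaussian loss ε²/(2σ²), can be sketched as follows (the function name and default weights are our illustrative assumptions):

```python
import numpy as np

def gauss_laplace_loss(eps, lam1=0.5, lam2=0.5, sigma=1.0):
    """Weighted Gauss-Laplace mixture loss: lam1*|eps| + lam2*eps^2/(2*sigma^2).

    lam1 = 1 recovers the pure Laplacian loss; lam2 = 1 the pure Gaussian loss.
    """
    assert lam1 >= 0 and lam2 >= 0 and abs(lam1 + lam2 - 1.0) < 1e-12
    return lam1 * np.abs(eps) + lam2 * eps**2 / (2.0 * sigma**2)

# Quadratic near zero would dominate for large lam2; the linear term adds
# robustness against heavy-tailed errors for large lam1.
print(gauss_laplace_loss(np.array([0.0, 1.0, -2.0])))
```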

Model ν-SVR of Gauss-Laplace Mixture Noise
Given dataset D_L, we build a linear regressor f(X) = ω^T·X + b, where ω denotes the weight vector and b is the bias term. To deal with nonlinear problems, the following summary can be made [37,38]: the input vector X_i ∈ R^n is mapped by a nonlinear mapping (chosen a priori) Φ: R^n → H into a high-dimensional feature space H; K(X_i, X_j) = (Φ(X_i)·Φ(X_j)) is the inner product in H, and the kernel associated with the mapping Φ may be any positive definite Mercer kernel. Therefore, we solve the optimization problem in the feature space H. The linear ν-SVR is extended to the nonlinear ν-SVR by using the kernel matrix K(X_i, X_j).
We propose the uniform ν-SVR model of mixture noises (M-SVR). The primal problem of model M-SVR is described as follows:

min_{ω, b, ε, ξ, ξ*}  (1/2)·||ω||² + C·(ν·ε + (1/L)·Σ_{i=1}^{L} [λ_1(l_1(ξ_i) + l_1(ξ_i*)) + λ_2(l_2(ξ_i) + l_2(ξ_i*))])
s.t.  ((ω·Φ(X_i)) + b) − Y_i ≤ ε + ξ_i,
      Y_i − ((ω·Φ(X_i)) + b) ≤ ε + ξ_i*,
      ξ_i, ξ_i* ≥ 0, i = 1, 2, ..., L,   (5)

where ξ_i and ξ_i* are the random noises and slack variables at time i; l_1(ξ_i), l_1(ξ_i*), l_2(ξ_i), and l_2(ξ_i*) > 0 (i = 1, 2, ..., L) are convex empirical risk loss values for a general noise characteristic at the sample point (X_i, Y_i) ∈ D_L; C > 0 is the penalty parameter; ε ≥ 0; ν ∈ (0, 1]; and the weight factors satisfy λ_1, λ_2 ≥ 0 and λ_1 + λ_2 = 1. As a function approximation machine, the objective is to estimate an unknown function f(X) from the training samples D_f ⊆ D_L. In practical applications, the noise often obeys neither the Gauss distribution nor the Laplace distribution; the noise distribution is unknown or complex, and a single distribution can hardly describe the real noise. Generally, mixture distributions (such as the Gauss-Laplace mixed distribution) have good approximation capabilities for any continuous distribution and can fit unknown or complex noise. Therefore, we use the Gauss-Laplace mixed homoscedastic and heteroscedastic noise distributions to fit unknown or complex noise characteristics in the following sections.

Model ν-SVR of Gauss-Laplace Mixture Homoscedastic Noise
If the noise in Equation (2) is Gaussian, with zero mean and homoscedastic standard deviation σ, then by Equation (3) the empirical risk loss of homoscedastic Gaussian noise is l(ε) = ε²/(2σ²). We adopt the Gauss-Laplace mixture homoscedastic noise distribution to fit the unknown noise characteristics. By Equation (4), the loss function corresponding to the Gauss-Laplace mixture homoscedastic noise characteristics is

l(ε) = λ_1·|ε| + λ_2·ε²/(2σ²).   (6)

We put forward the ν-SVR model for Gauss-Laplace mixture homoscedastic noise characteristics (GLM-SVR). The primal problem of model GLM-SVR is described as follows:

min_{ω, b, ε, ξ, ξ*}  (1/2)·||ω||² + C·(ν·ε + (1/L)·Σ_{i=1}^{L} [λ_1(ξ_i + ξ_i*) + λ_2(ξ_i² + (ξ_i*)²)/(2σ²)])
s.t.  ((ω·Φ(X_i)) + b) − Y_i ≤ ε + ξ_i,
      Y_i − ((ω·Φ(X_i)) + b) ≤ ε + ξ_i*,
      ξ_i, ξ_i* ≥ 0, i = 1, 2, ..., L,   (7)

where ξ_i, ξ_i* ≥ 0 (i = 1, 2, ..., L) are the random noises and slack variables at time i, C > 0 is the penalty parameter, ε ≥ 0, ν ∈ (0, 1], and the weight factors satisfy λ_1, λ_2 ≥ 0 and λ_1 + λ_2 = 1.

Proposition 2.
The solution ω of the primal problem of Equation (8) of GLMH-SVR exists and is unique.
Proof. See Appendix A.
From the extremal conditions, we obtain ω = Σ_{i=1}^{L} (α_i* − α_i)·Φ(X_i); the parameter ε can then be estimated from the optimal solution. Thus, the decision function of model GLMH-SVR can be written as follows:

f(X) = Σ_{i ∈ SVs} (α_i* − α_i)·K(X_i, X) + b,

where the support vectors (SVs) are the samples with α_i* − α_i ≠ 0, ω is the parameter vector, Φ: R^n → H, and K(X_i, X_j) is the kernel function.
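Evaluating this decision function can be sketched as follows (a hypothetical implementation; `rbf_kernel` and `decision_function` are our names, and the dual coefficients α, α* are assumed to come from an already-solved dual problem):

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))

def decision_function(X_train, alpha_star, alpha, b, x_new, kernel=rbf_kernel):
    """f(x) = sum over support vectors of (alpha*_i - alpha_i) K(X_i, x) + b.

    Only support vectors (alpha*_i - alpha_i != 0) contribute to the sum.
    """
    coef = np.asarray(alpha_star, dtype=float) - np.asarray(alpha, dtype=float)
    return sum(c * kernel(xi, x_new)
               for c, xi in zip(coef, X_train) if c != 0.0) + b

# Toy usage with two training points; the second has zero coefficient
# and is therefore skipped as a non-support vector.
X_train = [[0.0, 0.0], [1.0, 1.0]]
f = decision_function(X_train, alpha_star=[0.7, 0.0], alpha=[0.0, 0.0],
                      b=0.1, x_new=[0.0, 0.0])
print(f)  # ≈ 0.8, since K of identical points is 1
```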
If the noise in Equation (2) is Gaussian, with zero mean and homoscedastic standard deviation, Theorem 1 can be derived from Theorem 2.

Solution Based on the Augmented Lagrange Multiplier Method
The augmented Lagrange multiplier (ALM) method [39-41] is a class of algorithms for solving equality- and inequality-constrained optimization problems. It solves the dual problem of Equation (7) of model GLM-SVR by applying Newton's method to a sequence of unconstrained subproblems. By eliminating the equality and inequality constraints, the optimization problem of Equation (7) can be reduced to an equivalent unconstrained problem, which can be solved by the gradient descent method or Newton's method [24,42,43]. For large-scale training samples, fast optimization techniques such as stochastic gradient descent can also be combined with the proposed objective function [44].
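To illustrate how ALM works, here is a toy sketch (ours, not the paper's algorithm) that minimizes x₁² + x₂² subject to x₁ + x₂ = 1 by alternating inner gradient steps on the augmented Lagrangian L_ρ(x, µ) = f(x) + µ·h(x) + (ρ/2)·h(x)² with multiplier updates; the analytic solution is (0.5, 0.5):

```python
import numpy as np

def alm_toy(rho=10.0, outer=50, inner=200, lr=0.05):
    """ALM sketch: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0."""
    x = np.zeros(2)
    mu = 0.0
    for _ in range(outer):
        # Inner loop: gradient descent on L_rho(x, mu) with mu fixed.
        for _ in range(inner):
            h = x.sum() - 1.0
            grad = 2.0 * x + (mu + rho * h)   # d/dx of L_rho (same for x1, x2)
            x -= lr * grad
        # Outer step: first-order multiplier update.
        mu += rho * (x.sum() - 1.0)
    return x, mu

x, mu = alm_toy()
print(x, mu)  # x close to [0.5, 0.5], mu close to the optimal multiplier -1
```

The quadratic penalty term keeps each inner subproblem well conditioned, while the multiplier update drives the constraint violation h(x) to zero without sending ρ to infinity.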
In this section, we apply Newton's method to the sequence of inequality- and equality-constrained subproblems and use the ALM method to solve model GLM-SVR. Theorem 1 and Theorem 2 provide the algorithms for effectively identifying models GLM-SVR and GLMH-SVR, respectively. The solution based on ALM and the algorithm design of model GLM-SVR are given below. Similarly, model GLMH-SVR can be solved by the use of ALM.
(1) Let the training samples be D_f ⊆ D_L.
(2) The 10-fold cross-validation strategy is adopted to search the optimal parameters C, ν, λ_1, and λ_2 and to select the appropriate kernel function K(·, ·).
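The 10-fold cross-validation step can be sketched as follows (a generic illustration; `k_fold_indices` and the candidate grid values are our assumptions, not the paper's settings):

```python
import numpy as np
from itertools import product

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Hypothetical search grid over C, nu, and lam1 (lam2 = 1 - lam1):
# each triple would be scored by the mean validation error over the 10 folds.
grid = list(product([1.0, 10.0, 100.0],     # C
                    [0.2, 0.5, 0.8],        # nu
                    [0.0, 0.5, 1.0]))       # lam1
print(len(grid))  # 27 candidate parameter triples
```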

Case Study
In this section, a case study is implemented to demonstrate the effectiveness of the proposed model GLM-SVR through comparisons with other techniques for training-set D f from Heilongjiang, China. This case study includes three subsections: Data collection and analysis in Section 5.1; evaluation criteria for forecasting performance in Section 5.2; and short-term wind speed forecasting of a real dataset in Section 5.3.

Analysis of Wind Speed Mixture Noise Characteristics
In order to analyze the mixture noise characteristics of the wind speed forecasting error, we collected a wind speed dataset from Heilongjiang, China. The dataset consists of one year of wind speed data, with values recorded every 10 min. We first examined whether Gauss-Laplace mixture noise is present in these data. Researchers have found that turbulence is the major cause of the strong random fluctuations of wind speed; from a wind energy perspective, the most striking characteristic of the wind resource is its variability. We now display the distributions of wind speed: a value is obtained every 10 min, and histograms of wind speed are computed over one- or two-hour windows. Two typical distributions are given as follows: one was computed when the wind speed was higher and the other when the wind speed was lower, as shown in Figures 2 and 3, respectively. To analyze a one-month time series of the wind speed dataset, the persistence method is used to investigate the distribution of the wind speed prediction errors [28]. The result indicates that the error ξ does not follow a single distribution but approximately obeys a Gauss-Laplace mixed distribution, with PDF P(ξ) = λ_1·(1/2)·e^(−|ξ|) + λ_2·(1/(√(2π)·σ))·e^(−ξ²/(2σ²)), as shown in Figure 4. This is a regression learning task with mixture noise.
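To see what such Gauss-Laplace mixture noise looks like, one can simulate it (a sketch under an assumed mixing weight λ₁ = 0.4 and unit scales; `sample_mixture` is our name, not the paper's code):

```python
import numpy as np

def sample_mixture(n, lam1=0.4, sigma=1.0, seed=0):
    """Draw n samples: with probability lam1 from Laplace(0, 1),
    otherwise from the Gaussian N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    pick = rng.random(n) < lam1
    return np.where(pick,
                    rng.laplace(0.0, 1.0, n),
                    rng.normal(0.0, sigma, n))

eps = sample_mixture(100_000)
# The mixture variance is lam1*2 + (1 - lam1)*sigma^2 = 1.4 for these defaults,
# and its tails are heavier than a pure Gaussian with the same variance.
print(eps.mean(), eps.var())
```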

Evaluation Criteria for Forecasting Performance
As we all know, no prediction model forecasts perfectly. Certain criteria, such as the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the standard error of prediction (SEP), are used to evaluate the predictive performance of models ν-SVR, GN-SVR, and GLM-SVR. The four criteria are defined as follows:

MAE = (1/L)·Σ_{i=1}^{L} |Y_i − Ŷ_i|,
RMSE = √[(1/L)·Σ_{i=1}^{L} (Y_i − Ŷ_i)²],
MAPE = (1/L)·Σ_{i=1}^{L} |(Y_i − Ŷ_i)/Y_i| × 100%,
SEP = (RMSE/Ȳ) × 100%,

where L is the size of the training samples, Y_i is the ith actual measured value, Ŷ_i is the ith forecasted value, and Ȳ is the mean value of the observations of all selected samples in the training set D_L [45-47]. The MAE reveals how similar the predicted values are to the observed values, whereas the RMSE measures the overall deviation between the predicted and observed values. MAPE is the ratio between the errors and the observed values, and SEP is the ratio between the RMSE and the mean value of the observations. The indicators MAPE and SEP are unit-free measures of accuracy for predicting wind series and are sensitive to small changes.
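The four criteria can be implemented directly (a sketch; the function names are ours, and Ŷ is written as `yhat`):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat)**2))

def mape(y, yhat):
    # Undefined when an observed value is zero; wind speed data is positive.
    return np.mean(np.abs((y - yhat) / y)) * 100.0

def sep(y, yhat):
    return rmse(y, yhat) / np.mean(y) * 100.0

y_obs  = np.array([5.0, 4.0, 6.0, 5.0])   # toy measured wind speeds
y_pred = np.array([4.5, 4.5, 6.0, 5.0])   # toy forecasts
print(mae(y_obs, y_pred))   # 0.25
```

Note that MAPE and SEP divide by the observations, which is why they are unit-free while MAE and RMSE carry the units of wind speed.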

Short-Term Wind Speed Prediction of Real Dataset
In this subsection, we demonstrate the validity of the proposed model by conducting experiments on a wind speed dataset from Heilongjiang Province, China. The dataset records more than one year of wind speeds; the average wind speed over each 10-min interval is stored. In total, there are 62,466 samples with four attributes: mean, variance, minimum, and maximum. We first extracted 2160 consecutive data points (from 1 to 2160; a time length of 15 days) as the training set and 720 consecutive data points (from 2161 to 2880; a time length of 5 days) as the testing set. We transform the original sequence into a multivariate regression task, using X⃗_i = (X_{i−10}, X_{i−9}, ..., X_{i−1}, X_i) as the input vector to predict X_{i+step}, in which the order of the wind speed vector is determined by the chaotic operator network method [48]. Here, X_j is the real value of the wind speed at time j (j = i−10, i−9, ..., i) and step = 1, 3, 5; that is to say, the above mode is used to predict the wind speed at each point X_i after 10 min, 30 min, and 50 min, respectively.
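The construction of the input vectors X⃗_i = (X_{i−10}, ..., X_i) with target X_{i+step} can be sketched as follows (`make_windows` is our name; an order of 11 matches the 11 lagged values used above):

```python
import numpy as np

def make_windows(series, order=11, step=1):
    """Build input vectors (X_{i-order+1}, ..., X_i) and targets X_{i+step}."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(order - 1, len(series) - step):
        X.append(series[i - order + 1 : i + 1])   # lagged window ending at i
        y.append(series[i + step])                 # value step intervals ahead
    return np.array(X), np.array(y)

# Toy series 0, 1, ..., 19: the first window is (0, ..., 10), target 11.
s = np.arange(20.0)
X, y = make_windows(s, order=11, step=1)
print(X.shape, y[0])  # (9, 11) 11.0
```

With 10-min sampling, step = 1, 3, 5 corresponds to forecasting 10, 30, and 50 min ahead, as in the experiments.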
Both the polynomial kernel K(X_i, X_j) = ((X_i·X_j) + 1)^d and the Gaussian RBF kernel K(X_i, X_j) = exp(−||X_i − X_j||²/(2σ²)) are considered, where d is a positive integer and σ is positive. In Figures 5-7, the wind speed prediction results at point X_i for models ν-SVR, GN-SVR, and GLM-SVR are shown for the 10-min, 30-min, and 50-min horizons, respectively. In Tables 1-3 and Figures 8-11, the indicators MAE, MAPE, RMSE, and SEP of the wind speed prediction at point X_i for models ν-SVR, GN-SVR, and GLM-SVR are reported for the 10-min, 30-min, and 50-min horizons, respectively. From Tables 1-3 and Figures 5-11, it can be concluded that, in most cases, the errors of model GLM-SVR are lower than those of models ν-SVR and GN-SVR. As the prediction horizon increases to 30 min and 50 min, the errors obtained by the different models rise and the relative differences decrease. Nevertheless, as can be seen from Tables 1-3, the Gauss-Laplace mixture noise model is slightly superior to the classical models in terms of all indicators: MAE, MAPE, RMSE, and SEP.
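The parameters d and σ mentioned above are the usual degree and bandwidth of the polynomial and Gaussian RBF kernels; assuming those standard forms, a sketch (function names are ours):

```python
import numpy as np

def poly_kernel(x, y, d=3):
    # Polynomial kernel ((x . y) + 1)^d; d is a positive integer degree.
    return (np.dot(x, y) + 1.0) ** d

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2)); sigma > 0.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(poly_kernel(x, y))  # (0 + 1)^3 = 1.0
print(rbf_kernel(x, y))   # exp(-1), since ||x - y||^2 = 2
```

Both kernels are positive definite Mercer kernels, so either is a valid choice for the kernel matrix K(X_i, X_j) in the models above.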

Conclusions
The noise distribution is complex or unknown in the real world, and it is almost impossible for a single distribution to describe the real noise. The main results of this article are as follows: (1) the optimal empirical risk loss for the mixture noise model is derived by the Bayesian principle; (2) the ν-SVR models of Gauss-Laplace mixture homoscedastic noise (GLM-SVR) and Gauss-Laplace mixture heteroscedastic noise (GLMH-SVR) for complex or unknown noise are developed; (3) the dual problems of GLM-SVR and GLMH-SVR are derived by introducing the Lagrange functional L; (4) the ALM method is applied to solve model GLM-SVR, which guarantees the stability and validity of the solution; and (5) model GLM-SVR is applied to short-term wind speed forecasting, using historical data to predict future wind speed at a certain time. The experimental results on real-world wind speed data confirm the effectiveness of the proposed technique.
Analogously, we can study the Gauss-Laplace mixture noise model for classification, which could be used to solve classification problems with complex or unknown noise characteristics.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.

Substituting the extremal conditions into L and maximizing with respect to α and α*, Dual Problem (9) of Primal Problem (8) is derived.