LSSVR Model of G-L Mixed Noise-Characteristic with Its Applications

Due to the complexity of wind speed, it has been reported that mixed-noise models, composed of multiple noise distributions, perform better than single-noise models. However, most existing regression models assume a single noise distribution. We therefore study the least-squares SVR with Gaussian-Laplacian mixed homoscedastic noise (GLM−LSSVR) and with heteroscedastic noise (GLMH−LSSVR) for complicated or unknown noise distributions. The augmented Lagrange multiplier (ALM) technique is used to solve GLM−LSSVR, and the model is applied to short-term wind-speed prediction with historical data. The prediction results indicate that the presented model is superior to the corresponding single-noise models.


Introduction
In practical applications, if the data are collected in a multi-source environment, the noise distribution is complex and unknown. Therefore, it is almost impossible for a single noise distribution to clearly describe the real noise [1]. LSSVR is a linear regression (LR) method that implements a sum-of-squares error function together with regularization, thus controlling the bias-variance trade-off [2,3]. It is intended to find the concealed linear structures in the original data [4,5]. For the transition from linear to nonlinear functions, the following generalization can be made [6]: map the input vectors into a high-dimensional feature space H (H a Hilbert space) through some nonlinear mapping, and seek the solution of the optimization problem in the space H. Using a suitable kernel function K(•, •), nonlinear mappings can be estimated by kernel LSSVR, which is LR extended with kernel techniques. In recent years, LSSVR has become increasingly popular as a data-rich nonlinear forecasting tool [7], applicable in many different contexts [8][9][10], such as machine learning, optical character recognition, and especially wind speed/power forecasting.
Generally, the existing techniques used for wind-speed forecasting include: (i) physical; (ii) statistical (also called data-driven); and (iii) artificial intelligence (AI)-based methods. The physical models attempt to estimate wind flow around and inside the wind farm using the physical laws governing atmospheric behavior [11,12]. The statistical models seek the relationships between a set of explanatory variables and the on-line measured generation data, and only the historical wind-speed data recorded at the site are used to establish the statistical model. Such models can be built in a variety of ways, including the persistence method.

Bayesian Principle to Mixed Noise Empirical Risk Loss
Given the dataset

D_N = {(A_1, y_1), (A_2, y_2), ..., (A_N, y_N)},   (1)

where A_i = (x_i1, x_i2, ..., x_in)^T ∈ R^n and y_i ∈ R (i = 1, 2, ..., N) are the training data, R is the set of real numbers, R^n is the n-dimensional Euclidean space, N is the sample size, and the superscript T denotes the transpose. Assuming that the samples of the dataset D_N are generated by an additive noise process ξ, the relationship between the measured value y_i and the predicted value f(A_i) is

y_i = f(A_i) + ξ_i,   (2)

where the ξ_i are random, i.i.d. (independent, identically distributed) with density p(ξ_i) of mean µ and standard deviation σ. Generally, the noise PDF (probability density function) p(ξ) = p(y − f(A)) is unknown, and the unknown target f(A) must be estimated from a training set D_f ⊆ D_N. Following the authors of [30,31], the empirical risk loss that is optimal in the maximum-likelihood sense is the negative log-likelihood of the noise characteristic:

l(ξ) = −ln p(ξ).   (3)

For example, if the noise in Equation (2) is Laplacian, with PDF p(ξ) = (1/2) e^(−|ξ|), then by Equation (3) the MLE-optimal empirical risk loss is l(ξ) = |ξ| (up to an additive constant).
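As a numerical illustration of this maximum-likelihood argument (the data and values below are ours, not the paper's): under Laplacian noise, minimizing the absolute loss over a constant predictor recovers the ML location estimate, i.e., the sample median.

```python
import numpy as np

# Illustration: for additive noise with density p(xi), the ML-optimal
# empirical risk is l(xi) = -log p(xi) (up to a constant). For Laplacian
# noise p(xi) = 0.5 * exp(-|xi|) this is the absolute-value loss.

rng = np.random.default_rng(0)
noise = rng.laplace(loc=0.0, scale=1.0, size=10_000)
y = 3.0 + noise  # constant target f(A) = 3 plus Laplacian noise

# Minimising the absolute loss over a constant predictor yields the median,
# the ML estimate of the location parameter under Laplacian noise.
grid = np.linspace(2.0, 4.0, 401)
abs_risk = np.array([np.abs(y - c).mean() for c in grid])
c_abs = grid[abs_risk.argmin()]

print(f"median of y        : {np.median(y):.3f}")
print(f"argmin of |.| loss : {c_abs:.3f}")
```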

LSSVR Model of G-L Mixed Noise-Characteristic
Given the training samples D_f ⊆ D_N, construct the linear regressor f(A) = ω^T A + b. To deal with nonlinear problems, the standard extension is as follows: map the input vectors A_i ∈ R^n into a high-dimensional feature space H through a nonlinear mapping Φ (taken as a prior choice) induced by a nonlinear kernel function K(A_i, A_j), where Φ is the kernel mapping of any positive definite Mercer kernel.
Definition 1 ([6,28]) (Positive definite Mercer kernel). Let X be a subset of R^n. A kernel function K(A_i, A_j) defined on X × X is called a positive definite Mercer kernel if there exists a mapping Φ : X → H (H a Hilbert space) such that

K(A_i, A_j) = (Φ(A_i) · Φ(A_j)),

where (·) denotes the inner product in the space H.
Therefore, the optimization problem can be solved in the space H. The inner products (A_i · A_j) of the input vectors are replaced by the inner products (Φ(A_i) · Φ(A_j)) in the feature space H. Through the use of the kernel K(A_i, A_j) = (Φ(A_i) · Φ(A_j)), the linear model can be extended to a nonlinear LSSVR.
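To make the kernel extension concrete, the following is a minimal self-contained LSSVR sketch with an RBF Mercer kernel. The bordered linear system is the standard LSSVR dual for the Gaussian-noise case; the values of C and gamma and the toy data are illustrative choices, not the paper's.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # K(Ai, Aj) = exp(-gamma * ||Ai - Aj||^2), a positive definite Mercer kernel
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvr_fit(X, y, C=10.0, gamma=1.0):
    N = len(y)
    K = rbf_kernel(X, X, gamma)
    # Standard LSSVR dual system: [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / C
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]  # alpha, b

def lssvr_predict(X_train, alpha, b, X_new, gamma=1.0):
    # f(A) = sum_i alpha_i K(A_i, A) + b
    return rbf_kernel(X_new, X_train, gamma) @ alpha + b

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=80)
alpha, b = lssvr_fit(X, y, C=100.0, gamma=0.5)
pred = lssvr_predict(X, alpha, b, X, gamma=0.5)
print("train RMSE:", np.sqrt(((pred - y) ** 2).mean()))
```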
In application domains, most noise distributions obey neither a Gaussian nor a Laplacian distribution: the noise distribution is complicated, and it is almost impossible to describe real noise with a single distribution. It has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models [1]. In general, a mixed distribution has fine approximation ability for any continuous distribution, so when there is no prior knowledge of the real noise it can adapt well to unknown or complicated noise. We therefore present a uniform LSSVR model with mixed noise characteristics (M−LSSVR). The primal problem of M−LSSVR is formalized in Equation (7), where the parameter ω ∈ R^n is the weight vector, b is the bias term, C > 0 is the penalty parameter, and λ1, λ2 are the weight factors. As a function-fitting machine, the goal is to estimate an unknown function f(A) from the dataset D_f ⊆ D_N. In this section, G-L mixed homoscedastic and heteroscedastic noise distributions are used to fit complicated noise characteristics.
Deriving the partial derivatives with respect to ω, b, and ξ, respectively, and applying the KKT conditions, we obtain the extremality conditions. Substituting these back into L(ω, b, α, ξ) and maximizing over α, the dual problem in Equation (8) of the primal problem in Equation (7) is derived.
The decision function for GLM−LSSVR may be represented as f(A) = (ω · Φ(A)) + b = Σ_{i=1}^{N} α_i K(A_i, A) + b, where the parameter vector ω ∈ R^n, Φ : R^n → H, and K(A_i, A_j) is the kernel function. If the noise in Equation (2) is Gaussian homoscedastic noise, i.e., Gaussian noise with zero mean and homoscedastic variance σ², the dual problem of LSSVR can be derived from Theorem 2:

Proposition 2. The solution of the primal problem in Equation (10) of GLMH−LSSVR exists and is unique with respect to ω.

Proof. The proof follows by analogy with that of Theorem 1.

The decision function for GLMH−LSSVR may be expressed as f(A) = (ω · Φ(A)) + b = Σ_{i=1}^{N} α_i K(A_i, A) + b, where the parameter vector ω ∈ R^n, Φ : R^n → H, and K(A_i, A_j) is the kernel function. If the noise in Equation (2) is G-L mixed homoscedastic noise, in which the Gaussian component has zero mean and homoscedastic variance σ², Theorem 1 can be deduced from Theorem 2.

Solution from ALM
In this section, we use the augmented Lagrange multiplier method (ALM) [32] to solve the dual problem in Equation (8) by applying gradient descent or Newton's method to a sequence of equality-constrained problems; by eliminating the equality constraints, these can be reduced to equivalent unconstrained problems [33,34]. For large-scale training samples, rapid optimization techniques can be combined with the proposed model, for example the sequential minimal optimization (SMO) algorithm [29] and the stochastic gradient descent (SGD) algorithm [35].
Theorems 1 and 2 provide effective recognition techniques for GLM−LSSVR and GLMH−LSSVR, respectively. In this section, we derive the ALM solution and the algorithm for the LSSVR model with G-L mixed homoscedastic noise characteristic (GLM−LSSVR). Analogously, the solution of GLMH−LSSVR can be obtained by the ALM method.
(4) Build the decision-function as follows
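A minimal sketch of the ALM idea on a toy equality-constrained QP, which has the same shape as the dual in Equation (8): minimize 0.5 x^T Q x − c^T x subject to a^T x = 0. The data Q, c, a, the penalty rho, and the iteration counts are illustrative assumptions, not the paper's actual algorithm parameters.

```python
import numpy as np

def alm_qp(Q, c, a, rho=10.0, outer=50, inner=200, lr=0.01):
    # Augmented Lagrangian: 0.5 x^T Q x - c^T x + mu*(a^T x) + 0.5*rho*(a^T x)^2
    x = np.zeros_like(c)
    mu = 0.0  # Lagrange multiplier for the constraint a^T x = 0
    for _ in range(outer):
        for _ in range(inner):  # inner loop: gradient descent on the AL
            grad = Q @ x - c + (mu + rho * (a @ x)) * a
            x = x - lr * grad
        mu = mu + rho * (a @ x)  # outer loop: multiplier update
    return x, mu

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
Q = M @ M.T + 5 * np.eye(5)  # positive definite objective
c = rng.normal(size=5)
a = np.ones(5)               # equality constraint sum(x) = 0

x, mu = alm_qp(Q, c, a)
print("constraint residual a^T x =", a @ x)
```

The multiplier update drives the constraint residual to zero without requiring the penalty rho to grow unboundedly, which is the practical advantage of ALM over a pure penalty method.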

Case Study
This section tests and verifies the validity of the constructed model GLM−LSSVR by comparing it with other techniques on a dataset D_N from Heilongjiang, China. The case study consists of the following subsections: the G-L mixed-noise characteristic of wind speed, prediction performance evaluation criteria, and short-term wind-speed forecasting based on an actual dataset.

G-L Mixed-Noise-Characteristic of Wind-Speed
To demonstrate the effectiveness of the proposed model, we collected wind-speed data from Heilongjiang. The dataset covers more than one year, with wind-speed values recorded every 10 min. We first identified the G-L mixed noise and conducted experiments on it. Turbulence is the main reason for the high uncertainty of random wind-speed fluctuations, and from the wind-energy perspective the most significant feature of the resource is its variability. To show the distribution of wind speed, we take a wind-speed value every 5 s and compute the histogram over 1-2 h. Two typical distributions are given: one computed when the wind speed is high and the other when it is low (see Figures 2 and 3, respectively). We then analyzed the one-month time-series dataset and used the persistence method to investigate the error distribution [32]. The results show that the wind-speed error ξ obtained from persistence prediction does not follow a single distribution, but approximately follows a G-L mixed distribution with PDF p(ξ) ∝ (1/2) e^(−|ξ|) · e^(−ξ²/(2σ²)), as shown in Figure 4. As can be seen from the above charts and figures, the wind-speed error approximately satisfies a G-L mixed distribution.
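One simple way to simulate errors that exhibit both Gaussian and Laplacian character is a two-component mixture; the weights and scales below are hypothetical choices for illustration, not values fitted to the wind data. The sample's excess kurtosis shows the heavier-than-Gaussian tails that motivate the mixed model.

```python
import numpy as np

# Hypothetical G-L error sample: with probability lam1 draw from N(0, sigma^2),
# with probability lam2 from Laplace(0, 1). lam1, lam2, sigma are assumptions.
rng = np.random.default_rng(3)
lam1, lam2, sigma = 0.6, 0.4, 0.8
N = 100_000
pick = rng.random(N) < lam1
xi = np.where(pick,
              rng.normal(0.0, sigma, N),
              rng.laplace(0.0, 1.0, N))

# The Laplacian component thickens the tails relative to a pure Gaussian:
# excess kurtosis is 0 for any N(0, s^2) but 3 for a Laplace distribution.
kurt = ((xi - xi.mean()) ** 4).mean() / xi.var() ** 2 - 3.0
print(f"excess kurtosis of mixture sample: {kurt:.2f}")
```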

Prediction Performance Evaluation Criteria
It is generally known that no prediction model forecasts perfectly. The predictive performance of ν−SVR, GN−SVR, LSSVR, and GLM−LSSVR is assessed by standard evaluation criteria, namely MAE (mean absolute error), RMSE (root mean square error), MAPE (mean absolute percentage error), and SEP (standard error of prediction). The four criteria are defined as

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|,
RMSE = √[(1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²],
MAPE = (100%/N) Σ_{i=1}^{N} |(y_i − ŷ_i)/y_i|,
SEP = (RMSE/ȳ) × 100%,

where N is the size of the dataset D_N, y_i is the ith actual observation, ŷ_i is the ith forecast, and ȳ is the mean of the observations y_i ∈ D_N [36][37][38][39][40]. MAE shows how close the predicted values are to the observed values, while RMSE measures the overall deviation between predicted and observed values. MAPE is the ratio of the error to the observed value, and SEP is the ratio of RMSE to the average observation. These are dimensionless measures of the accuracy of the wind-speed system and are sensitive to small changes.
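The four criteria can be computed directly as below; the percentage convention for MAPE and SEP is the usual one, and the small arrays are toy data.

```python
import numpy as np

# The four evaluation criteria of this section. MAPE assumes y_true has no zeros.
def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def sep(y_true, y_pred):
    return 100.0 * rmse(y_true, y_pred) / np.mean(y_true)

y_true = np.array([5.0, 6.0, 4.0, 5.5])   # toy observed wind speeds
y_pred = np.array([5.5, 5.5, 4.5, 5.0])   # toy forecasts
print(mae(y_true, y_pred), rmse(y_true, y_pred))
print(mape(y_true, y_pred), sep(y_true, y_pred))
```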

Short-Term Wind-Speed Forecasting with a Real Dataset
In this section, 2160 consecutive data points (1-2160, a time span of 15 days) are taken as the training set and 720 consecutive data points (2161-2880, a time span of 5 days) as the testing set. The input vector is A_i = (x_{i−11}, x_{i−10}, ..., x_{i−1}, x_i), where x_j is the observed wind speed at time j (j = i−11, i−10, ..., i), and the forecast target is x_{i+step}, where step = 1, 3, 6. That is, the above models are used to forecast the wind speed at each point 10, 30, and 60 min ahead, respectively. Figures 5-13 describe the forecasting results given by the models ν−SVR, GN−SVR, LSSVR, and GLM−LSSVR.
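The sliding-window construction described above can be sketched as follows; the window width of 12 matches A_i = (x_{i−11}, ..., x_i), and the toy series stands in for the 10-min wind-speed records.

```python
import numpy as np

# Build (A_i, x_{i+step}) pairs from a univariate series.
# Index bookkeeping is our assumption, consistent with the text.
def make_windows(series, width=12, step=1):
    X, y = [], []
    for i in range(width - 1, len(series) - step):
        X.append(series[i - width + 1 : i + 1])  # A_i = (x_{i-11}, ..., x_i)
        y.append(series[i + step])               # target x_{i+step}
    return np.array(X), np.array(y)

series = np.arange(30, dtype=float)  # toy stand-in for 10-min wind speeds
X, y = make_windows(series, width=12, step=3)
print(X.shape, y.shape)  # each row: 12 lagged values; target 3 steps ahead
print(X[0], y[0])
```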
ν−SVR: The authors of [41,44] define the dual problem of ν−SVR. GN−SVR: The authors of [45,46] studied SVR with equality and inequality constraints; the loss function for Gaussian noise is c(ξ_i) = ξ_i²/2 (i = 1, ..., N), from which the dual problem of GN−SVR follows. LSSVR: The authors of [22] studied LS−SVR for the Gaussian-noise model and derived its dual problem, where ξ_i, ξ_i* are slack variables and C > 0, ν ∈ (0, 1] are constants. For ν−SVR and GN−SVR, the size of ε is not given in advance; it is a variable whose value is traded off against the model complexity and the slack variables through ν [35].

In Figures 5, 8 and 11, the wind-speed forecasts at each point A_i produced by ν−SVR, GN−SVR, LSSVR, and GLM−LSSVR are presented for horizons of 10, 30, and 60 min, respectively. Figures 6, 9, and 12 show the error statistics of the wind-speed predictions using the above four models. The box plots (Figures 7, 10, and 13) at several noise levels further demonstrate, intuitively, the comparative error statistics of the four forecasting models. The statistical criteria MAE, MAPE, RMSE, and SEP are displayed in Tables 1-3. From the box-whisker plots in Figures 7, 10, and 13, as well as Tables 1-3, it can be concluded that, in most cases, the forecasting error of GLM−LSSVR is smaller than that of ν−SVR, GN−SVR, and LSSVR. As the prediction horizon increases to 30 and 60 min, the forecasting error of all models increases while the relative differences between them decrease, so the advantage becomes less pronounced in these cases. Nevertheless, Tables 1-3 show that, under all of the criteria MAE, MAPE, RMSE, and SEP, the Gaussian-Laplacian mixed-noise model remains slightly better than the classical models.

Conclusions
Most existing regression techniques suppose that the noise model is single. Wind-speed forecasting is complicated by volatility and uncertainty, and is thus difficult to model with a single noise distribution. This section summarizes our main work: (1) the optimal empirical risk loss of G-L mixed noise is deduced by the Bayesian principle; (2) the LSSVR with G-L mixed homoscedastic noise (GLM−LSSVR) and G-L mixed heteroscedastic noise (GLMH−LSSVR) is developed for complicated noise; (3) the dual problems of GLM−LSSVR and GLMH−LSSVR are obtained using the Lagrange functional and the KKT conditions; (4) the stability and effectiveness of the algorithm are guaranteed by solving GLM−LSSVR with the ALM method; and (5) the proposed technique is used to predict short-term wind speed from historical data, forecasting the wind speed 10, 30, and 60 min ahead, respectively. The comparison results show that the proposed model is better than the classical techniques under the statistical criteria.
In the same way, we can also study Gaussian-Laplacian or Gaussian-Weibull mixed-noise classification models. Such hybrid noise models would effectively solve complicated noise classification problems.