Twin Least Square Support Vector Regression Model Based on Gauss-Laplace Mixed Noise Feature with Its Application in Wind Speed Prediction

In this article, we observe that the noise in some real-world applications, such as wind power forecasting and direction-of-arrival estimation, does not follow a single noise distribution, such as the Gaussian or Laplace distribution, but rather a mixed distribution. Therefore, by combining twin hyperplanes with the fast training of Least Squares Support Vector Regression (LS-SVR) and introducing the Gauss-Laplace mixed noise feature, we propose a new regressor for complex noise, called Gauss-Laplace Twin Least Squares Support Vector Regression (GL-TLSSVR). We then apply the augmented Lagrangian multiplier method to solve the proposed model. Finally, we apply the proposed model to a short-term wind speed data-set. The experimental results confirm the effectiveness of the proposed model.

In these SVR models, the noise of the training data is assumed to follow a single distribution when solving regression problems. According to the Bayesian principle, the squared loss is optimal for Gaussian noise, the Beta loss is optimal for Beta noise, and the Laplace loss is optimal for Laplace noise [25,26]. However, in some practical applications, if data are collected in a multi-source environment, the noise distribution is complex and unknown, and a single distribution cannot adequately describe the real noise [27,28]. In general, a mixed distribution can approximate any continuous distribution well. For some actual noise, prior knowledge is difficult to obtain; in this case, a mixed-noise model adapts well to unknown or complex noise. In 2017, a new wind speed hybrid forecasting system based on multi-objective optimization was proposed [27,29]; the hybrid model integrates three components: singular spectrum analysis, the firefly algorithm, and the BP neural network [30]. Compared with a single BP network, the hybrid method predicts better, which shows that its forecasting ability is stronger. In addition, accurate wind speed prediction is a key task for the development and utilization of wind energy; compared with other related methods, the hybrid method shows satisfactory accuracy and stability [31]. In [32], two new nonlinear regression models for single-task and multi-task problems are developed, in which the noise is modeled as a Gaussian mixture; compared with other models, the resulting model is a robust nonlinear regression model with strong adaptability.
However, the main disadvantage of SVR is its high learning cost. In order to improve the training speed of SVR, Peng [34][35][36] proposed twin support vector regression (TSVR), based on the twin support vector machine (TSVM) [33]. Unlike SVR, TSVR generates two non-parallel upper- and lower-bound functions by solving a pair of smaller quadratic programming problems (QPPs); in theory, TSVR reduces the computational cost compared to standard SVR. Zhao et al. [37] extended the concept of twin hyperplanes and combined it with the advantages of least squares support vector regression (LSSVR) to generate an estimated regressor called Twin Least Squares Support Vector Regression (TLSSVR). Observing the model of Peng [34], Khemchandani et al. [38] noted that TSVR only considers the principle of empirical risk minimization. To overcome this difficulty, Shao et al. [39] proposed another twin regression model, called ε-TSVR, which considers the principle of structural risk minimization. Later, Rastogi et al. [40] extended ε-TSVR and proposed ν-TSVR, which can automatically optimize the parameters ε1 and ε2 based on the sample data. Using the pinball loss function, Xu et al. [41] further developed an asymmetric ν-twin support vector regression, called Asy-ν-TSVR, which effectively reduces noise interference and improves generalization performance. Thus, extensive research has been conducted on twin-type SVR. In all of these twin-type SVR models, the distribution of the training data is not considered when solving regression problems. This means that all samples play the same role in the constraint function regardless of their importance, which degrades regression performance; penalizing different samples according to their importance is more reasonable. For this reason, various methods [42][43][44][45][46] have been developed to address this shortcoming. For example, Xu et al. [44] proposed a K-nearest-neighbor weighted twin support vector regression that exploits the local information of the samples to improve prediction accuracy. By clustering based on the similarity of the training data, Parastalooi et al. [45] proposed an improved twin support vector regression. Ye [46] proposed an effective weighted Lagrangian ε-twin support vector regression (WL-ε-TSVR) with a quadratic loss function, in which a weight matrix D is introduced to reduce, to a certain extent, the influence of outliers on the regression, so as to impose different penalties on different samples.
Traditionally, the upper- and lower-bound regressors of twin SVR are obtained by approximate dual solutions. However, Chapelle [47] observed, by comparing the approximation efficiency of SVR in the primal and dual spaces, that an approximate dual solution may not produce a good primal approximate solution. Some related work therefore solves the problem directly in the primal space [48][49][50][51]. For example, inspired by twin SVR and Newton methods, Balasundaram et al. [49] proposed an unconstrained Lagrangian TSVR (ULTSVR) that solves a pair of unconstrained minimization problems, thereby increasing the computation speed. Gupta [50] and Balasundaram [51] used the generalized derivative approach to solve the QPPs. Although their methods are efficient and fast, they only consider empirical risk minimization and ignore structural risk.
Inspired by the above research, we study a twin least squares support vector regression model with Gauss-Laplace mixed noise (GL-TLSSVR) for complex or unknown noise distributions. To solve the resulting regression task, the augmented Lagrange multiplier (ALM) method is used in our experiments, which helps us find the optimal solution.
This work provides four main contributions; the whole methodology is described in the flowchart shown in Figure 1.

Related Work
In this section, the data-set is denoted by D_N = {(A_i, y_i)}, i = 1, 2, …, N, where A_i ∈ R^n and y_i ∈ R are the training samples.
According to the Bayesian principle, we can derive the optimal empirical risk loss for the mixed noise characteristics [52]. The best empirical risk loss for this mixed noise distribution is

l(ξ) = λ_1 · l_1(ξ) + λ_2 · l_2(ξ),    (1)

where l_1(ξ) > 0 and l_2(ξ) > 0 are the convex empirical risk losses of the two component noise characteristics, and λ_1, λ_2 ≥ 0 are weight factors with λ_1 + λ_2 = 1. Figure 2 shows the G-L empirical risk loss for different parameters.
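The mixed loss above, specialized to the Gauss-Laplace case used later in the paper, can be sketched as follows (the parameter names and default values here are illustrative, not taken from the paper):

```python
import numpy as np

def gl_mixed_loss(xi, lam1=0.5, lam2=0.5, sigma2=1.0):
    """Gauss-Laplace mixed empirical risk loss:
    l(xi) = lam1/(2*sigma2) * xi**2 + lam2 * |xi|,
    with lam1, lam2 >= 0 and lam1 + lam2 = 1."""
    return lam1 / (2.0 * sigma2) * xi**2 + lam2 * np.abs(xi)
```

For small residuals the quadratic (Gaussian) term dominates, while for large residuals the absolute (Laplace) term grows only linearly, which is what makes the mixture robust to heavy-tailed noise.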

TLSSVR Model of G-L Mixed Noise Characteristics
For the linear model, we want to find a linear regression function f(A) = ω^T · A + b.
When dealing with nonlinear problems, the following standard approach is used [53]: the input vector A_i ∈ R^n is mapped by a nonlinear mapping Φ: R^n → H to a high-dimensional feature space H (a Hilbert space) induced by a nonlinear kernel function. On this basis, the twin least squares support vector regression model with mixed noise characteristics (M-TLSSVR) is proposed. In the primal problem of M-TLSSVR, ω_1, ω_2 denote the weight vectors, b_1, b_2 are the bias terms, and Φ(A) is the nonlinear mapping that transfers the input vector to the higher-dimensional feature space.
According to the literature [28], a mixed noise model is composed of multiple noise distributions, and its performance is better than that of a single-noise model. In this section, the Gauss-Laplace mixed homoscedastic and heteroscedastic noise distributions are used to describe complex noise characteristics.

TLSSVR Model of G-L Mixed Homoscedastic Noise Characteristics
According to the Bayesian principle, the empirical risk loss of the homoscedastic Gaussian noise of the lower bound function is l_1(ξ) = (1/(2σ²)) · ξ², and that of the Laplace noise is l_2(ξ) = |ξ|. Adopting the G-L mixed homoscedastic noise distribution to fit the complicated noise characteristic, by Equation (1) the empirical risk loss of the G-L mixed homoscedastic noise is l(ξ) = (λ_1/(2σ²)) · ξ² + λ_2 · |ξ|. The TLSSVR model with G-L mixed homoscedastic noise characteristics (GLM-TLSSVR) is proposed. The primal problem of the lower bound function is

min_{ω_1, b_1, ξ}  (1/2)‖ω_1‖² + C_1 Σ_{i=1}^{N} [ (λ_1/(2σ²)) ξ_i² + λ_2 |ξ_i| ]
s.t.  y_i − (ω_1^T Φ(A_i) + b_1) = ξ_i,  i = 1, …, N.    (4)

Similarly, the primal problem of the upper bound function of GLM-TLSSVR is

min_{ω_2, b_2, ξ*}  (1/2)‖ω_2‖² + C_2 Σ_{i=1}^{N} [ (λ_3/(2σ*²)) ξ_i*² + λ_4 |ξ_i*| ]
s.t.  (ω_2^T Φ(A_i) + b_2) − y_i = ξ_i*,  i = 1, …, N.    (5)

Here ξ_i and ξ_i* are the random noise and slack variables at time i, the parameter vectors ω_1, ω_2 ∈ R^n, σ² and σ*² are the homoscedastic variances, C_1, C_2 > 0 are penalty parameters, and λ_1, λ_2 (respectively λ_3, λ_4) are the weight factors.
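For concreteness, the lower-bound primal objective can be evaluated as below. This is a minimal linear-kernel sketch (the function name and defaults are illustrative; the paper works in the kernel-induced feature space):

```python
import numpy as np

def glm_tlssvr_lower_objective(w1, b1, A, y, C1=1.0, lam1=0.5, lam2=0.5, sigma2=1.0):
    """Primal objective of the GLM-TLSSVR lower-bound function (linear sketch):
    0.5*||w1||^2 + C1 * sum_i [ lam1/(2*sigma2)*xi_i^2 + lam2*|xi_i| ],
    where the slack is xi_i = y_i - (w1 . A_i + b1)."""
    xi = y - (A @ w1 + b1)
    loss = lam1 / (2.0 * sigma2) * xi**2 + lam2 * np.abs(xi)
    return 0.5 * np.dot(w1, w1) + C1 * loss.sum()
```

The upper-bound objective is symmetric, with the slack defined as the excess of the regressor over y_i and with its own weights and variance.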

Theorem 1. The dual problem of the primal problem (4) of GLM-TLSSVR is as follows.
The dual problem of the primal problem (5) of GLM-TLSSVR follows analogously, where the parameter vectors ω_1, ω_2 ∈ R^n, σ² and σ*² are the homoscedastic variances, C_1, C_2 > 0 are penalty parameters, and λ_1, λ_2, λ_3, λ_4 are the weight factors. Proof. Consider the lower bound function of the GLM-TLSSVR model. For any scalar u, if we introduce u_+, u_− ≥ 0 with u = u_+ − u_−, then min |u| = min{u_+ + u_−} holds [54]. Therefore, by setting ξ_i = p_i − r_i with p_i, r_i ≥ 0, the primal problem of the lower bound function of GLM-TLSSVR is simplified accordingly. We then introduce the Lagrange function and the KKT (Karush-Kuhn-Tucker) conditions [55].
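The variable-splitting step used in the proof can be checked numerically. The sketch below verifies that for u = p − r with p, r ≥ 0, the minimum of p + r equals |u| and is attained at p = max(u, 0), r = max(−u, 0):

```python
import numpy as np

def split_abs(u):
    """Split u = p - r with p, r >= 0 so that |u| = p + r at the minimum.
    The minimizer is the positive/negative part decomposition of u."""
    p = np.maximum(u, 0.0)
    r = np.maximum(-u, 0.0)
    return p, r
```

This is what turns the non-smooth |ξ_i| term into linear terms with non-negativity constraints, so that standard Lagrangian duality applies.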
We obtain the solution of the lower bound function; thus, the lower bound function of the TLSSVR model with Gauss-Laplace mixed homoscedastic noise characteristics (GLM-TLSSVR) can be written accordingly. The primal problem of the upper bound function of GLM-TLSSVR is simplified in the same way, and we introduce the Lagrange function and KKT conditions again.
We obtain the solution of the upper bound function; thus, the upper bound function of the TLSSVR model with Gauss-Laplace mixed homoscedastic noise characteristics (GLM-TLSSVR) can be written accordingly. Finally, the estimated regressor of GLM-TLSSVR is written in terms of the two bound functions, where the parameter vectors ω_1, ω_2 ∈ R^n and Φ is the feature mapping induced by the kernel function.
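A minimal sketch of how the final regressor is assembled from the two bound functions, assuming the usual twin-SVR combination rule f(A) = (f_1(A) + f_2(A))/2 (the paper's displayed formula is not reproduced here, so this rule is an assumption):

```python
def gl_tlssvr_predict(f_lower, f_upper, A):
    """Combine the lower- and upper-bound functions into the estimated
    regressor, as in standard twin SVR: the mean of the two bounds."""
    return 0.5 * (f_lower(A) + f_upper(A))
```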
Proof. The proof is similar to those of Theorems 1 and 2; see the Appendix for the proof of Theorem 3.
We can obtain the solution of the lower bound function; thus, the lower bound function of the TLSSVR model with Gauss-Laplace mixed heteroscedastic noise characteristics (GLMH-TLSSVR) can be written accordingly. We also obtain the solution of the upper bound function, so the upper bound function of GLMH-TLSSVR can be written accordingly. Finally, the estimated regressor of GLMH-TLSSVR is obtained. If the noise is Gaussian with homoscedasticity, Theorems 1 and 2 follow as special cases of Theorem 3.

ALM Method Analysis
In this section, we apply the augmented Lagrange multiplier method (ALM) [56] to solve the dual problems in Equations (6) and (7) by applying gradient descent or Newton's method to a sequence of equality-constrained problems. By eliminating the equality constraints, any equality-constrained problem can be reduced to an equivalent unconstrained problem [57,58]. When dealing with large-scale data-sets, fast optimization techniques such as the sequential minimal optimization (SMO) algorithm [59] and the stochastic gradient descent (SGD) algorithm [60] can be combined with the proposed model.
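The ALM idea can be sketched on a generic equality-constrained quadratic program of the form min 0.5·x'Qx + c'x s.t. Ax = b, which is the structure of the duals solved here. This is a simplified illustration, not the paper's exact solver; each outer iteration minimizes the augmented Lagrangian exactly (the subproblem is quadratic) and then updates the multiplier:

```python
import numpy as np

def alm_eq_qp(Q, c, A, b, rho=10.0, iters=50):
    """Augmented Lagrangian sketch for min 0.5 x'Qx + c'x  s.t. Ax = b.
    L_rho(x, mu) = f(x) + mu'(Ax - b) + (rho/2)||Ax - b||^2.
    Minimize L_rho over x in closed form, then mu <- mu + rho*(Ax - b)."""
    n, m = Q.shape[0], A.shape[0]
    x, mu = np.zeros(n), np.zeros(m)
    H = Q + rho * A.T @ A            # Hessian of the augmented Lagrangian
    for _ in range(iters):
        g = c + A.T @ mu - rho * A.T @ b   # linear part of grad L_rho
        x = np.linalg.solve(H, -g)         # exact subproblem solution
        mu = mu + rho * (A @ x - b)        # multiplier update
    return x
```

The multiplier update drives the constraint residual Ax − b to zero, so the unconstrained subproblems converge to the constrained optimum.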
From Theorems 1-3, we find that the ALM method can effectively solve the GLM-TLSSVR and GLMH-TLSSVR models. In this section, the lower and upper bound functions of the GLM-TLSSVR model are solved by the ALM method; similarly, the lower and upper bound functions of the GLMH-TLSSVR model can also be solved by the ALM method. The specific algorithm steps are as follows: (1) Let the data-set be D_N = {(A_1, y_1), (A_2, y_2), …, (A_N, y_N)}, where A_i ∈ R^n, y_i ∈ R, i = 1, …, N.
(2) Select an appropriate kernel function through the 10-fold cross-validation strategy and obtain the appropriate parameters C_1, C_2, λ_1, λ_2, λ_3, λ_4 of the lower and upper bound functions of the GLM-TLSSVR model.
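Step (2) can be sketched as follows. The `fit` and `predict` callables are placeholders for the GLM-TLSSVR training and prediction routines (which are not reproduced here); the sketch only shows the 10-fold scoring of one parameter setting:

```python
import numpy as np

def ten_fold_cv_score(fit, predict, A, y, params, k=10, seed=0):
    """10-fold cross-validation RMSE for one parameter setting.
    fit(A, y, params) returns a model; predict(model, A) returns predictions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(A[train], y[train], params)
        e = y[test] - predict(model, A[test])
        errs.append(np.sqrt(np.mean(e**2)))
    return float(np.mean(errs))
```

Grid search then evaluates this score over the candidate values of C_1, C_2 and λ_1, …, λ_4 and keeps the setting with the lowest mean RMSE.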

Experiments and Discussion
In this section, to check the performance of the proposed GLM-TLSSVR model, we compare it with ν-SVR, LS-SVR, and TSVR on an actual data-set from Heilongjiang, China. This part includes three contents: the G-L mixed noise characteristics of wind speed in Section 5.1; the criteria for algorithm evaluation in Section 5.2; and the application to short-term wind speed prediction in Section 5.3.

G-L Mixed Noise Characteristics of Wind Speed
We collected a one-year wind speed data-set from Heilongjiang Province, China. These data record the wind speed every 10 min, which allows a detailed analysis of the mixed-noise characteristics of the wind speed forecast error. In these data, we found that some of the noise is a Gauss-Laplace mixture. Researchers have found that turbulence is the main cause of the strong random fluctuations in wind speed; from the perspective of wind energy, the most significant feature of wind resources is their variability. We adopted the persistence method, which is often used to study the distribution of wind speed forecast errors, to analyze a one-month time series of the wind speed data [54]. This experiment shows that the error variable ξ does not follow a single noise distribution but approximately obeys the Gauss-Laplace mixed noise distribution, with PDF

P(ξ) ∝ exp( −(λ_1/(2σ²)) ξ² − λ_2 |ξ| ),

which corresponds to the mixed empirical risk loss in Equation (1). The forecast error of the Gauss-Laplace mixed wind speed distribution is shown in Figure 3. This confirms that short-term wind speed prediction is a regression learning task with mixed noise.
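The persistence-error analysis can be sketched as follows. The persistence forecast for time t is simply the observation at t−1, so the forecast error is the first difference of the series; the excess kurtosis of the errors then gives a quick check of the noise family (0 for Gaussian, 3 for Laplace, intermediate values consistent with a mixture). Function names are illustrative:

```python
import numpy as np

def persistence_errors(speeds):
    """Persistence method: forecast(t) = observation(t-1),
    so the forecast error is the first difference of the series."""
    speeds = np.asarray(speeds, dtype=float)
    return speeds[1:] - speeds[:-1]

def excess_kurtosis(xi):
    """Sample excess kurtosis: ~0 for Gaussian errors, ~3 for Laplace;
    values in between are consistent with a Gauss-Laplace mixture."""
    xi = xi - xi.mean()
    m2, m4 = np.mean(xi**2), np.mean(xi**4)
    return m4 / m2**2 - 3.0
```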

The Criteria for Algorithm Evaluation
We specify the evaluation criteria before presenting the experimental results, in order to compare the performance of the various models. The evaluation criteria are as follows: the mean absolute error (MAE), the root mean square error (RMSE), the sum of squared regression (SSR), the sum of squared deviation of testing (SST), the sum of squared error of testing (SSE), and the testing time (teTime) are used to evaluate the predictive performance of the ν-SVR, LS-SVR, TSVR, and GLM-TLSSVR models. The criteria are defined in Table 1 [34,37].
In Table 1, L is the number of testing samples, y_i is the ith real value, y_i* represents the predicted value, and ȳ is the mean of the testing data-set. teTime (in seconds) represents the testing time of constructing a regressor. Table 1. Evaluation criteria for short-term wind speed prediction.
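Since Table 1 is not reproduced here, the sketch below assumes the common definitions of these criteria (SSE against the truth, SST as the deviation of the truth from its mean, SSR as the deviation of the predictions from the mean of the truth):

```python
import numpy as np

def evaluation_criteria(y_true, y_pred):
    """MAE, RMSE, SSE, SST, SSR with the standard definitions
    (assumed; the paper's Table 1 may parameterize them differently)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    e = y_true - y_pred
    mae = float(np.abs(e).mean())
    rmse = float(np.sqrt((e**2).mean()))
    sse = float((e**2).sum())
    sst = float(((y_true - y_true.mean())**2).sum())
    ssr = float(((y_pred - y_true.mean())**2).sum())
    return {"MAE": mae, "RMSE": rmse, "SSE": sse, "SST": sst, "SSR": ssr}
```

The ratios SSE/SST and SSR/SST reported in Tables 2-4 follow directly from these quantities.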

Application on Predicting the Short-Term Wind Speed
In this section, we confirm the feasibility and effectiveness of the proposed GLM-TLSSVR model on the short-term wind speed data-set of Heilongjiang Province, China. The wind speed data-set comes from a wind farm under the Meteorological Bureau of Heilongjiang Province, and the wind speed is measured by a lightning imager. The data-set covers more than a year, with the average wind speed recorded every 10 min. In total, we collected 62,466 samples, each with four attributes: variance, mean, maximum, and minimum. We use 1440 uninterrupted samples (from 1 to 1440, a time span of 10 days) as training samples. We also use 720 uninterrupted samples (from 1441 to 2160, a time span of five days), with 80 consecutive data points, as the testing samples. The original sequence is transformed into a multiple regression task by using the vector X⃗_i = (X_{i−11}, X_{i−10}, …, X_{i−1}, X_i) as the input to predict X_{i+step}, where X_j is the real wind speed value at time j (j = i−11, i−10, …, i) and the vector order of the wind speed is determined by the chaotic operator network method. In the experiments, we try step = 1, 3, and 5; in other words, we predict the wind speed at every point X_i after 10, 30, and 50 min, respectively.
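The windowing described above can be sketched as follows (order 12 matches the input vector (X_{i−11}, …, X_i) in the paper; the function name is illustrative):

```python
import numpy as np

def make_windows(series, order=12, step=1):
    """Turn a wind-speed series into a regression task: input is the
    window (X_{i-order+1}, ..., X_i), target is X_{i+step}."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(order - 1, len(series) - step):
        X.append(series[i - order + 1 : i + 1])   # 12 past observations
        y.append(series[i + step])                # value step ahead
    return np.array(X), np.array(y)
```

With step = 1, 3, and 5 on 10-min data, the targets are the wind speeds 10, 30, and 50 min ahead, as in the experiments.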
The four models (ν-SVR, LS-SVR, TSVR, and GLM-TLSSVR) were implemented in Python 3.7 on Windows 10, on a PC with an Intel i7 processor (3.19 GHz) and 8 GB of RAM. The initial parameter ranges are C_1, C_2 ∈ {2^i | i = −9, −8, …, 10} and λ_1, λ_2, λ_3, λ_4 ∈ [0, 1]. The parameters C_1, C_2, λ_1, λ_2, λ_3, λ_4 are tuned by the 10-fold cross-validation technique, which is explained in detail in [61,62]; this technique helps us find the optimal parameters. In this article, in order to reduce the computational burden of the GLM-TLSSVR model, the parameter assignments are given below. As for the choice of kernel function, many experiments show that the polynomial kernel function and the Gaussian kernel function perform well. In this experiment, we apply the Gaussian kernel function and the polynomial kernel function to the four models (ν-SVR, LS-SVR, TSVR, and GLM-TLSSVR), as below [63].
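The two kernels can be sketched as follows; the parameterizations below are the usual textbook forms (the paper's exact displayed formulas are not reproduced, so treat the constants as assumptions):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2*sigma^2)), with sigma > 0."""
    d = np.asarray(x, float) - np.asarray(z, float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))

def polynomial_kernel(x, z, d=2):
    """K(x, z) = (x . z + 1)^d, with positive integer degree d."""
    return (np.dot(x, z) + 1.0) ** d
```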
where d is a positive integer and σ is positive. The dual problems of ν-SVR, LS-SVR, and TSVR are as follows.
ν-SVR: the dual problem of ν-SVR is defined in [18,61]. LS-SVR: the dual problem of LS-SVR is defined in [64]. TSVR: the dual problem of TSVR is defined in [34], where H = [K(A, A^T) e]. In Figure 4, the wind-speed forecasting results at each point A_i of the above four models are presented after 10 min, and Figure 5 shows the corresponding error statistics. Figures 6 and 7 give the forecasting results and error statistics after 30 min, and Figures 8 and 9 give them after 50 min. Tables 2-4 display the statistical criteria MAE, RMSE, SSE/SST, SSR/SST, and teTime. Table 2. Error statistics of the four short-term wind speed forecasting models after 10 min. Table 3. Error statistics of the four short-term wind speed forecasting models after 30 min. Table 4. Error statistics of the four short-term wind speed forecasting models after 50 min. From Tables 2-4 and Figures 4-9, the evaluation criteria indicate that the error statistics of the GLM-TLSSVR model are better than those of the ν-SVR, LS-SVR, and TSVR models. As the forecast interval increases from 10 min to 30 min and 50 min, the forecasting error of all four models increases while the relative error decreases, so the increase is not critical in these cases. Moreover, as can be seen from Tables 2-4, under all conditions of MAE, RMSE, SSE/SST, and SSR/SST, the GLM-TLSSVR model with Gauss-Laplace mixed noise characteristics is slightly better than the three classical models ν-SVR, LS-SVR, and TSVR.
In general, lower values of MAE, RMSE, and SSE/SST reflect better consistency between the predicted and true values, while a higher value of SSR/SST indicates that the regressor captures more of the statistical information in the data. The performance indices show that GLM-TLSSVR outperforms ν-SVR, LS-SVR, and TSVR on the short-term wind speed data-set in terms of SSE/SST, RMSE, and MAE. The ratio SSR/SST estimates the goodness of fit of the predictive model and how much information it extracts from the data-set; by this indicator, the proposed GLM-TLSSVR model is the best regressor among all the models. From Tables 2-4, the SSE/SST of GLM-TLSSVR is lower than that of the other methods, which implies good agreement between the real and predicted values. In addition, among all the models, the testing cost of GLM-TLSSVR is the lowest, which indicates that the proposed iterative method is an efficient regression algorithm; this is partly because GLM-TLSSVR inherits the fast training of LS-SVR in its new regressor. Finally, the generalization performance of GLM-TLSSVR is the best, i.e., it attains the smallest RMSE and the largest SSR/SST in Tables 2-4, which is mainly due to the idea of twin hyperplanes.

Conclusions
Many regression techniques today assume a single noise characteristic. Wind speed prediction is complicated by volatility and uncertainty, so it is difficult to model with a single noise distribution. Our main work is summarized as follows: (1) we use the Bayesian principle to derive the best empirical risk loss of the G-L mixed noise characteristics; (2) the TLSSVR models with G-L mixed homoscedastic noise (GLM-TLSSVR) and G-L mixed heteroscedastic noise (GLMH-TLSSVR) are developed for complicated noise; (3) we use the Lagrange function and obtain the dual problems of GLM-TLSSVR and GLMH-TLSSVR according to the KKT conditions; (4) we solve GLM-TLSSVR by the ALM method, ensuring the stability and effectiveness of the algorithm; (5) we use the proposed technique to predict future short-term wind speed from past data, forecasting the wind speed 10, 30, and 50 min ahead, respectively. Based on our results, GLM-TLSSVR outperforms ν-SVR, LS-SVR, and TSVR on the short-term wind speed data-set, as shown in the experiment. Furthermore, the ratio SSR/SST estimates the goodness of fit of the predictive model and the amount of information extracted from the data-set; by this indicator, the proposed GLM-TLSSVR model is the best regressor among all the models, and its low SSE/SST ratio implies good agreement between real and predicted values. In addition, the computational time of all the models was evaluated, and that of GLM-TLSSVR is the lowest, owing to its smaller constrained optimization problems. These results also benefit the industrial sector, for example through better statistical analysis of the relationship between wind speed characteristics and power generation.
Some actual regression problems contain uncertain data. Such uncertainty, as in an accident, is mainly reflected in the uncertain time, situation, and direction of the event. In future work, we should study regression algorithms for fuzzy uncertainty with mixed-noise-characteristic models. In addition, our work only discusses regression models with Gauss-Laplace mixed noise characteristics; in fact, a similar idea can be developed for classification learning, and we can study classification problems with Gauss-Laplace mixed noise characteristics in the future.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: