Kernel Ridge Regression Model Based on Beta-Noise and Its Application in Short-Term Wind Speed Forecasting

Abstract: The kernel ridge regression (KRR) model aims to find the hidden nonlinear structure in raw data. It assumes that the noise in the data follows a Gaussian model. However, it has been pointed out that the noise in wind speed/power forecasting obeys the Beta distribution, so the classic regression techniques are not applicable in this case. Hence, we derive the empirical risk loss for the Beta distribution and propose a kernel ridge regression model based on Beta-noise (BN-KRR). Numerical experiments are carried out on real-world data. The results indicate that the proposed technique performs well on short-term wind speed forecasting.


Introduction
Linear regression (LR) uses the least squares method to model the relationship between a scalar dependent variable and one or more explanatory variables. Geometrically, it fits points in the plane with a straight line, or points in a high-dimensional space with a hyperplane. The method is very sensitive to predictors that are nearly collinear. Ridge regression (RR) is a variant of linear regression whose goal is to circumvent this collinearity problem. The ridge regression model is a powerful machine learning technique introduced by Hoerl [1] and Hastie et al. [2]; it is a method from classical statistics that implements a regularized form of least squares regression [3]. Ridge regression is thus an alternative method for learning functions based on a regularized extension of least squares techniques [4].
Given the data-set $D_N = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ (1), where $x_i \in X = R^n$, $y_i \in R$, $i = 1, \ldots, N$, a multiple LR model is $f(x) = \omega^T \cdot x + b$. Here $R$ is the set of real numbers, $R^n$ is the $n$-dimensional Euclidean space, $N$ is the number of sample points, and the superscript $T$ denotes the matrix transpose. LR and RR determine the parameter vector $\omega \in R^n$ by minimizing the objective functions (2) and (3), respectively. The objective function used in ridge regression implements a form of Tikhonov [5] regularization of a sum-of-squares error metric, with a regularization parameter controlling the bias-variance trade-off [6]. This corresponds to penalized maximum likelihood estimation of $\omega$, assuming the targets have been corrupted by independent and identically distributed (i.i.d.) samples from a Gaussian noise process with zero mean and variance $\sigma^2$.
The KRR model based on the Gaussian-noise characteristic was derived by Saunders et al. [7]. RR [1,3,5] aims to find the hidden nonlinear structure in the raw data, while the nonlinear mapping is approximated by means of KRR based on kernel techniques [7-11]. Therefore a linear RR model is constructed in a feature space $H$ ($\Phi : R^n \to H$), induced by a nonlinear kernel function defining the inner product $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$ ($i, j = 1, \ldots, N$). The kernel function $K$ may be any positive definite Mercer kernel. The objective function of KRR based on Gaussian-noise (GN-KRR) minimization can then be written as Equation (4).
If the noise is Gaussian, the GN-KRR model may meet the requirements. However, the noise in wind speed and wind power forecasting does not obey the Gaussian distribution but the Beta distribution, so the classic regression techniques are not applicable. The uncertainty of wind power predictions was investigated in [12], where the statistics of the wind power forecasting error were found not to be Gaussian. The work in [13] also found that the output of wind turbine systems is limited between zero and the maximum power and that the error statistics do not follow a normal distribution; it further showed, using chi-squared tests, that the Beta function is justifiable for wind power prediction. In [14], the standard deviation of the data set was a function of the normalized predicted power $p = p_{pred} / p_{inst}$, where $p_{pred}$ is the predicted power and $p_{inst}$ is the installed wind power capacity. Fabbri [14] pointed out that the normalized produced power $p$ lies within the interval $[0, 1]$ and that the Beta function is more suitable than the standard normal distribution. The work in [15] demonstrated the advantages of using the Beta probability density function (pdf) instead of the Gaussian pdf for approximating the forecasting error. Based on the above literature [12-16], this work studies the Beta-distributed error between the predicted values $x_p$ and the measured values $x_m$ in wind speed forecasting. The pdf of the Beta distribution is $f(x) = h\, x^{u-1} (1 - x)^{v-1}$, $x \in (0, 1)$, where $u$ and $v$ are parameters, $h$ is the normalization factor, and $u$, $v$ may be determined from given values of the mean and standard deviation [17].
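The remark that the Beta parameters can be determined from a given mean and standard deviation can be made concrete with the method of moments. The sketch below is illustrative only and is not taken from the paper or from [17]; the function names `beta_pdf`, `beta_moments`, and `beta_params_from_moments` are hypothetical, and the formulas are the textbook relations mean = u/(u+v) and variance = uv/((u+v)^2 (u+v+1)).

```python
import math


def beta_pdf(x: float, u: float, v: float) -> float:
    """Beta density h * x**(u-1) * (1-x)**(v-1) on (0, 1), zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    # log h = log Gamma(u+v) - log Gamma(u) - log Gamma(v)
    log_h = math.lgamma(u + v) - math.lgamma(u) - math.lgamma(v)
    return math.exp(log_h + (u - 1.0) * math.log(x) + (v - 1.0) * math.log(1.0 - x))


def beta_moments(u: float, v: float) -> tuple:
    """Mean and standard deviation of a Beta(u, v) distribution."""
    mean = u / (u + v)
    var = u * v / ((u + v) ** 2 * (u + v + 1.0))
    return mean, math.sqrt(var)


def beta_params_from_moments(mean: float, std: float) -> tuple:
    """Method-of-moments estimate of (u, v) from a mean and standard deviation."""
    var = std * std
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie in (0, 1)")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = mean * (1.0 - mean) / var - 1.0  # equals u + v
    return mean * common, (1.0 - mean) * common


if __name__ == "__main__":
    mean, std = beta_moments(1.5, 1.9)
    print(beta_params_from_moments(mean, std))
```

Round-tripping the values u = 1.5, v = 1.9 through `beta_moments` and back recovers the same parameters up to floating-point error, which is a quick sanity check on any fitted mean and standard deviation.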
(Figure: Beta pdf with u = 1.5, v = 1.9.)
It is not suitable to apply the KRR technique based on the Gaussian-noise model (GN-KRR) to fit functions from a data-set with Beta-noise. To solve this problem, this work uses optimization theory and the Beta-noise loss function to derive a KRR method based on the Beta-noise characteristic (BN-KRR). It also introduces a forecasting technique that can deal with high dimensionality and nonlinearity simultaneously.
This paper is organized as follows. In Section 2, we derive the Beta-noise empirical risk loss by the Bayesian principle. Section 3 describes the proposed KRR model based on Beta-noise. Section 4 gives the solution and algorithm design of BN-KRR based on the Genetic Algorithm. In Section 5, numerical experiments apply BN-KRR to short-term wind speed and wind power prediction. Finally, the conclusions and future work are given in Section 6.

Bayesian Principle to Beta-Noise Empirical Risk Loss
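Learning to fit data with noise is an important problem in many real-world data mining applications. Given the training set $D_N$ of (1), we assume the noise is additive: $y_i = f(x_i) + \xi_i$, $i = 1, \ldots, N$ (5), where the $\xi_i$ are i.i.d. random variables with distribution $P(\xi_i)$, standard deviation $\sigma$, and mean $\mu$.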
The objective is to find a regressor $f$ minimizing the expected risk [18,19] $R[f] = \int l(x, y, f(x))\, dP(x, y)$ based on the empirical data $D_N$, where $l(x, y, f(x))$ is an empirical risk loss (determining how we penalize estimation errors). Since the distribution $P(x, y)$ is unknown, we can only use the data-set $D_N$ to estimate a regressor $f$ that minimizes $R[f]$. A possible approximation consists of replacing the integration by the empirical estimate to get the empirical risk $R_{emp}[f]$. In general, we should add a capacity control term in RR and KRR, which leads to the regularized risk functional [18,20], in which the squared loss is the empirical risk loss of the Gaussian-noise characteristic for LR (2), RR (3), and KRR (4). However, what is the empirical risk loss for the Beta-noise KRR model? The Beta-noise empirical risk loss is derived below using the Bayesian principle.
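For reference, the three risk functionals named in this paragraph are commonly written as follows. This is a sketch in the standard notation of the regularization literature, not a transcription of the paper's own equations; the weight $\lambda > 0$ on the capacity term and the factor $1/2$ in the squared loss are assumptions.

```latex
\begin{align}
R[f] &= \int l\bigl(x, y, f(x)\bigr)\, dP(x, y), \\
R_{\mathrm{emp}}[f] &= \frac{1}{N} \sum_{i=1}^{N} l\bigl(x_i, y_i, f(x_i)\bigr), \\
R_{\mathrm{reg}}[f] &= R_{\mathrm{emp}}[f] + \frac{\lambda}{2}\, \lVert \omega \rVert^{2}, \\
l\bigl(x, y, f(x)\bigr) &= \frac{1}{2}\bigl(y - f(x)\bigr)^{2}
  \quad \text{(Gaussian-noise case)}.
\end{align}
```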
The regressor $f(x)$ is unknown, and the objective is to estimate it from the data-set $D_N$. According to [20-22], the optimal empirical risk loss is obtained from maximum likelihood estimation (Equation (7)). Maximizing the likelihood $p(X_f | X)$ is equivalent to minimizing $-\log p(X_f | X)$, to which Equation (7) is then applied. Suppose the noise in Equation (5) follows a Beta distribution with mean $\mu \in (0, 1)$ and variance $\sigma^2$. Its density is then [13,14] $p(\xi_i) = h\, \xi_i^{u-1} (1 - \xi_i)^{v-1}$ (10), where $h = \Gamma(u + v) / (\Gamma(u)\, \Gamma(v))$ is the normalization factor. By Equation (10), the Beta-noise empirical risk loss (11) is obtained as the negative logarithm of this density. The empirical risk losses of Gaussian-noise and Beta-noise with different parameters are shown in Figure 2.
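To make the maximum-likelihood step tangible: taking the negative logarithm of a Beta density $h\, \xi^{u-1}(1-\xi)^{v-1}$ gives $c(\xi) = (1-u)\log\xi + (1-v)\log(1-\xi) - \log h$. The sketch below evaluates that expression next to the squared loss. It is an illustration under this assumption, not the paper's Formula (11) verbatim (the paper may drop the constant $-\log h$), and the function names are hypothetical.

```python
import math


def beta_log_norm(u: float, v: float) -> float:
    """log h, where h = Gamma(u + v) / (Gamma(u) * Gamma(v))."""
    return math.lgamma(u + v) - math.lgamma(u) - math.lgamma(v)


def beta_noise_loss(xi: float, u: float, v: float, keep_constant: bool = True) -> float:
    """Negative log of the Beta density h * xi**(u-1) * (1-xi)**(v-1), 0 < xi < 1."""
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie strictly inside (0, 1)")
    loss = (1.0 - u) * math.log(xi) + (1.0 - v) * math.log(1.0 - xi)
    if keep_constant:
        # The constant does not change the minimiser; it only shifts the loss.
        loss -= beta_log_norm(u, v)
    return loss


def gaussian_noise_loss(xi: float) -> float:
    """Squared loss (constants dropped), the usual Gaussian-noise counterpart."""
    return 0.5 * xi * xi


if __name__ == "__main__":
    u, v = 1.5, 1.9
    for xi in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"xi={xi:.1f}  beta={beta_noise_loss(xi, u, v):+.4f}  "
              f"gauss={gaussian_noise_loss(xi):.4f}")
```

For u, v > 1 this loss is smallest at the mode (u-1)/(u+v-2) of the density rather than at zero, which is the qualitative difference from the symmetric squared loss.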

KRR Model Based on Beta-Noise
It is not appropriate to apply the KRR model based on the Gaussian-noise characteristic (GN-KRR) to tasks with a Beta-noise distribution. Consequently, we use the Beta-noise loss function and the maximum likelihood method to estimate the optimal loss function. We derive the optimal empirical risk loss for the Beta-noise distribution and propose a new KRR model based on the Beta-noise characteristic (BN-KRR).
First, we construct the LR regressor $f(x) = \omega^T \cdot x + b$, where $\omega \in R^n$ and $b \in R$. We then use kernel techniques and construct the kernel function $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$, and extend the kernel techniques to the ridge regression model based on the Beta-noise characteristic.
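As an illustration of the kernel construction, the sketch below builds a Gram matrix for the two kernel families the paper uses in its experiments (polynomial and Gaussian). The exact kernel formulas are not recoverable from the text, so the `+ 1` offset in the polynomial kernel and the `2 * sigma**2` scaling in the Gaussian kernel are common conventions assumed here; the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def polynomial_kernel(x: Vector, z: Vector, d: int = 2) -> float:
    """Polynomial kernel ((x . z) + 1)**d; the +1 offset is an assumed convention."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + 1.0) ** d


def gaussian_kernel(x: Vector, z: Vector, sigma: float = 0.2) -> float:
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2)); the scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points: Sequence[Vector],
                kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(xi, xj) for xj in points] for xi in points]


if __name__ == "__main__":
    pts = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.1]]
    for row in gram_matrix(pts, gaussian_kernel):
        print([round(val, 4) for val in row])
```

Both kernels are symmetric and positive definite, so the resulting Gram matrix satisfies the Mercer condition that the kernel construction relies on.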
Let the set of inputs be $\{(x_i, y_i),\ i = 1, \ldots, N\}$, where $i$ indexes the $i$-th sample in $D_N$. For the general Beta-noise characteristic, the loss function $c(\xi_i)$ at the sample point $(x_i, y_i)$ of $D_N$ is given by Formula (11). Because ridge regression and KRR techniques with the Gaussian-noise characteristic (GN-KRR) are not suitable for Beta-noise distributions in time series problems, Formula (11) is selected as the Beta empirical risk loss to overcome this shortcoming of GN-KRR. The primal problem of the KRR model with Beta-noise (denoted BN-KRR) is Problem (12), with $C > 0$ and $c(\xi_i)$ the Beta-noise loss of Formula (11).
Theorem 1. The solution of the primal Problem (12) of model BN-KRR with respect to $\omega$ exists and is unique.
On account of

Note:
The KRR of the Gaussian-noise characteristic (GN-KRR) was discussed in [9-11]. The Gaussian empirical risk loss at the sample point $(x_i, y_i)$ is the squared loss in $\xi_i$. The dual problem of the model RR based on the Gaussian-noise characteristic (GN-RR) is obtained from this loss.
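For the Gaussian-noise case in this Note, the dual problem has a well-known closed form. Assuming the primal is written as minimizing $\frac{1}{2}\lVert\omega\rVert^2 + \frac{C}{2}\sum_i \xi_i^2$ subject to $y_i = \omega \cdot \Phi(x_i) + \xi_i$ (no bias term, which is a simplification and may differ from the paper's formulation), the dual variables satisfy $(K + I/C)\,\alpha = y$ and the regressor is $f(x) = \sum_i \alpha_i K(x_i, x)$. The dependency-free sketch below solves that system; all function names are hypothetical.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_gn_krr(xs, ys, C=100.0, kernel=gaussian_kernel):
    """Dual coefficients alpha solving (K + I / C) alpha = y."""
    n = len(xs)
    k = [[kernel(xs[i], xs[j]) + (1.0 / C if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve_linear(k, list(ys))


def predict(xs, alpha, x_new, kernel=gaussian_kernel):
    """Evaluate f(x_new) = sum_i alpha_i * K(x_i, x_new)."""
    return sum(a * kernel(xi, x_new) for a, xi in zip(alpha, xs))


if __name__ == "__main__":
    xs = [[0.0], [0.5], [1.0], [1.5], [2.0]]
    ys = [math.sin(x[0]) for x in xs]
    alpha = fit_gn_krr(xs, ys)
    print(predict(xs, alpha, [0.75]), math.sin(0.75))
```

The Beta-noise model has no such closed form because its loss is not quadratic, which is why the paper turns to a numerical solver and parameter search instead.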

Solution Based on Genetic Algorithm
The solution and algorithm design of the KRR model based on the Beta-noise characteristic (BN-KRR) are as follows.
(1) Let the training samples be $D_N = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in R^n$, $y_i \in R$, $i = 1, \ldots, N$.
(2) Select appropriate positive parameters $C$, $u$, $v$ and a suitable kernel $K$.
(3) Solve the optimization Problem (18) to obtain the optimal solution $\alpha$.
(4) Construct the decision-making function $f(x)$.
Determining the unknown parameters of model BN-KRR is a complicated process, and an appropriate parameter combination can enhance the regression accuracy of the kernel ridge regression based on Beta-noise. The Genetic Algorithm (GA) [23-25] is a search heuristic that mimics the process of natural evolution; it is routinely used to generate useful solutions to optimization and search problems. In GA, the evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population and modified to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm terminates because the maximum number of generations was reached, a satisfactory solution may or may not have been found.
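The generational loop described above can be sketched as a small real-coded GA that searches a box of parameter values such as (C, u, v). This is a generic illustration, not the authors' Matlab implementation: tournament selection, blend crossover, Gaussian mutation, elitism, the population size, and the mutation rate are all assumptions, and `fitness` stands for any user-supplied validation error to be minimized.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, max_gen=100,
                   mutation_rate=0.2, seed=0):
    """Minimise `fitness` over the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(ind):
        return [min(max(g, lo), hi) for g, (lo, hi) in zip(ind, bounds)]

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    best_score, best = float("inf"), None
    for _ in range(max_gen):
        scored = [(fitness(ind), ind) for ind in population]
        scored.sort(key=lambda pair: pair[0])
        if scored[0][0] < best_score:
            best_score, best = scored[0][0], scored[0][1][:]
        children = [scored[0][1][:]]  # elitism: carry the current best over
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            w = rng.random()  # blend crossover weight
            child = [w * g1 + (1.0 - w) * g2 for g1, g2 in zip(p1, p2)]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(clip(child))
        population = children
    return best, best_score


if __name__ == "__main__":
    # Toy fitness with a known minimum, standing in for a validation error.
    def toy(p):
        return (p[0] - 50.0) ** 2 / 100.0 + (p[1] - 1.5) ** 2 + (p[2] - 1.9) ** 2

    print(genetic_search(toy, [(1.0, 201.0), (0.01, 5.0), (0.01, 5.0)]))
```

In practice `fitness` would train BN-KRR with the candidate parameters and return an error measure on held-out data; the upper bound on u and v has to be chosen by the user, since a GA needs a finite search box.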
GA is considered one of the modern optimization algorithms for solving combinatorial optimization problems and is used here to determine the parameters of model BN-KRR. Based on survival and reproduction of the fittest, GA is applied repeatedly to obtain new and better solutions without pre-assumptions such as continuity and unimodality [26-28]. The proposed model BN-KRR has been implemented in Matlab 7.8. The experiments are run on a personal computer with an 8.0 GHz Core (TM) i7-4790 CPU and 3.60 GB memory under Microsoft Windows XP Professional. The initial parameters of GA are $Max\text{-}cgen = 100$, $C \in [1, 201]$, $u, v \in (0, \infty)$.
Many practical applications show that polynomial and Gaussian kernels perform well under general smoothness assumptions [29]. In this work, polynomial and Gaussian kernels are used as the kernels for models ν-SVR, GN-KRR, and BN-KRR, where the degree $d$ of the polynomial kernel is a positive integer, taken as $d = 1, 2,$ or $3$, and the width $\sigma$ of the Gaussian kernel is positive, taken as $\sigma = 0.2$.
As we all know, no prediction model forecasts perfectly. Criteria such as the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the standard error of prediction (SEP) are used to evaluate the predictive performance of models ν-SVR, GN-KRR, and BN-KRR. In the definitions of these four criteria, $l$ is the size of the selected samples, $x_{m,i}$ is the measured result at data-point $x_i$, and $x_{p,i}$ is the predicted result at data-point $x_i$ ($i = 1, 2, \ldots, l$) [14-16].
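The four evaluation criteria can be sketched as below. The defining equations did not survive in the text, so the forms used here are the commonly used ones and should be read as assumptions: MAPE is returned as a fraction rather than a percentage, and SEP is taken to be the RMSE divided by the mean of the measured values.

```python
import math


def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(p - m) for m, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted))
                     / len(measured))


def mape(measured, predicted):
    """Mean absolute percentage error as a fraction; measured values must be nonzero."""
    return sum(abs((p - m) / m) for m, p in zip(measured, predicted)) / len(measured)


def sep(measured, predicted):
    """Standard error of prediction: RMSE relative to the mean measured value (assumed form)."""
    mean_measured = sum(measured) / len(measured)
    return rmse(measured, predicted) / mean_measured


if __name__ == "__main__":
    x_m = [2.0, 4.0, 6.0]  # measured values (made-up numbers for illustration)
    x_p = [3.0, 4.0, 5.0]  # predicted values (made-up numbers for illustration)
    print(mae(x_m, x_p), rmse(x_m, x_p), mape(x_m, x_p), sep(x_m, x_p))
```

Wind speed can be close to zero in calm periods, in which case MAPE becomes unstable; that is one reason to report it alongside MAE and RMSE rather than alone.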

Short-Term Wind Speed and Wind Power Forecasting with Real Data-Set
The model BN-KRR is applied to a real multi-factor data-set from Jilin Province for wind speed sequence prediction. The wind speed data cover more than a year and are sampled at ten-minute intervals, giving 62,466 samples. The column attributes are the mean, variance, minimum, and maximum, respectively. The short-term wind speed forecast is studied as follows.
Suppose the number of training samples is 2160 (samples 1 to 2160, covering 15 days) and the number of test samples is 720 (samples 2161 to 2880, covering 5 days). The input vector is $\vec{x}_i = (x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+11})$, the output value is $x_{i+11+step}$, and $step = 1, 3$. That is, this pattern is used to forecast the wind speed 10 min and 30 min after each point $x_{i+11}$, respectively [30,31].
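The input/output pattern above is a sliding window over the series. A minimal sketch, assuming the series is a plain Python list sampled every 10 min (the function name `make_windows` is hypothetical):

```python
def make_windows(series, window=12, step=1):
    """Build (input, target) pairs: input = (x_i, ..., x_{i+window-1}),
    target = x_{i+window-1+step}. With window=12 the target is x_{i+11+step}."""
    inputs, targets = [], []
    last_start = len(series) - window - step  # largest i with a valid target
    for i in range(last_start + 1):
        inputs.append(list(series[i:i + window]))
        targets.append(series[i + window - 1 + step])
    return inputs, targets


if __name__ == "__main__":
    series = list(range(20))  # stand-in for a 10-min wind speed series
    x_10min, y_10min = make_windows(series, step=1)  # 10 min ahead
    x_30min, y_30min = make_windows(series, step=3)  # 30 min ahead
    print(len(x_10min), y_10min[0], len(x_30min), y_30min[0])
```

With a 10-min sampling interval, `step=1` targets the value 10 min after the last input point and `step=3` the value 30 min after it.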
1. Forecasting wind speed 10 min after point $x_{i+11}$
The short-term wind speed forecast results 10 min after point $x_{i+11}$ given by GN-KRR [7,8,32], ν-SVR [33,34], and BN-KRR are illustrated in Figure 3. The MAE, MAPE, RMSE, and SEP indicators used to evaluate the prediction results of the three models are shown in Table 1.
2. Forecasting wind speed 30 min after point $x_{i+11}$
The short-term wind speed forecast results 30 min after point $x_{i+11}$ given by GN-KRR, ν-SVR, and BN-KRR are illustrated in the corresponding figure. The MAE, MAPE, RMSE, and SEP indicators used to evaluate the prediction results of the three models are shown in Table 2.
The results of the wind speed forecasting experiments indicate that BN-KRR performs better than GN-KRR and ν-SVR in 10-min and 30-min short-term wind speed forecasting.
We have predicted the short-term wind speed for the Jilin Province wind farm, so the wind power can be calculated according to Formula (25), where $v_{cut-in}$ and $v_{cut-out}$ are the cut-in and cut-out wind speeds of the wind turbine, and $v_r$ and $P_r$ are its rated wind speed and rated power, respectively. Substituting the predicted wind speed into Formula (25) gives the predicted wind power.
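A piecewise power curve of the kind Formula (25) describes can be sketched as follows. The formula itself is not recoverable from the text, so the cubic interpolation between cut-in and rated speed is one commonly used form and is an assumption here, as are the default turbine parameters (3, 12, and 25 m/s, with rated power normalized to 1).

```python
def wind_power(v, v_cut_in=3.0, v_rated=12.0, v_cut_out=25.0, p_rated=1.0):
    """Turbine output for wind speed v (m/s) under an assumed piecewise power curve."""
    if v < v_cut_in or v >= v_cut_out:
        return 0.0  # below cut-in or at/above cut-out: turbine produces nothing
    if v >= v_rated:
        return p_rated  # between rated and cut-out speed: rated power
    # Between cut-in and rated speed: cubic interpolation (assumed form).
    return p_rated * (v ** 3 - v_cut_in ** 3) / (v_rated ** 3 - v_cut_in ** 3)


if __name__ == "__main__":
    for speed in (2.0, 5.0, 9.0, 12.0, 20.0, 26.0):
        print(speed, round(wind_power(speed), 4))
```

Applying this function to each predicted wind speed gives the corresponding predicted power; the actual turbine parameters must come from the wind farm's specification.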

Conclusions and Future Work
In this work, we propose a new kernel ridge regression model based on Beta-noise (BN-KRR) for prediction in uncertain systems with Beta-noise. Novel results are obtained with BN-KRR, which applies the Bayesian principle to derive the Beta-noise empirical risk loss and improves the prediction accuracy. Numerical experiments are carried out on real-world data (short-term wind speed). Comparing BN-KRR with GN-KRR and ν-SVR by the criteria MAE, MAPE, RMSE, and SEP verifies the validity and feasibility of the proposed model BN-KRR. Further, the forecasting results indicate that the proposed technique performs well on short-term wind speed forecasting.
In practical regression problems, data uncertainty is inevitable. The observed data are often described in linguistic levels or ambiguous metrics, such as weather forecasts of dry or wet, or sunny or cloudy. In future work, we will consider developing fuzzy kernel ridge regression algorithms with different noise models.