Short Term Wind Power Prediction Based on Data Regression and Enhanced Support Vector Machine

This paper presents a short-term wind power forecasting model for the next day based on historical marine weather and corresponding wind power output data. Due to the large amount of historical marine weather and wind power data, we divided the data into clusters using the data regression (DR) algorithm to obtain meaningful training data, so as to reduce the amount of modeling data and improve the computing efficiency. The regression model was constructed based on the principle of the least squares support vector machine (LSSVM). We carried out wind speed forecasting for one hour and one day and used the correlation between marine wind speed and the corresponding wind power regression model to realize an indirect wind power forecasting model. Proper parameter settings for LSSVM are important to ensure its efficiency and accuracy. In this paper, we used an enhanced bee swarm optimization (EBSO) to perform the parameter optimization for LSSVM, which improved both the availability of the forecast model and the forecasting accuracy.


Introduction
Wind power prediction began in many countries as early as 1990, and some developed countries such as Denmark, the United States, and Spain have begun operating wind power prediction systems [1][2][3]. The wind forecasting software Prediktor, developed by the Riso National Laboratory in Denmark, has already carried out short-term wind power forecasting in Denmark, Spain, Ireland, and Germany [1], while Previento, the wind power forecasting software developed by the University of Oldenburg [4], uses physical models over relatively wide areas to forecast wind power two days in advance [5]. The wind power management system developed by Germany's Solar Energy Research Institute contains an artificial neural network model that can predict wind power output from German Meteorological Agency data [6]. The wind power forecasting software eWind applies statistical models and combines them with high-resolution mesoscale numerical weather models [7]. In addition, LocalPred [8], Zephry [9], and HIRPOM [10] have been developed as independent prediction systems.
The physical method is based on numerical weather prediction (NWP), which records the operation of the wind turbine itself, including the wind speed, wind direction, and set altitude [11][12][13]. The combination of physical information and the power generation curve is used to simulate the actual wind farm power generation [14]. Because it requires no historical data, this method is suitable for newly built wind farms; however, data sampling is extremely difficult, and the monitoring equipment increases the construction cost of the system. The statistical method does not consider the process of physical changes between wind speeds. It uses historical data to construct models of wind speed or wind power generation, and builds statistical models through parameter estimation or functions such as the Markov chain [15], regression analysis [16], the exponential smoothing method [17], the Kalman filtering method [18], and the ARMA model [19]. Among them, the ARMA(p,q) model, a commonly used statistical model, offers high-precision analysis. However, due to the uncertain natural climate and the strong intermittency of the wind, it is difficult for a fixed time series model to adapt to diverse wind fluctuations. Statistical models are therefore susceptible to regional restrictions; nevertheless, compared with physical models, statistical models built on large amounts of data are easier to apply. Deep learning methods use artificial intelligence to describe the nonlinear relationship between input and output. Common methods, such as neural networks [20], wavelet analysis [21], and support vector machines [22,23], improve the accuracy and adaptability of the model by correcting errors.
The bee swarm optimization (BSO) algorithm is modified by referring to the genetic algorithm (GA) and particle swarm optimization (PSO) to strengthen parameter control and population evolution. The BSO has many of the advantages of biological intelligence in searching, but it suffers from premature convergence and poor stability in higher-dimensional searches. There is therefore a need for new algorithms that improve on the performance of existing ones; enhancing bee swarm optimization with a parameter adjustment approach plays an important role in improving the performance of the BSO. In [20][21][22][23], it is noted that accurate wind forecasting is crucial for a reliable power system. However, the intermittent and unstable nature of the wind speed makes it very difficult to accurately forecast the power generated. The objective of this paper is to develop a novel method composed of data regression and an enhanced support vector machine to forecast wind power. The proposed model was applied to a case study in Yunlin, and the results agree with the actual measurements. The major contributions of this research are as follows: (i) the proposed method improves the forecasting accuracy of wind power; (ii) the proposed model can maximize the power captured, thus increasing the reliability of wind power for wind farms.

Model Structure Optimization
The proposed wind power modeling data set for forecasting wind power generation is shown in Figure 1, and the proposed data preprocessing algorithm is discussed in this section. The data set covers the wind speed and wind power of all four seasons in a year.

Data Preprocessing
The wind power features and targets were normalized by min-max normalization. The inputs to the prediction model are shown in Figure 2. Three types of inputs were presented to the prediction model: (i) wind speed, (ii) temperature, and (iii) humidity. According to the prediction method, the commonly used wind farm data selection schemes were classified into two kinds, as described below.
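As an illustration of the min-max normalization step, the following is a minimal sketch assuming NumPy and a feature matrix whose columns are wind speed, temperature, and humidity. The function names `min_max_fit` and `min_max_transform` are hypothetical and are not taken from the paper.

```python
import numpy as np

def min_max_fit(X):
    """Return the per-column minimum and maximum of the training data."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def min_max_transform(X, lo, hi):
    """Scale each column to [0, 1] using the training minimum and maximum."""
    X = np.asarray(X, dtype=float)
    # Assumption: a constant column (hi == lo) is mapped to 0 to avoid dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Columns: wind speed (m/s), temperature (deg C), humidity (%); values are made up.
X_train = np.array([[2.0, 10.0, 50.0],
                    [4.0, 20.0, 70.0],
                    [6.0, 30.0, 90.0]])
lo, hi = min_max_fit(X_train)
X_scaled = min_max_transform(X_train, lo, hi)
```

Fitting the minimum and maximum on the training data only, and reusing them for new samples, keeps the scaling consistent between modeling and forecasting.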

Wind Speed Prediction Process
Wind is strongly intermittent, and although predicting it is an extremely difficult problem, it is inextricably linked to the preparatory work for constructing wind power prediction models. In this study, the input variables were temperature, humidity, and wind speed, while the output variable was the wind speed in the next period. The support vector machine was used to make one-hour and one-day wind speed predictions. Figure 2 shows the construction process of the wind speed prediction of the proposed model.

Construction of Power Generation Model
The actual power generation data often contain points where the power generation fluctuates greatly or abnormal values caused by improper instrument measurements [19]. These points act like "noise", so this paper performed the data regression analysis twice. In the first pass, cluster points were constructed from all historical data; in the second pass, the data were filtered based on the first cluster points. In each cluster, the 20% of the data farthest from the cluster point was deleted. The remaining data were then used in the data regression analysis to construct a more complete cluster point. The wind power process is given in Figure 3.
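The two-pass filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the clustering rule, so the sketch groups samples into wind speed bins of hypothetical width `bin_width`, uses the mean power of each bin as the cluster point, and measures distance as the absolute power deviation. The function name `filter_power_curve` is likewise hypothetical.

```python
import numpy as np

def filter_power_curve(speed, power, bin_width=1.0, drop_fraction=0.2):
    """Two-pass filter: build a cluster point per wind speed bin, drop the
    farthest `drop_fraction` of samples, then rebuild the cluster point."""
    speed = np.asarray(speed, dtype=float)
    power = np.asarray(power, dtype=float)
    bins = np.floor(speed / bin_width).astype(int)   # assumed clustering rule
    keep = np.zeros(speed.shape[0], dtype=bool)
    centers = {}
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        first_center = power[idx].mean()             # pass 1: all data in the bin
        dist = np.abs(power[idx] - first_center)
        n_keep = len(idx) - int(len(idx) * drop_fraction)
        kept = idx[np.argsort(dist, kind="stable")[:n_keep]]
        keep[kept] = True
        centers[int(b)] = float(power[kept].mean())  # pass 2: filtered data only
    return keep, centers

# Made-up samples: four consistent readings and one outlier in the 5-6 m/s bin.
keep, centers = filter_power_curve([5.1, 5.2, 5.3, 5.4, 5.5],
                                   [10.0, 10.0, 10.0, 10.0, 100.0])
```

In this toy case the outlier is removed and the rebuilt cluster point reflects only the consistent readings.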

Wind Power Generation
To capture the maximal wind energy, it is necessary to install the power electronic devices between the wind turbine generator and the grid in a location where the frequency is constant [24]. For a variable speed wind turbine, the output mechanical power of the wind turbine ($P_m$) can be described as:

$$P_m = \frac{1}{2} \rho A V_\omega^{3} C_p \tag{1}$$

where ρ is the density of the air around the turbine, A is the turbine's cross-sectional area, and $V_\omega$ is the wind velocity in m/s. In addition, $C_p$ is the power coefficient. The tip speed ratio (TSR) λ is found as:

$$\lambda = \frac{\omega_r r}{V_\omega} \tag{2}$$

where r is the wind turbine blade tip radius and $\omega_r$ is the turbine angular speed. For a variable pitch angle wind generation system, the blade pitch angle is β, and the TSR and power coefficient are defined as in Equation (3). By Equation (3), the power coefficient curve was as shown in Figure 4. In a wind turbine, there is an optimum value of the tip speed ratio, which leads to the maximum power coefficient.
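A short numerical sketch of the mechanical power and tip speed ratio relations may help fix the units. The function names and the sample values (air density 1.225 kg/m^3, area 100 m^2, and so on) are illustrative assumptions, and the power coefficient is passed in as a given number rather than computed from the pitch angle.

```python
def mechanical_power(rho, area, v_wind, c_p):
    """P_m = 0.5 * rho * A * V^3 * C_p, in watts when SI units are used."""
    return 0.5 * rho * area * v_wind ** 3 * c_p

def tip_speed_ratio(radius, omega_r, v_wind):
    """lambda = omega_r * r / V, a dimensionless ratio."""
    return omega_r * radius / v_wind

# Illustrative values only (not taken from the paper).
p_m = mechanical_power(rho=1.225, area=100.0, v_wind=10.0, c_p=0.4)
tsr = tip_speed_ratio(radius=40.0, omega_r=2.0, v_wind=10.0)
```

Because the power grows with the cube of the wind speed, a small error in the forecast wind speed becomes a much larger relative error in the forecast power, which is why the wind speed model matters so much for the indirect power forecast.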

Performance Measurements
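The mean speed of the wind (VMED), mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percent error (MAPE) are widely used to assess the performance of wind power forecasting [25]. VMED is the average wind speed, as follows:

$$V_{MED} = \frac{1}{N} \sum_{i=1}^{N} V_i \tag{4}$$

where $V_i$ is the i-th wind speed sample. MAE presents the accuracy in the same units as the data, which helps to conceptualize the magnitude of the error. It is represented in the following equation:

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| V_i^{real} - V_i^{prev} \right| \tag{5}$$

where N is the number of samples, $V_i^{real}$ is the actual speed, and $V_i^{prev}$ is the predicted speed. RMSE is easy to interpret because it is expressed in the same units as the forecasted variable:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( V_i^{real} - V_i^{prev} \right)^{2}} \tag{6}$$

MAPE presents the accuracy as a percentage error. Because this number is a percentage, it may be easier to understand than the other measures. It is represented in the following equation:

$$MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{V_i^{real} - V_i^{prev}}{V_i^{real}} \right| \tag{7}$$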
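The four indicators can be computed in a few lines. The sketch below assumes NumPy, assumes that VMED is taken over the actual speed series (the text does not say which series is averaged), and uses the hypothetical function name `forecast_errors`.

```python
import numpy as np

def forecast_errors(v_real, v_prev):
    """Return VMED, MAE, RMSE, and MAPE (in percent) for two equal-length series."""
    v_real = np.asarray(v_real, dtype=float)
    v_prev = np.asarray(v_prev, dtype=float)
    err = v_real - v_prev
    return {
        "VMED": float(v_real.mean()),            # assumption: mean of the actual speeds
        "MAE": float(np.abs(err).mean()),
        "RMSE": float(np.sqrt((err ** 2).mean())),
        # MAPE is undefined when an actual value is zero (calm periods).
        "MAPE": float(100.0 * np.abs(err / v_real).mean()),
    }

# Made-up actual and predicted wind speeds in m/s.
scores = forecast_errors([10.0, 20.0], [8.0, 22.0])
```

Note that MAPE weights errors at low wind speeds more heavily than MAE or RMSE, so the three error measures can rank models differently.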

Proposed Bee Swarm Optimization with Support Vector Machine Algorithm
Bee swarm optimization combined with data regression and the least square support vector machine (DR-SVM) is an artificial intelligence pattern identification algorithm [22,23]. It is a new machine learning method based on statistical learning theory. It has many unique advantages in solving problems with small sample sizes, as well as nonlinear and high-dimensional pattern recognition problems. It is used in practical problems such as handwriting recognition, face recognition, and image classification. The bee swarm algorithm is a new optimization algorithm that can solve numerical optimization problems accurately and quickly, and it has the advantages of a simple concept, easy implementation, and few control parameters [26,27].

Least Square Support Vector Machine
Support vector machines (SVM) are an artificial intelligence method proposed by Vapnik and AT&T laboratories in 1995 [28,29], in which the main theory is to use the structural risk minimization principle of statistical learning. Support vector machines mainly use a separating hyperplane to distinguish two or more different types of data. They have been used for data exploration and classification in many fields in recent years and have provided good results. Numerous scholars have studied and improved the method, thereby making its application more extensive and its results more accurate.
In a typical classification problem, the following basic representation methods are usually defined:
$x_i$: A vector used to describe the style or attributes of a piece of data, $x_i \in \mathbb{R}^n$, $i = 1, 2, 3, \cdots, m$.
$y_i$: The target, usually expressed as {±1} (assuming the target is divided into two categories), in which +1 and −1 indicate two different categories.
The support vector machine is mainly used in classification and regression analysis. The proposed method uses its support vector classification and regression capability to simulate the fault location, type, and distance. The principle of the support vector machine is shown in Figure 5. The low-dimensional data of the input space are mapped into a high-dimensional feature space, so that data that originally had to be handled by a nonlinear curve can be handled as a linear problem. The support vector machine used in the proposed method is the least square support vector machine (LSSVM), which has seen improvements in recent years. It has fewer parameters to set than the traditional support vector machine, and there is no need to set the insensitivity parameter ε; it is only necessary to set the adjustment parameter γ and the radial basis kernel function K. Support vector classification (SVC) mainly uses a hyperplane, that is, a plane in a high-dimensional space, to separate data into two or more different categories, because the actual data may be high-dimensional. As shown in Figure 6, linear classification finds the hyperplane in the feature space that maximizes the distance to the training data and then divides the data into two or more different types. The hyperplane can be written as $f(x) = w^T x + b = 0$, where $w \in \mathbb{R}^n$ is a vector that describes the characteristic attributes of each data item and $b \in \mathbb{R}$ [22,23]. Suppose that the training data set is $T_r = \{(x_i, y_i)\}_{i=1}^{l}$, in which $x_i \in \mathbb{R}^n$ represents an n-dimensional input vector and $y_i \in \{-1, +1\}$ represents the two types of data. The hyperplane can be used to classify the data set, and the hyperplane with the maximum distance to the two boundaries is called the maximum margin hyperplane.
The training data on the boundary are called the support vectors (SV), and ρ is the maximum margin. The classification curve is as follows: If $y_i = \pm 1$, it can be expressed as: If the training data satisfy the above formula, the maximum margin can be obtained as follows: From the above formula, the objective function can be obtained: Since the objective function is a quadratic optimization problem, the Lagrange problem can be obtained from Equations (9) and (10) as follows: Here $\alpha = [\alpha_1, \alpha_2, \cdots, \alpha_l]^T$ is the vector of Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) conditions, the following formula can be obtained: Substituting Equation (13) into Equation (12) gives the objective function: The support vectors are the training data that satisfy the KKT conditions, and these points fall on the maximum margin. After these support vectors are obtained, new data can be classified by them. The mathematical concept of support vector regression (SVR) is as follows: the known training data set is $T_r = \{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$ is the input data and $y_i \in \mathbb{R}$ is the output data. The linear regression objective function can be expressed as: Adding Equation (14) to the error function containing the ε insensitivity results in the following equation: Introducing two slack variables, one for target values higher than ε and the other for target values lower than ε, results in:
From Equation (17), the following equation can be obtained: From Equations (15), (16), and (18), the original optimization problem is obtained: Here, the smaller the error terms ξ and η, the better. The adjustment parameter γ of the support vector machine represents the penalty weight applied to the error data. To avoid overfitting, the setting of the γ parameter is particularly important. The output of the upper and lower measurement limits of the slack variables is shown in Figure 7.
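For reference, the derivation above follows the usual hard-margin SVC and ε-insensitive SVR formulation, which can be written as below. This is a textbook sketch under that assumption, using the symbols w, b, α, ρ, ξ, η, γ, and ε from the text; the exact forms and numbering of the paper's own equations may differ.

```latex
% Hard-margin support vector classification (SVC)
\begin{align*}
& w^{T} x_i + b \ge +1 \ \ (y_i = +1), \qquad w^{T} x_i + b \le -1 \ \ (y_i = -1) \\
& y_i \left( w^{T} x_i + b \right) \ge 1, \qquad i = 1, \dots, l \\
& \rho = \frac{2}{\lVert w \rVert} \\
& \min_{w,\, b} \ \frac{1}{2} \lVert w \rVert^{2}
  \quad \text{s.t.} \quad y_i \left( w^{T} x_i + b \right) \ge 1 \\
& L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^{2}
  - \sum_{i=1}^{l} \alpha_i \left[ y_i \left( w^{T} x_i + b \right) - 1 \right],
  \qquad \alpha_i \ge 0 \\
& w = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0 \\
& \max_{\alpha} \ \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j
\end{align*}

% Support vector regression (SVR) with the epsilon-insensitive loss
\begin{align*}
& f(x) = w^{T} x + b \\
& \left| y - f(x) \right|_{\varepsilon} = \max \left( 0, \ \left| y - f(x) \right| - \varepsilon \right) \\
& y_i - f(x_i) \le \varepsilon + \xi_i, \qquad
  f(x_i) - y_i \le \varepsilon + \eta_i, \qquad \xi_i, \eta_i \ge 0 \\
& \min_{w,\, b,\, \xi,\, \eta} \ \frac{1}{2} \lVert w \rVert^{2}
  + \gamma \sum_{i=1}^{l} \left( \xi_i + \eta_i \right)
\end{align*}
```

In this form, a larger γ penalizes points outside the ε tube more heavily, which tightens the fit to the training data and raises the risk of overfitting.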
The support vector machine used in the proposed method is the least square support vector machine (LSSVM). In application, the hyperplane parameter settings of the conventional support vector machine are greatly affected by the amount of training data, which leads to an excessively large solution range.
The least square support vector machine sets the parameter ε to 0, making every data item a support vector point when finding the regression curve. Because the conventional support vector machine solves a quadratic programming problem in which the number of variables equals the number of training data items, the number of matrix elements is the square of the amount of training data. When the scale of the data reaches a certain level, conventional solution methods become difficult to apply.
The LSSVM method solves a set of linear equations to obtain the final regression curve, which reduces the difficulty of the solution to a certain extent and improves the solution speed. This paper used the radial basis kernel function $K(x, y) = e^{-\sigma^{2} \lVert x - y \rVert^{2}}$, and used the regression and prediction capabilities of LSSVM to construct the correlation between the data regression and the forecast. After the modeling data were selected through data regression, the support vector machine was trained on them to build the wind speed prediction model. For the wind speed prediction, the input layer parameters included temperature, humidity, and wind speed, and the least square support vector machine was used to predict the wind speed at the output layer. For the power generation model, the input layer parameter was the wind speed, and the LSSVM was used to output the power generation. The wind power generation forecast error was normalized according to the actual wind speed interval and the capacity of the wind farm at the time of the forecast, so as to reflect the true forecast accuracy.
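To make the "solves a set of linear equations" step concrete, the following is a minimal sketch of LSSVM regression in its commonly used dual form, assuming NumPy. It solves the system [[0, 1^T], [1, K + I/γ]] [b; α] = [0; y] and predicts with f(x) = Σ α_i K(x, x_i) + b. The function names and the toy data are illustrative, and this is not the authors' MATLAB implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(x, y) = exp(-sigma^2 * ||x - y||^2), the kernel form given in the text."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-(sigma ** 2) * d2)

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system and return the bias b and multipliers alpha."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                       # first row enforces sum(alpha) = 0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b at the new inputs."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy one-dimensional regression problem (made-up data).
X = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=1e3, sigma=3.0)
y_fit = lssvm_predict(X, b, alpha, 3.0, X)
```

The system has n + 1 unknowns for n training samples, so the cost grows quickly with the data size; this is one reason why reducing the modeling data through data regression before training is useful.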

Enhanced Bee Swarm Optimization Algorithm
In recent years, biological group intelligence has become one of the main fields of scientific research, and many optimization algorithms have been developed based on the concept of group intelligence, including ant colony optimization, particle swarm optimization, and bee swarm optimization; the concept of group intelligence is still widely used. The bee swarm algorithm can accurately and quickly solve numerical optimization problems by imitating the division of labor and roles within a bee colony. It has simple concepts, easy implementation, fast convergence speed, few parameter settings, and a wide search range [26,27].
The groups in the bee swarm algorithm are worker bees, follower bees, and scouting bees. The working model is improved as follows: the size of each group is determined first, unlike in the artificial bee colony, where the groups other than the worker bees only join when needed, so as to ensure that all groups participate in the overall search process. A random change of search direction was added to the working mode of each group to avoid falling into a local optimal solution. The working mode of each group was as follows:

(1) Worker Bee Working Mode
The worker bee searches according to its current location, as shown in Equations (20) to (24). In addition, to avoid leaving parts of the search area unexplored, a reverse search is performed when the randomly generated rand value is greater than the set pr value: where
j: The number of worker bee populations.
$x_{j,old}$: The old nectar location.
$x_{j,new}$: The new nectar location.
$w_b$: The worker bee's own cognitive ability.
$w_g$: The group cognitive ability of the worker bees.
$r_b$, $r_g$: Random numbers between 0 and 1.
$P_{j,best}$: The best search location for the worker bees themselves.
$G_{best}$: The best search location for the worker bee colony.
sign: A reverse search judgment.
$t_{max}$: The maximum number of searches.
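The exact worker bee update is given by Equations (20) to (24). As a rough illustration only, the sketch below assumes a PSO-style update suggested by the symbol definitions (personal and group attraction terms weighted by w_b and w_g) together with the reverse search rule triggered when rand exceeds pr. The function name `worker_bee_step` and the specific update form are assumptions, not the paper's equations.

```python
import numpy as np

def worker_bee_step(x_old, p_best, g_best, w_b, w_g, pr, rng):
    """One assumed worker bee move toward the personal and group best positions."""
    r_b = rng.random()                   # random number in [0, 1)
    r_g = rng.random()
    # Reverse search judgment: flip the search direction when rand > pr.
    sign = -1.0 if rng.random() > pr else 1.0
    step = w_b * r_b * (p_best - x_old) + w_g * r_g * (g_best - x_old)
    return x_old + sign * step

rng = np.random.default_rng(0)
x_old = np.array([0.0, 0.0])
best = np.array([1.0, 1.0])
x_forward = worker_bee_step(x_old, best, best, 1.0, 1.0, pr=1.0, rng=rng)   # never reversed
x_reverse = worker_bee_step(x_old, best, best, 1.0, 1.0, pr=-1.0, rng=rng)  # always reversed
```

With pr close to 1 the bee almost always moves toward the best known positions, while a smaller pr makes reverse moves more frequent and widens the explored region.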
A certain number of worker bees in the worker bee colony execute Equation (25): two groups, $m_1$ and $m_2$, are randomly selected from the solutions of all bee colonies, and the working mode is chosen according to their fitness values, so that the worker bees do not rely entirely on the best solution. The search can also follow the second-best solution to increase the group diversity. Here, this certain number is assumed to be 30% of the number of worker bee groups:

(2) Follower Bee Working Mode
The follower bees also use Equation (26) as a criterion to determine whether to follow the worker bees for food. They are not added to the worker bee group in the enhanced bee swarm algorithm, as they are only followers. In the follower bee working mode, a reverse search is also added to increase the search area: where
k: The number of follower bee groups.
$x_e^t$: A position randomly selected by the follower bee according to Equation (26) at the t-th iteration.

(3) Scouting Bee Working Mode
The scouting bee no longer performs a purely random search; instead, it generates a new position after comparing the global best solution found by the t-th iteration with the average of all group positions: where s: The number of scouting bees.

Wind Power Forecasting Process
The wind power forecasting process is mainly divided into three parts. The flow chart is shown in Figure 8.

• Part 1: Apply data regression analysis to select the modeling data.
• Part 2: Apply the bee swarm algorithm to solve the best setting parameters for the support vector machine.
As the support vector machine parameter selection has a great influence on the regression analysis, the parameter selection will often vary in different cases; therefore, the proposed method used the data regression support vector machine as the main body combined with the bee swarm algorithm, which was referred to as Enhanced Bee Swarm Optimization with Least Square Support Vector Machine (EBSO_LSSVM). The bee swarm algorithm was used to find better setting parameters for γ and σ in the support vector machine, and resulted in better training parameters, thus improving on the traditional shortcomings of unclear support vector machine parameter selection. The EBSO_LSSVM system architecture is shown in Figure 8.

• Part 3: Use EBSO_LSSVM to perform wind speed prediction.
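Part 2 amounts to searching for the pair (γ, σ) that minimizes a validation error of the LSSVM. The sketch below illustrates that loop with a plain random search substituted for EBSO, so it shows the objective being optimized rather than the swarm algorithm itself. It assumes NumPy; the function names, search ranges, and toy data are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma):
    """K(x, y) = exp(-sigma^2 * ||x - y||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-(sigma ** 2) * d2)

def fit_predict(X_tr, y_tr, X_va, gamma, sigma):
    """Train an LSSVM regressor on the training split and predict the validation split."""
    n = len(y_tr)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = M[1:, 0] = 1.0
    M[1:, 1:] = rbf(X_tr, X_tr, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y_tr)))
    return rbf(X_va, X_tr, sigma) @ sol[1:] + sol[0]

def search_parameters(X_tr, y_tr, X_va, y_va, n_trials=30, seed=0):
    """Random search (a stand-in for EBSO) over gamma and sigma on log scales."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_trials):
        gamma = 10.0 ** rng.uniform(-1.0, 3.0)   # assumed range: 0.1 to 1000
        sigma = 10.0 ** rng.uniform(-1.0, 1.0)   # assumed range: 0.1 to 10
        pred = fit_predict(X_tr, y_tr, X_va, gamma, sigma)
        rmse = float(np.sqrt(np.mean((pred - y_va) ** 2)))
        if rmse < best[0]:
            best = (rmse, gamma, sigma)
    return best

# Toy data split into alternating training and validation samples.
x = np.linspace(0.0, 1.0, 40)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])
best_rmse, best_gamma, best_sigma = search_parameters(x[::2], y[::2], x[1::2], y[1::2])
```

A swarm optimizer replaces the independent random draws with candidates that move toward previously good (γ, σ) pairs, but the fitness evaluation for each candidate is the same train-and-validate step shown here.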

Simulation Results and Analysis
The Central Weather Bureau wind power data from December 2018 to November 2019 were taken from Yunlin County in Taiwan. The simulations were implemented in MATLAB R2015a on a computer with an Intel Core i7-3770K 3.5 GHz CPU (Santa Clara, CA, USA) and 8 GB of RAM. The data are publicly available to researchers on the Central Weather Bureau Automatic Weather Station's website.
The DR-SVM model was compared with three other models, namely the conventional radial basis function (RBF) [30], SVM [31], and DR-RBF, for wind power prediction. In the case of the wind speed, data regression combined with the EBSO_LSSVM model was used for the forecast [32,33].

Short Term Wind Power Forecasting (Hours)
The generated power varied according to the expected wind speed for the wind farm. For this, the wind farm can generate power up to 45 MW. The expected short-term generated power results are presented in Figure 9. The variation in the power generation prediction behavior was smoother than that in the ultra-short prediction, but there was still a variation of approximately 10 MW at a range of eight to ten hours. The average output power of the proposed algorithm in the four seasons was better than that of the other methods, and the wind power in winter was higher compared to the other seasons. Furthermore, the DR-SVM model returned a higher accuracy than the other models.

Ultra-Short-Term Wind Speed Forecasting (10 min)
The predicted speed values, in m/s, for the ultra-short-term forecast of 10 min are shown in Figure 10. As seen in Figure 10, there was little difference between the DR-RBF model and the DR-SVM model. The forecast covered January 2019 to October 2019, spanning all four seasons. For the ultra-short term, the proposed DR-SVM model had better results than the other models. The transient, random, and non-smooth characteristics of the short-term wind speed distribution were significantly reduced, which greatly benefited the forecast accuracy. The proposed method could effectively combine the merits of different methods to enhance the wind speed forecast accuracy.

DR-SVM Performance Evaluation
Four evaluation indicators were used for the performance evaluation of the wind power forecasting: VMED, MAE, RMSE, and MAPE, as shown in Table 1. MAPE, RMSE, and MAE have been widely used to evaluate the performance of wind power forecasting models. The results shown in Figures 9 and 10 and in Table 1 were taken during a 24-h period in each season, i.e., 21 January (winter), 6 May (spring), 22 July (summer), and 21 October (autumn). According to the deterministic 1-h ahead forecasting error, the forecasting error for each season and for the average results was reduced by 6.86%, 15.31%, 15.90%, 9.39%, and 11.86%, respectively. The results in Table 1 indicate that the forecasting accuracy of the DR-SVM model was significantly better than that of DR-RBF and SVM, and that SVM was significantly better than RBF. For DR-SVM in winter, the forecasting accuracy in terms of MAPE was improved by 41.16%, 17.15%, and 14.35% relative to the preceding three methods, respectively. Moreover, the average forecasting errors of the proposed DR-SVM are shown in Figure 11. It is clearly seen that the wind speed forecasting has errors in all four seasons. Table 2 shows the average errors in the predicted speed values for the ultra-short-term forecast over all seasons. It can be seen that DR-SVM improves the searching capability with the best possibility of guaranteeing a global optimum. From Table 2, the DR-SVM algorithm demonstrates better accuracy, while the VMED, MAE, RMSE, and MAPE are greater than those in RBF [30], SVM [31], and DR-RBF.

Figure 11. Seasonal ultra-short-term wind speed forecasting error.

The simulation results show that the proposed method based on data regression and the enhanced support vector machine can effectively improve the prediction accuracy of short-term wind speed. Moreover, the proposed DR-SVM algorithm achieves both objectives: improving wind power forecasting accuracy and reducing computational costs.
The effectiveness of the algorithm is illustrated by performing optimization on several cases, and the results are compared with those of previously published studies. Our results show that the proposed method is feasible, robust, and more effective than many previously established algorithms.

Conclusions
Wind energy is inexhaustible, and wind power can effectively slow down the consumption of energy resources. Its development trend is unstoppable. However, uncontrollable factors such as intermittent wind and random fluctuations represent a great challenge. Large-scale wind power grid integration will inevitably impact power system stability and make the accurate forecasting of wind power a top priority. The accuracy of wind power forecasting depends on the method of constructing the forecasting model and the accuracy of weather forecasting. With the advent of big data, using only a limited set of effective data can reduce resource consumption. Therefore, this paper proposed wind power forecasting based on data regression. Among the numerous artificial intelligence methods, support vector machines were selected for regression because of their superior ability to process nonlinear characteristics.

Wind Speed Prediction
This paper used data regression to filter valid historical data so as to select useful similar data from the weather database, reduce the impact of invalid data, and effectively reduce the modeling time. This method was combined with the bee swarm optimization and support vector machine for prediction. It was used to adjust the relevant parameters to improve the prediction ability, so that the drastic data changes could be predicted more accurately, thus allowing it to effectively become a climate prediction model for wind power generation.

Construction of the Power Generation Model and Wind Power Forecast
Based on historical wind power data, this paper used data regression to select a representative clustered power generation value for each wind speed interval (in m/s) and combined the EBSO and LSSVM. The construction of the power generation model not only effectively reduced the modeling time, but also increased the accuracy of the model in comparison to that of the LSSVM model. For the wind power prediction, the wind speed prediction results were combined with the power generation given by the power generation model to obtain the wind power forecast, and the case analysis showed that this approach provided better prediction results.