Short-Term Wind Power Forecasting Based on Feature Analysis and Error Correction

Abstract: Accurate wind power forecasting is an important factor in ensuring the stable operation of a power system. In this paper, we propose a wind power forecasting method based on feature analysis and error correction in order to further improve forecasting accuracy. First, a correlation analysis of the features is carried out using the maximal information coefficient (MIC), and the main features are selected as the model inputs. Then, the two primary factors affecting wind power forecasting, namely the wind speed and wind direction provided by numerical weather prediction (NWP), are analyzed, and the data are divided and clustered from these two perspectives. Next, a bidirectional long short-term memory network (BiLSTM) is used to predict the power of each group of sub-data. Finally, the error is forecasted by a light gradient boosting machine (LightGBM) in order to correct the prediction results. The calculation example shows that the proposed method achieves the expected purpose and effectively improves the forecasting accuracy.


Introduction
As a result of the clean transformation of the global energy structure, the high-carbon power system led by traditional fossil energy will gradually disappear and be replaced by a clean power system led by renewable energy [1]. Wind power generation, as a typical form of renewable energy generation, has developed rapidly in recent years. However, due to the randomness and uncertainty of wind speed, large-scale wind power grid integration endangers the smooth operation of power systems [2]. Wind power forecasting can provide data support for the power grid, help the dispatching department formulate power generation strategies, and mitigate as far as possible the fluctuations caused by wind power grid connection. Therefore, accurate wind power forecasting is very important [3].
Many scholars have carried out related research on wind power forecasting. Wind power forecasting methods are mainly divided into physical methods and statistical methods [4]. The physical method establishes a flow field model from meteorological and topographical data and creates a forecast in combination with numerical weather prediction (NWP), which is suitable for medium- and long-term wind power forecasting. Its calculation process is complicated and its real-time data requirements are high, so it is not applicable to short-term wind power forecasting [5]. The statistical method is based on historical data and NWP data; it forms a forecasting model by establishing the mapping relationship between input features and power, which is more suitable for short-term wind power forecasting [6]. At present, short-term wind power forecasting models for wind farms mainly include single models and hybrid models. The traditional single short-term wind power forecasting methods mainly include the time series method [7], the gray model method [8], and the Kalman filter method [9]. However, these methods are based on linear modeling and do not consider the uncertainty of wind power, resulting in large errors [10]. With the continuous innovation and development of artificial intelligence, a variety of machine learning algorithms have been used for wind power forecasting. Traditional machine learning algorithms such as the support vector machine (SVM) [11] and the least squares support vector machine (LSSVM) [12] avoid falling into local optimal solutions, and the prediction accuracy is further improved. However, they are sensitive to parameter and kernel function selection, so the accuracy depends too much on the parameter values. Decision tree algorithms such as the Gradient Boosting Decision Tree (GBDT) [13], eXtreme Gradient Boosting (XGBoost) [14], and Light Gradient Boosting Machine (LightGBM) [15] perform well in wind power forecasting, but their space complexity is too high and they overfit easily. Moreover, they do not perform well on data with strong feature correlation, so they are more suitable for processing data with low correlation. As an important part of machine learning, artificial neural networks are also used for wind power forecasting, where they extract the strong relationship between input features and future wind power. Shallow neural networks such as Back Propagation (BP) [16], Radial Basis Function (RBF) [17], Elman [18], and the extreme learning machine (ELM) [19] have achieved good results. However, they do not consider the time-dependent information in the samples, cannot automatically extract deep features, and cannot handle large changes in the wind power time series, which affects the learning efficiency of the algorithm [20]. Due to their strong ability in data feature extraction and fitting, deep learning methods have developed rapidly in recent years. The convolutional neural network (CNN) [21], gated recurrent unit (GRU) [22], deep belief network (DBN) [23], and long short-term memory neural network (LSTM) [24,25] have all been used for wind power forecasting. Among them, LSTM is a recurrent deep learning model that applies many temporal latent layers to effectively learn the strong temporal features of wind power data [26], making full use of long-distance time series information. However, LSTM only carries one-way information, so the time correlation between data is not considered comprehensively. Therefore, the bidirectional long short-term memory neural network (BiLSTM), which combines two groups of LSTMs with opposite directions, has emerged. It contains both historical and future information and has been shown to improve the prediction accuracy [5]. Therefore, this paper conducts power forecasting based on BiLSTM.
The single model has limitations [27], so hybrid models have emerged, giving full play to the advantages of each model to achieve complementarity. There are two main combination methods. One is an overall combination based on model performance: by assigning different weights, the prediction results of multiple single models are linearly combined to further improve the prediction accuracy. For example, ref. [28] combines four neural networks: BP, Elman, ELM, and the generalized regression neural network (GRNN); ref. [29] combines three deep neural networks: LSTM, DBN, and the echo state network (ESN). However, there are too many ways to select the basic single models and their corresponding weights, and it is difficult to determine whether the performance of multiple single models is complementary, resulting in excessive randomness in the results. The other is component forecasting based on data features, which extracts similar data through feature engineering in order to enhance the representativeness and adaptability of the model to specific situations through refined data selection [30]. The following methods are currently popular. The original data are denoised and decomposed into components of different frequency bands by wavelet decomposition [31], empirical mode decomposition (EMD) [32], ensemble empirical mode decomposition (EEMD) [33], variational modal decomposition (VMD) [34], etc., and appropriate models are used to predict each subsequence separately. Based on different features (such as weather conditions, wind conditions, output mode, and power change trend), cluster analysis is carried out on the original data [35,36], and power forecasting is made for each type of data separately. A similarity calculation method is defined in units of days, and the similar-day data are extracted for model training and forecasting [37]. However, the above methods mostly extract feature similarity from the perspective of power output. They ignore the wind speed and wind direction, which are the two meteorological factors that have the greatest impact on future power, and have not exploited the strong correlation between these two factors and power. Therefore, emphasizing the inseparable relationship between these two factors and power, this paper focuses on extracting similar data from these two perspectives for component forecasting.
Table 1 shows the comparison of the above models. In addition to the above models, data pre-processing and post-processing are also effective ways to improve the accuracy of wind power forecasting. The pre-processing of the data mainly uses the maximum information coefficient (MIC) [26], principal component analysis (PCA) [17], conditional mutual information [38], etc., to rank the features by correlation and to reduce the dimension of high-dimensional input features. By introducing as many features as possible, the forecasting model can reflect the impact of complex external conditions on wind power to some extent. However, features with lower correlation bring limited improvement in prediction performance. Traditional methods do not consider that the input features influence the change of wind power differently in different wind farms; that is, different wind farms call for different model input features. At the same time, when many original features are used as the inputs of the prediction model, the complexity of the model increases and its running speed is seriously reduced. To address these problems, an algorithm is needed to select several appropriate features from the original features as the input variables of the model. A suitable feature selection algorithm can not only screen out the features that have a greater impact on wind power and eliminate the coupling between variables, but also help to improve the running speed of the model, so that the prediction model achieves higher accuracy and efficiency with fewer inputs. The post-processing of data mainly corrects the error of the predicted wind power: ref. [39] does so by describing the distribution of the prediction error, and ref. [40] by predicting the error and correcting the preliminary predicted power value. Compared with load forecasting, due to the uncertainty of wind speed, the wind power forecasting result is more volatile and there is still a large error in the preliminary forecast. The preliminary prediction result is substituted into the error model, and the error value obtained is superimposed on the preliminary forecasting value to give the corrected prediction, which serves to identify and compensate for the deficiencies of the preliminary forecast. At the same time, different types of models are selected for the preliminary forecasting and the error correction, which avoids the limitations of a single model to a certain extent and gives full play to the advantages of different models to further improve the prediction accuracy. Error correction is highly general and is not limited to a specific forecasting process. Therefore, the multi-link forecasting method of "input screening + model combination + error correction" has become a research hotspot, striving to maximize the prediction accuracy by optimizing each link.
Table 1. Comparison of the above models.
Traditional single methods (time series, gray model, Kalman filter): Based on linear modeling; the uncertainty of wind power is not considered, so the prediction error is large.
Traditional machine learning (SVM, LSSVM): Sensitive to the selection of parameters and kernel functions; the accuracy depends excessively on the parameter values.
Decision tree algorithms (GBDT, XGBoost, LightGBM): The space complexity is too high and overfitting occurs easily; more suitable for processing data with low correlation.
Shallow neural networks (BP, RBF, Elman, ELM): The time-dependent information in the samples is not considered, and deep features cannot be automatically extracted.
Deep learning (CNN, GRU, DBN, LSTM, BiLSTM): Strong data feature extraction and fitting ability; however, only BiLSTM can mine bidirectional time series information.
In summary, this paper proposes a short-term wind power forecasting method based on feature analysis and error correction. First, the MIC is introduced to analyze the correlation of multiple factors affecting wind power, and feature selection is performed on this basis. Second, the data are divided based on wind speed, and cluster analysis is conducted on the data from the aspect of wind direction. Then, a prediction model based on BiLSTM is built for each component to complete the preliminary forecasting of wind power. Finally, LightGBM is used to perform error forecasting and correct the preliminary prediction results. The results of practical examples show that the proposed method effectively improves the forecasting accuracy and establishes a hybrid forecasting framework including feature selection, similar data component extraction, model forecasting, and error correction. Our contributions are summarized as follows:

• MIC is used for feature selection. Several factors with the strongest correlation with wind power are selected from multiple features, which avoids the interference of irrelevant features with the wind power prediction results, reduces the input feature dimension, reduces the workload of the neural network, and thus greatly improves the operation speed. In addition, the features most closely related to power can also be observed, thus laying a foundation for the extraction of similarity components.

• A hybrid model based on component forecasting is used for wind power forecasting. The strong correlations between wind speed and power and between wind direction and power are mined separately. The prediction components are extracted from these two features for the first time, which makes the similarity within components stronger. The relationship between wind turbine output and wind speed is fully utilized, and the influence of wind blowing from different directions on the wind energy absorbed by wind turbines is considered. This provides a new idea for improving the accuracy of wind power forecasting, which has not been studied in the previous literature.

• Based on the forecasting model, an error correction method is proposed. Since the correlation between the prediction error and input features such as wind speed and wind direction is smaller than the correlation between the predicted power and these factors, the data post-processing method uses, for the first time, the LightGBM algorithm to predict the error, as it is more suitable for processing data with low correlation. At the same time, it can make up for the shortcomings caused by using only the BiLSTM model. The two algorithms, with completely different principles, complement each other, making the prediction results more accurate.
The rest of this paper is organized as follows: The basic principles and methods of MIC feature selection and the extraction of similar data components based on wind speed and wind direction are introduced in Section 2. The forecasting model and error correction model are established in Section 3. The example analysis is in Section 4. Finally, conclusions are drawn in Section 5.

Correlation Analysis Based on MIC
The MIC was proposed in 2011 to measure the degree of correlation between variables [41]. Table 2 compares various correlation analysis methods. Traditional correlation analysis methods, such as linear fitting and the Pearson coefficient, can only analyze the correlation between linearly related data, and the Spearman coefficient can additionally analyze simple monotonic nonlinear correlation. Therefore, their scope of application is small. Moreover, these methods have poor robustness. PCA and its improved algorithm, kernel principal component analysis (KPCA), are two popular correlation analysis methods at present, which enhance the ability of data processing. However, the former has difficulty dealing with data sets that are not linearly separable, and the latter is too dependent on the selection of kernel function parameters. MIC can not only describe linear and nonlinear (such as periodic and sinusoidal) relations between variables but also uncover potential non-functional relations [42]. Moreover, it shows good robustness to data containing noise, such as wind speed and wind power, and has low computational complexity. The basic principle of MIC is as follows: if two variables are correlated, the scatter diagram formed by them is drawn, the corresponding mutual information (MI) value is calculated and normalized under different grid divisions, and the maximum of these values is the MIC of the two variables.
MIC is used to analyze the correlation between different features and wind power as follows:
(1) Calculate the MI. Assume that the feature is $X = \{x_i\}$ $(i = 1, 2, \ldots, N)$ and the wind power is $Y = \{y_i\}$ $(i = 1, 2, \ldots, N)$, where $N$ is the number of samples. The MI value between $X$ and $Y$ is:
$$MI(X, Y) = \iint p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy$$
where $p(x, y)$ is the joint probability density of $X$ and $Y$; $p(x)$ and $p(y)$ are the marginal probability densities of $X$ and $Y$, respectively.
(2) Divide the grid. $D = \{(x_i, y_i)\}$ $(i = 1, 2, \ldots, N)$ is a binary data set composed of the two variables. A scatter plot is formed, and the two-dimensional space is divided into an $m \times n$ grid, denoted as $G(m, n)$. There are various ways to divide an $m \times n$ grid, each giving a different MI value, and the largest MI value is selected:
$$MI^{*}(D, m, n) = \max_{G(m, n)} MI(D|_{G})$$
where $D|_{G}$ denotes the data $D$ partitioned by the grid $G$.
(3) Normalization. Normalize the maximum MI values under the different grids:
$$M(D)_{m,n} = \frac{MI^{*}(D, m, n)}{\log_2 \min(m, n)}$$
(4) Calculate the MIC. The MIC is:
$$MIC(D) = \max_{m \times n < B(N)} M(D)_{m,n}$$
where $B(N)$ is the upper limit of the grid division and is usually selected as $B(N) = N^{0.6}$. MIC values range from 0 to 1. The larger the MIC value, the stronger the correlation. Therefore, the features strongly correlated with wind power are selected as model input features.
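The four steps above can be illustrated with a small sketch. It is a simplified approximation rather than the full MIC algorithm: instead of searching over all grid placements, it only tries equal-frequency grids for each $m \times n$ size, so it returns a lower bound of the true MIC. The function names `mutual_information` and `approx_mic` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y, m, n):
    """MI in bits of x and y on an m-by-n grid of equal-frequency bins."""
    # Equal-frequency edges are a simplification: true MIC optimizes the grid placement.
    x_edges = np.quantile(x, np.linspace(0, 1, m + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, n + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p_xy = counts / counts.sum()               # joint probabilities per cell
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of x, shape (m, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, n)
    mask = p_xy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def approx_mic(x, y, alpha=0.6):
    """Largest normalized MI over all grids with m * n <= N ** alpha."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = len(x) ** alpha                        # B(N) = N^0.6 by default
    best = 0.0
    for m in range(2, int(b // 2) + 1):
        for n in range(2, int(b // m) + 1):
            mi = mutual_information(x, y, m, n)
            best = max(best, mi / np.log2(min(m, n)))  # normalization step
    return best

# Synthetic illustration: a cubic dependence versus unrelated noise.
rng = np.random.default_rng(0)
speed = rng.uniform(0.0, 25.0, 500)
print(approx_mic(speed, speed ** 3))
print(approx_mic(speed, rng.normal(size=500)))
```

In a feature selection setting, a score of this kind would be computed between each candidate feature and the wind power, and the features with the highest scores kept as model inputs.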

Feature Analysis of Wind Speed

Relationship between Wind Speed and Wind Power
In short-term wind power forecasting, wind speed is the leading factor determining the output power of wind farms and is an essential parameter for energy system operation [43]. The randomness and uncertainty of wind speed impart the same characteristics to wind power, so a full analysis of wind speed can greatly reduce the wind power forecasting error. Therefore, wind speed is selected as the primary representative feature for power forecasting. As the biggest factor affecting wind power, wind speed is strongly correlated with it. Figure 1 shows the relationship between wind speed and wind turbine output. The theoretical relationship between them is as follows:
$$P(v) = \begin{cases} 0, & 0 \le v < v_{in} \text{ or } v \ge v_{out} \\ \frac{1}{2} C_P \rho A v^{3}, & v_{in} \le v < v_{r} \\ P_{r}, & v_{r} \le v < v_{out} \end{cases}$$
where $P(v)$ is the theoretical output of the wind turbine, kW; $P_r$ is the rated power, kW; $v$ is the real-time wind speed, m/s; $v_{in}$, $v_r$, and $v_{out}$ are the cut-in wind speed, rated wind speed, and cut-out wind speed, respectively, m/s; $C_P$ is the wind energy utilization coefficient; $\rho$ is the air density, kg/m³; $A$ is the area swept by the wind wheel, m².
NWP is the main source of wind speed forecasting data. Based on the above analysis, the data are divided according to NWP wind speed to improve the correlation. The partition nodes are the cut-in wind speed, rated wind speed, and cut-out wind speed. Therefore, the raw data are divided into three categories: $0 \le v < v_{in}$ or $v \ge v_{out}$ is the zero-power group; $v_{in} \le v < v_r$ is the standard-power group; $v_r \le v < v_{out}$ is the rated-power group.
Because NWP wind speed data contain certain errors, in order to improve accuracy, this paper further divides the wind speed according to its distribution law while considering the wind speed fluctuation. The change trend of wind speed in a certain period is analyzed in order to reserve a certain margin at each partition node and reduce the error. The wind speed fluctuation is described by a trend indicator $C_t$, computed from the NWP wind speeds at time $t$ and at the three preceding time steps, where $v_t$ is the NWP wind speed at time $t$; $T$ is the time resolution; $v_{t-3T}$, $v_{t-2T}$, and $v_{t-T}$ represent the NWP wind speed at $3T$ minutes, $2T$ minutes, and $T$ minutes before time $t$, respectively. $C_t$ describes the changing trend of wind speed at time $t$: $C_t > 0$ indicates that the wind speed shows an upward trend, and the partition nodes can be appropriately lowered; $C_t < 0$ indicates that the wind speed is decreasing, and the partition nodes can be appropriately raised.
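The three-way division by NWP wind speed can be sketched as follows. This is only an illustration of the grouping rule: the function name `power_group` and the turbine thresholds in the example (3, 12, and 25 m/s) are hypothetical values, not parameters taken from the paper, and the margin adjustment based on $C_t$ is not included.

```python
def power_group(v, v_in, v_r, v_out):
    """Assign an NWP wind speed (m/s) to one of the three power groups."""
    if v < v_in or v >= v_out:
        return "zero"       # below cut-in or at/above cut-out: no output
    if v < v_r:
        return "standard"   # between cut-in and rated speed: output varies with speed
    return "rated"          # between rated and cut-out speed: output held at rated power

# Hypothetical turbine thresholds, for illustration only.
V_IN, V_R, V_OUT = 3.0, 12.0, 25.0
for speed in (1.5, 7.0, 14.0, 26.0):
    print(speed, power_group(speed, V_IN, V_R, V_OUT))
```

Each group would then be forecast by its own model, with the thresholds shifted slightly when the wind speed trend suggests a margin is needed.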

Wind Speed Probability Distribution Model
Wind speed generally conforms to the two-parameter Weibull distribution, and the cumulative distribution function is as follows:
$$F(V) = P(v \le V) = 1 - \exp\left[-\left(\frac{V}{c}\right)^{k}\right]$$
The probability density function is:
$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1} \exp\left[-\left(\frac{v}{c}\right)^{k}\right]$$
where $v$ is the wind speed, m/s; $V$ is the given wind speed, m/s; $k$ is the shape parameter; $c$ is the scale parameter. When a local wind condition has multi-peak characteristics, it can be fitted with a mixed Weibull distribution. The mixed Weibull distribution is a weighted model composed of multiple Weibull distributions. The cumulative distribution function is as follows:
$$F(V) = \sum_{i=1}^{n} \omega_i \left\{1 - \exp\left[-\left(\frac{V}{c_i}\right)^{k_i}\right]\right\}$$
where $n$ is the number of Weibull components; $\omega_i$ is the weight of the $i$th component; $k_i$ is the shape parameter of the $i$th component; $c_i$ is the scale parameter of the $i$th component.
The probability density function is:
$$f(v) = \sum_{i=1}^{n} \omega_i \frac{k_i}{c_i}\left(\frac{v}{c_i}\right)^{k_i-1} \exp\left[-\left(\frac{v}{c_i}\right)^{k_i}\right]$$
Maximum likelihood estimation is a commonly used method with high accuracy in Weibull fitting [44]. This paper performs Weibull fitting on the wind speed of the wind farm. According to the contents of the previous section, when the wind speed shows an upward trend, each partition node is moved, on the cumulative distribution curve of wind speed, to the wind speed at which the cumulative probability is 2% lower than at the original node.
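The node adjustment for an upward wind speed trend can be sketched with the single two-parameter Weibull distribution. The sketch assumes the shape and scale parameters have already been fitted; the values `k = 2.0` and `c = 8.0` and the function names are hypothetical, and only the upward-trend case described above (a 2% decrease in cumulative probability) is shown.

```python
import math

def weibull_cdf(v, k, c):
    """Cumulative probability F(v) of the two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((v / c) ** k))

def weibull_quantile(p, k, c):
    """Inverse of the CDF: the wind speed whose cumulative probability is p."""
    return c * (-math.log(1.0 - p)) ** (1.0 / k)

def shift_node(v_node, k, c, delta):
    """Move a partition node to the speed with cumulative probability F(v_node) + delta."""
    p = weibull_cdf(v_node, k, c) + delta
    p = min(max(p, 0.0), 1.0 - 1e-12)   # keep p inside the valid range of the quantile
    return weibull_quantile(p, k, c)

# Hypothetical fitted parameters and rated wind speed, for illustration only.
k, c = 2.0, 8.0
v_r = 12.0
v_r_shifted = shift_node(v_r, k, c, delta=-0.02)  # upward trend: lower the node
print(v_r, v_r_shifted)
```

Working in cumulative probability rather than in metres per second means the margin is wider where wind speeds are rare and narrower where they are common.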

Feature Analysis of Wind Direction
In order to obtain more abundant wind resources, maximize the capture of wind energy, and reduce the influence of the tower shadow effect, wind farms generally adopt the strategy of deploying upwind wind turbines in the prevailing wind direction [45]. As a result, wind turbines have different abilities to capture wind blowing from different directions. This is because, although wind turbines are generally equipped with a yaw system, there is a delay effect in the control process and, because the ground roughness and obstacles differ along different wind paths, different wind speeds reach the wind turbine. Therefore, wind direction is also one of the important features affecting wind power and must be considered in power forecasting.

Clustering groups data points into different clusters based on their similarity or density, automatically extracting intra-class similarities as well as inter-class differences [46]. In general, wind direction is analyzed mostly by drawing a wind rose chart. This meteorological tool intuitively describes the wind direction characteristics of a certain area. The polar coordinates of the wind rose chart are generally divided into 4, 8, or 16 wind directions. At the same time, because the calculation becomes more complex as the number of clusters increases while the improvement in clustering performance is limited, the elbow method is generally used to determine the optimal number of clusters [47]. The elbow method selects the optimal number of clusters by calculating the sum of squared errors (SSE) under different cluster numbers K. As K increases, the SSE gradually decreases. During this decrease there is an inflection point, called the "elbow" point, at which the rate of decline suddenly slows down, and the corresponding K value is considered the optimal number of clusters. We drew inspiration from the wind direction division rules of the wind rose chart and combined them with the elbow method to set the cluster number to 4. Therefore, only four direction subsets are involved in this paper. Cluster analysis is used to divide the standard-power group data into four categories by wind direction: north, east, south, and west. Since the zero-power group and the rated-power group contain less data, and their power is mostly fixed at 0 or the rated power, the correlation between power and wind speed is much greater than that between power and wind direction, so wind direction clustering is performed only on the data of the standard-power group. In addition, due to the large amount of data, the k-means clustering algorithm, the most typical one, is selected for cluster analysis.
The initial clustering centers of the k-means algorithm need to be selected randomly or manually, and their quality greatly affects the clustering result. In order to ensure the accuracy of clustering, in this paper, the data of due north (348.75° to 11.25°) are first selected. The fuzzy c-means algorithm (FCM), which is more suitable for processing smaller data sets, is used to select the clustering centers of the above data in four directions as the initial clustering centers of k-means, to ensure that the clustering results contain wind direction information.
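A minimal sketch of k-means with prescribed initial centers, together with the SSE used by the elbow method, is given below. Two simplifications are assumptions of this sketch and not the paper's procedure: the initial centers are simply the four compass directions (the FCM step is replaced by these fixed values), and each wind direction is mapped to a point on the unit circle so that 359° and 1° are treated as neighbors. The data are synthetic.

```python
import numpy as np

def to_unit(deg):
    """Map wind directions in degrees to points on the unit circle."""
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    return np.column_stack([np.cos(rad), np.sin(rad)])

def assign(points, centers):
    """Index of the nearest center for every point."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(points, init_centers, n_iter=100):
    """Plain k-means started from the given centers; returns labels, centers, SSE."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(points, centers)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    labels = assign(points, centers)
    sse = float(((points - centers[labels]) ** 2).sum())
    return labels, centers, sse

# Synthetic wind directions scattered around the four compass directions.
rng = np.random.default_rng(0)
deg = np.concatenate([rng.normal(mu, 15.0, 200) for mu in (0, 90, 180, 270)]) % 360
points = to_unit(deg)

labels, centers, sse = kmeans(points, to_unit([0, 90, 180, 270]))
print(np.bincount(labels), sse)

# Elbow curve: SSE for K = 1..8 with evenly spaced initial directions.
for k in range(1, 9):
    print(k, kmeans(points, to_unit(np.arange(k) * 360.0 / k))[2])
```

Starting from direction-based centers keeps the cluster indices interpretable (index 0 stays "north" and so on), which is the property the initialization step is meant to preserve.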

BiLSTM Model
The LSTM neural network constructs a larger deep neural network through complex nonlinear units, which can reflect the long-term memory effect and thus carry out deep learning. It is a special type of recurrent neural network (RNN) [48]. At each time step, combined with the output of the previous time step, content selection is realized through the gate structure [49]. LSTM has three layers, and its basic network unit is shown in Figure 3. The basic unit contains an input gate, a forget gate, and an output gate. $x_t$ is the input at the current time step; $h_{t-1}$ is the output of the previous time step. After they pass through the forget gate together, a value between 0 and 1 is output to the old cell state $S_{t-1}$ to decide which part is forgotten: "0" means "forget all"; "1" means "pass all". In the input gate, $x_t$ goes through the sigmoid and tanh functions, respectively, and the results are multiplied to generate new information. The new information is then added to the partially forgotten $S_{t-1}$, and their sum is the new cell state $S_t$. After $S_t$ passes through the tanh function, it determines the output together with $o_t$. The calculation formulas are as follows:
$$f_t = \sigma(\omega_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(\omega_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{S}_t = \tanh(\omega_S [h_{t-1}, x_t] + b_S)$$
$$S_t = f_t \odot S_{t-1} + i_t \odot \tilde{S}_t$$
$$o_t = \sigma(\omega_o [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(S_t)$$
where $\sigma$ is the sigmoid function; $\omega$ and $b$ denote the weight matrices and bias vectors of the corresponding gates; $\odot$ denotes element-wise multiplication.
Wind power is a time series with strong temporal correlation. The wind power at a certain time is related not only to multiple NWP meteorological features, but also to the historical or future wind power at an adjacent time or at the same time on different days. The temporal correlation of wind power mainly comes from the few NWP meteorological features that are strongly correlated with it, such as wind speed and wind direction, whose influence on power is enough to cover that of most other NWP features. Taking NWP wind speed as an example, under normal circumstances, the current wind speed is autocorrelated with the historical or future wind speed at an adjacent time or at the same time on different days, so it shows a certain degree of temporal correlation.
LSTM has a certain memory function for historical data; that is, when forecasting the wind power $\hat{P}(t)$ at time $t$, LSTM considers not only the influence of the input features at time $t$ on the predicted value $\hat{P}(t)$ but also the influence of the predicted power $\hat{P}(t-1)$ at time $t-1$. The same is true of predictions at other times. Therefore, wind power forecasting at time $t$ uses not only the information at the current time but also long-distance historical information. However, LSTM can only use past information, transmitted from front to back, which is a limitation. BiLSTM addresses this by using two LSTMs running in opposite directions (forward and backward). On this basis, it also considers the influence of the predicted power $\hat{P}(t+1)$ at time $t+1$ on $\hat{P}(t)$, which enables it to capture both historical and future information. In addition, it resolves the phase lag problem that may occur in one-way prediction. Therefore, BiLSTM is more suitable for time series prediction and shows better performance in the field of forecasting. Its output expression is:
$$h_t = \mathrm{concat}(h_{tf}, h_{tb})$$
where $h_{tf}$ and $h_{tb}$ are the forward and backward LSTM output vectors, respectively; concat denotes the splicing operation.
The BiLSTM network structure is shown in Figure 4.
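To make the bidirectional structure concrete, the following sketch runs one LSTM forward and another backward over the same sequence and splices their outputs at each time step. It uses the standard LSTM gate formulation with random, untrained weights; the helper names (`init_params`, `lstm_step`, `run_lstm`, `bilstm`), the layer sizes, and the weight scale are assumptions for illustration and do not describe the network trained in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, hidden, features, scale=0.5):
    """Random weights and zero biases for the four stacked gates (untrained)."""
    w = rng.normal(scale=scale, size=(4 * hidden, hidden + features))
    b = np.zeros(4 * hidden)
    return w, b

def lstm_step(x_t, h_prev, s_prev, w, b):
    """One LSTM step: forget, input, candidate, and output gates, then state update."""
    hidden = h_prev.shape[0]
    z = w @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[:hidden])                     # forget gate
    i_t = sigmoid(z[hidden:2 * hidden])           # input gate
    s_tilde = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell state
    o_t = sigmoid(z[3 * hidden:])                 # output gate
    s_t = f_t * s_prev + i_t * s_tilde            # new cell state
    h_t = o_t * np.tanh(s_t)                      # new output
    return h_t, s_t

def run_lstm(xs, w, b, hidden):
    """Run the LSTM over a sequence and return the output at every step."""
    h = np.zeros(hidden)
    s = np.zeros(hidden)
    outputs = []
    for x_t in xs:
        h, s = lstm_step(x_t, h, s, w, b)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(xs, params_fwd, params_bwd, hidden):
    """Concatenate forward outputs with time-aligned backward outputs."""
    h_f = run_lstm(xs, *params_fwd, hidden)
    h_b = run_lstm(xs[::-1], *params_bwd, hidden)[::-1]  # reverse back to time order
    return np.concatenate([h_f, h_b], axis=1)

# Illustrative sizes: 6 time steps, 3 input features, 4 hidden units per direction.
rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 3))
out = bilstm(xs, init_params(rng, 4, 3), init_params(rng, 4, 3), hidden=4)
print(out.shape)
```

The backward half of each output row depends on later inputs, which is how information from time $t + 1$ onward becomes available at time $t$.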

LightGBM Model
LightGBM is a framework for the implementation of the GBDT algorithm, which is an improvement on the XGBoost algorithm.It has obvious improvement in performance, efficiency and running speed [50].LightGBM is one of the algorithms with excellent performance at present.
Compared with the traditional XGBoost algorithm, LightGBM mainly obtains more accurate information gain through the gradient-based one-sided sampling strategy (GOSS) and realizes feature dimensionality reduction by combining high-dimensional features through the mutually exclusive feature compression strategy (EFB), thereby reducing computational complexity and improving efficiency.In addition, the optimization also involves the following aspects: the decision tree algorithm based on Histogram reduces memory usage; the Leaf-wise growth strategy with depth limit deepens the decision tree; direct support for category features; cache hit rate optimization; histogram-based sparse feature optimization; multi-threading optimization; etc.

Establishment of Forecasting Model
BiLSTM processes data with strong correlations well, while LightGBM is better suited to data with weak correlations. Therefore, in this paper, BiLSTM is used to establish the nonlinear relationship between the multi-dimensional features and wind power and to produce the preliminary predicted power. LightGBM is then used to forecast the error in order to correct the preliminary predicted power. The calculation method is as follows:
Historical data prediction error:
$$e(t - nT) = \hat{P}(t - nT) - P(t - nT), \quad n = 1, 2, \ldots, N$$
where $e(t - nT)$, $\hat{P}(t - nT)$, and $P(t - nT)$ are, respectively, the power prediction error, power forecasting value, and actual power at time (t − nT), in MW; T is the time resolution; and N is the historical data capacity.
Error prediction value:
$$\hat{e}(t) = f\left(X(t)\right)$$
where X(t) is the input feature at time t, including weather features, preliminary predicted power, historical error data, etc., and f(·) is the nonlinear relationship between the input features and the error prediction value.
Therefore, the revised power prediction value is:
$$\tilde{P}(t) = \hat{P}(t) - \hat{e}(t)$$
The overall process of this paper is shown in Figure 5. The specific steps are as follows:
(1) Feature selection. The MIC value between the multi-dimensional features and wind power is calculated, and the features with strong correlation are selected as input features.
(2) Wind speed division. The fluctuation type of the wind speed in the original data is calculated, the partition nodes are found based on the cut-in, rated, and cut-out wind speeds, and the original data are divided.
(3) Wind direction clustering. FCM is used to find the clustering centers of the data with wind directions of due north, due east, due south, and due west, which serve as the initial clustering centers of the k-means algorithm. Then, k-means is used to cluster the data of the standard-power group into four categories: north, east, south, and west.
(4) Power forecasting. For each type of data, a BiLSTM forecasting model is constructed to predict wind power, and the preliminary prediction results are sorted in time order.
(5) Error correction. The LightGBM model is applied for error forecasting to correct the preliminary power forecasting results.
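The arithmetic of the error correction step can be sketched as follows. The sketch assumes the sign convention that the error is the forecast minus the actual power, so the correction subtracts the predicted error; the opposite convention would simply flip both signs. The mean of the last two errors is a deliberately simple stand-in for the learned mapping f(·), not the LightGBM model, and all numbers are invented for illustration.

```python
def historical_errors(predicted, actual):
    """e = P_hat - P for each historical time step (assumed sign convention)."""
    return [p_hat - p for p_hat, p in zip(predicted, actual)]


def correct(preliminary, predicted_error):
    """Revised forecast: subtract the predicted error from the preliminary one."""
    return preliminary - predicted_error


# Illustrative values in MW (not taken from the paper's data).
preliminary_history = [10.0, 12.0, 15.0]
actual_history = [9.0, 12.5, 14.0]
errors = historical_errors(preliminary_history, actual_history)

# Stand-in for f(X(t)): average of the two most recent errors.
predicted_error = sum(errors[-2:]) / 2
corrected = correct(16.0, predicted_error)
```

Here the preliminary forecast of 16.0 MW is lowered to 15.75 MW because the recent forecasts were, on average, 0.25 MW too high.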

Case Study
This paper takes a wind farm in Northwestern China (wind farm A) and a wind farm in North China (wind farm B) as the research objects. The installed capacity of each wind farm is 45 MW. The data of the two wind farms from 1 May to 15 August 2018 are selected as experimental samples, including the actual wind power data and a variety of historical mesoscale NWP meteorological data. The meteorological features include wind speed, wind direction, temperature, pressure, and humidity. The time resolution is T = 10 min, and data deviating from the actual situation are excluded. The average wind speed of wind farm A was relatively large, with relatively strong fluctuations; that of wind farm B was relatively small, with relatively weak fluctuations. The data from the 92 days from May to July were selected as the training set, and the data from the remaining 15 days were used as the testing set.
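The sample counts implied by these dates and the 10 min resolution can be checked with a short calculation. The figures below are upper bounds, since the paper excludes abnormal data and does not state how many points were removed.

```python
from datetime import date

SAMPLES_PER_DAY = 24 * 60 // 10  # one sample every 10 min

# Both end dates are inclusive.
total_days = (date(2018, 8, 15) - date(2018, 5, 1)).days + 1
train_days = (date(2018, 7, 31) - date(2018, 5, 1)).days + 1
test_days = total_days - train_days

train_samples = train_days * SAMPLES_PER_DAY
test_samples = test_days * SAMPLES_PER_DAY
```

This gives 107 days in total, 92 training days (at most 13,248 samples per farm), and 15 testing days (at most 2,160 samples per farm).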

Feature Processing

MIC Feature Selection
The items listed in Table 3 are the candidate input features. To select the optimal features, the MIC value between each feature and the actual wind power was calculated, and the items with strong correlations were selected as input features. The higher the MIC value, the stronger the correlation between the corresponding feature and wind power. From Table 3, we observe that for wind farm A, among the NWP meteorological features, the MIC value between wind speed and actual wind power was the highest, followed in turn by wind direction, temperature, pressure, and humidity. The three meteorological features with the strongest correlation, namely wind speed, wind direction, and temperature, were selected as input features for wind farm A. For wind farm B, the MIC value between wind speed and wind power was also the highest, followed in turn by temperature, wind direction, pressure, and humidity. The three features with the strongest correlation were again wind speed, wind direction, and temperature, and they were selected as input features for wind farm B.
In addition, we analyzed the correlation between the historical wind power data and the wind power at the current time. The MIC values show a strong correlation between them, and the MIC value of the historical wind power was higher the closer it was to the forecast time. Therefore, we also used historical wind power data as an input feature. Because the available history is limited by the time scale, we additionally selected the wind power from the previous 24 h as an input feature when the time scale was 24 h, the wind power from the previous 12 h when the time scale was 12 h, and the wind power from the previous 8 h when the time scale was 8 h.
Figure 6 shows the wind power forecasting when the time scale is T′. When the prediction time scale T′ = 24 h, after the training set was used to complete the model training, the 144 test samples of each 24 h (1 day) period were selected to forecast the power through the trained network. The input time series were the NWP wind speed, wind direction, and temperature every 10 min in the next 24 h, as well as the actual wind power series in the 24 h preceding the corresponding test sample points. The output time series was the wind power prediction value every 10 min in the next 24 h, and the experiment was repeated until the 15-day prediction task was completed.
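The input and output windows described above for a 24 h time scale can be sketched as follows. The function `build_sample` and the flat list layout are assumptions for illustration; the paper does not specify how the series are stored or fed to the network.

```python
def build_sample(nwp, power, start, steps=144):
    """Assemble one forecasting sample for a window of `steps` 10 min points.

    nwp:   list of (wind_speed, wind_direction, temperature) tuples
    power: list of actual wind power values aligned with `nwp`
    start: index of the first time step to be forecast
    """
    if start < steps or start + steps > len(nwp):
        raise ValueError("not enough history or future data for this window")
    future_nwp = nwp[start:start + steps]      # NWP for the next 24 h
    past_power = power[start - steps:start]    # actual power of the previous 24 h
    target = power[start:start + steps]        # power to be predicted
    return future_nwp, past_power, target


# Synthetic series covering three days, for illustration only.
nwp_series = [(float(i), 0.0, 0.0) for i in range(432)]
power_series = list(range(432))
future_nwp, past_power, target = build_sample(nwp_series, power_series, start=144)
```

For the 12 h and 8 h time scales, the same layout would apply with `steps` set to 72 and 48, respectively.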

Wind Speed Division Results
Figure 7 shows the wind speed frequency histograms of the two wind farms. The wind speed in wind farm A follows a bimodal distribution, so a mixed Weibull fitting composed of two Weibull distributions was used; the wind speed in wind farm B follows a unimodal distribution, so a single Weibull fitting was used. It can be seen that the wind speed distribution of wind farm A is relatively scattered, with similar frequencies for wind speeds between 3 and 8 m/s, whereas the wind speed distribution of wind farm B is relatively concentrated, with relatively high frequencies between 6 and 8 m/s. Figure 8 shows the Weibull fitting results, where $c_1$ = 2.946, $c_2$ = 5.721, $k_1$ = 2.295, $k_2$ = 2.632, $\omega_1$ = 0.208, and $\omega_2$ = 0.792 in wind farm A; scale parameter c = 2.165 and shape parameter k = 9.156 in wind farm B. According to the actual situation of the wind farms, $v_{in}$ = 3 m/s, $v_r$ = 13 m/s, and $v_{out}$ = 25 m/s. Since none of the selected historical data exceeds the cut-out wind speed, only the cut-in wind speed and the rated wind speed were considered when dividing the wind speed. Figure 9 shows the fitted wind speed cumulative distribution function. According to the rules in Figure 2, $v_{in,A}^{up}$ = 2.808 m/s, $v_{r,A}^{up}$ = 12.149 m/s, $v_{in,A}^{down}$ = 3.192 m/s, and $v_{r,A}^{down}$ = 14.740 m/s are obtained for wind farm A, and $v_{in,B}^{up}$ = 2.639 m/s, $v_{r,B}^{up}$ = 12.551 m/s, $v_{in,B}^{down}$ = 3.323 m/s, and $v_{r,B}^{down}$ = 13.510 m/s for wind farm B.
In wind farm A, when the wind speed fluctuation shows an upward trend, 0 ≤ v < 2.808 m/s is the zero-power group, 2.808 m/s ≤ v < 12.149 m/s is the standard-power group, and v ≥ 12.149 m/s is the rated-power group; when the wind speed fluctuation shows a downward trend, 0 ≤ v < 3.192 m/s is the zero-power group, 3.192 m/s ≤ v < 14.740 m/s is the standard-power group, and v ≥ 14.740 m/s is the rated-power group. In wind farm B, when the wind speed fluctuation shows an upward trend, 0 ≤ v < 2.639 m/s is the zero-power group, 2.639 m/s ≤ v < 12.551 m/s is the standard-power group, and v ≥ 12.551 m/s is the rated-power group; when the wind speed fluctuation shows a downward trend, 0 ≤ v < 3.323 m/s is the zero-power group, 3.323 m/s ≤ v < 13.510 m/s is the standard-power group, and v ≥ 13.510 m/s is the rated-power group.
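The division rule can be written compactly as a lookup of the two partition nodes followed by a comparison. The sketch below uses the node values reported above; the function name `power_group` is hypothetical, and the fluctuation trend is taken as a given input because how it is computed is described elsewhere in the paper.

```python
# (farm, trend) -> (modified cut-in speed, modified rated speed), in m/s
PARTITION_NODES = {
    ("A", "up"): (2.808, 12.149),
    ("A", "down"): (3.192, 14.740),
    ("B", "up"): (2.639, 12.551),
    ("B", "down"): (3.323, 13.510),
}


def power_group(farm, trend, wind_speed):
    """Assign a wind speed sample to the zero-, standard-, or rated-power group."""
    if wind_speed < 0:
        raise ValueError("wind speed must be non-negative")
    v_in, v_rated = PARTITION_NODES[(farm, trend)]
    if wind_speed < v_in:
        return "zero-power"
    if wind_speed < v_rated:
        return "standard-power"
    return "rated-power"


rising = power_group("A", "up", 3.0)
falling = power_group("A", "down", 3.0)
```

The same 3.0 m/s sample in wind farm A falls in the standard-power group on a rising trend but in the zero-power group on a falling trend, which is the effect of using trend-dependent partition nodes.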

Wind Direction Clustering Results
Figure 10 shows how the SSE changes with the value of K after clustering the wind direction data of the two wind farms. The SSE drops sharply up to K = 4 for wind farm A and K = 3 for wind farm B and decreases only slowly thereafter, which indicates that the degree of distortion has been greatly reduced at these values. Combined with the drawing conventions of the wind direction rose chart, the number of clusters is selected as 4.
The data with NWP wind directions of due north, due east, due south, and due west were selected, and the cluster center of the data in each of these directions was obtained by FCM. The results are shown in Table 4. These four points were used as the initial clustering centers of k-means. Through the k-means algorithm, all the data of the standard-power group were clustered into four subsets (east, west, north, and south) according to wind direction. Figures 11 and 12 show the three-dimensional relationship between wind direction sine, wind direction cosine, and power before and after clustering, respectively.
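A minimal sketch of k-means with fixed initial centers on the sine and cosine of the wind direction is given below. It makes several assumptions: directions are in degrees with 0° meaning due north and 90° due east, clustering uses only the (sine, cosine) pair, and the initial centers are the ideal unit vectors for the four directions rather than the FCM centers of Table 4, which are not reproduced here.

```python
import math

# Hypothetical initial centers (sin, cos); the paper uses FCM results instead.
INITIAL_CENTERS = {
    "north": (0.0, 1.0),
    "east": (1.0, 0.0),
    "south": (0.0, -1.0),
    "west": (-1.0, 0.0),
}


def to_point(direction_deg):
    """Map a direction in degrees to (sin, cos) so that 359 and 1 are close."""
    rad = math.radians(direction_deg)
    return (math.sin(rad), math.cos(rad))


def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: (point[0] - centers[j][0]) ** 2 + (point[1] - centers[j][1]) ** 2,
    )


def kmeans_directions(directions_deg, iterations=20):
    """Cluster directions into north/east/south/west with plain k-means."""
    names = list(INITIAL_CENTERS)
    centers = [INITIAL_CENTERS[name] for name in names]
    points = [to_point(d) for d in directions_deg]
    for _ in range(iterations):
        assignment = [nearest(p, centers) for p in points]
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return [names[nearest(p, centers)] for p in points]


directions = [350, 10, 5, 80, 95, 100, 170, 185, 190, 260, 275, 280]
labels = kmeans_directions(directions)
```

Encoding the direction as sine and cosine avoids the discontinuity at 0°/360°, so 350° and 10° are both assigned to the north cluster.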

Data Normalization
The original data are normalized as follows:
$$x_n = \frac{x - x_{min}}{x_{max} - x_{min}}$$
The predicted power is reversely normalized as follows:
$$x = x_n \left(x_{max} - x_{min}\right) + x_{min}$$
where $x_n$ is the normalized data; x is the actual value; and $x_{min}$ and $x_{max}$ are the minimum and maximum values of each type of data, respectively.
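The normalization and its inverse can be sketched as a pair of functions, assuming standard min-max scaling to [0, 1], which is what the symbols defined above suggest. The sample values are illustrative only.

```python
def normalize(values):
    """Min-max scale `values` to [0, 1]; also return the bounds for the inverse."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant series")
    scaled = [(x - x_min) / (x_max - x_min) for x in values]
    return scaled, x_min, x_max


def denormalize(scaled, x_min, x_max):
    """Map normalized values back to the original physical scale."""
    return [x_n * (x_max - x_min) + x_min for x_n in scaled]


power_mw = [0.0, 11.25, 22.5, 45.0]
scaled, lo, hi = normalize(power_mw)
restored = denormalize(scaled, lo, hi)
```

The bounds returned by `normalize` should come from the training data and be reused when converting the predicted power back to MW.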

Evaluation Index of Forecasting Effect
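The degree of fit between the predicted power and the actual power is the criterion for evaluating the forecasting model. In this paper, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) of the results are calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|c_i - y_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(c_i - y_i\right)^2}$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{c_i - y_i}{y_i}\right| \times 100\%$$
where $c_i$ and $y_i$ are the predicted value and true value, respectively, and n is the sample size.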
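The three indices can be computed as in the following sketch. The MAPE here assumes the standard definition relative to the true value, which is undefined when the actual power is zero; whether the paper handles such points differently (for example, by normalizing with the installed capacity) cannot be determined from the text, so the function simply rejects zero true values.

```python
import math


def mae(predicted, true):
    """Mean absolute error."""
    return sum(abs(c - y) for c, y in zip(predicted, true)) / len(true)


def rmse(predicted, true):
    """Root mean square error."""
    return math.sqrt(sum((c - y) ** 2 for c, y in zip(predicted, true)) / len(true))


def mape(predicted, true):
    """Mean absolute percentage error, in percent (standard definition)."""
    if any(y == 0 for y in true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((c - y) / y) for c, y in zip(predicted, true)) / len(true)


predicted_mw = [10.0, 20.0, 30.0]
actual_mw = [8.0, 20.0, 40.0]
scores = (mae(predicted_mw, actual_mw), rmse(predicted_mw, actual_mw), mape(predicted_mw, actual_mw))
```

MAE and RMSE are in MW, while MAPE is a percentage, which matches the units in which the reductions are reported in the following sections.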

Analysis of Wind Power Forecasting Results
In order to verify the effectiveness of the proposed method, this paper conducts the following three rounds of forecasting.The time scale of the first and second rounds of forecasting is 24 h, and the time scale of the third round of forecasting includes 24 h, 12 h, and 8 h.

Basic Single Model Forecasting
To verify the accuracy of the basic model adopted in this paper, single BP, RBF, Elman, LightGBM, LSTM, and BiLSTM models were first used to predict the unclassified original data. For the BP neural network, the maximum number of iterations is 200, the number of hidden layer nodes is 5, the learning rate is 0.1, and the learning objective is 0.00004. For the RBF neural network, the spread is 20 and the number of neurons is 10. For the Elman neural network, the hidden layer has 11 neurons and the maximum number of iterations is 2000. The parameters of LSTM are shown in Table 5; the parameters of each unidirectional LSTM in BiLSTM are the same as those of LSTM. Table 6 shows the forecasting results of the above models, and Figure 13 compares them. Table 7 shows the prediction results of a single BiLSTM for wind farm A when the LSTM parameters are changed, from which it can be seen that the accuracy is higher under the parameters selected in Table 5.
Combined with Figures 7 and 13, we observe that the wind speed of wind farm A changed frequently because of its relatively scattered distribution, which led to relatively large fluctuations in power generation; wind farm B had a relatively concentrated wind speed distribution, which resulted in less frequent changes in wind speed and smaller fluctuations in power generation. From Figure 13, we observe that BiLSTM had the best prediction performance among the six basic models, followed by LSTM. For wind farm A, compared with the RBF model, which had a large error, the MAE, RMSE, and MAPE of BiLSTM were reduced by 0.9141 MW, 1.1711 MW, and 12.180%, respectively; compared with LightGBM, they were reduced by 0.0891 MW, 0.7785 MW, and 6.585%, respectively; and compared with LSTM, the second-best model, they were reduced by 0.1026 MW, 0.4238 MW, and 3.985%, respectively. For wind farm B, compared with the RBF model, the MAE, RMSE, and MAPE of BiLSTM were reduced by 0.6417 MW, 1.1023 MW, and 22.430%, respectively; compared with LightGBM, they were reduced by 0.2361 MW, 0.7805 MW, and 10.436%, respectively; and compared with LSTM, they were reduced by 0.1738 MW, 0.0075 MW, and 4.789%, respectively. The MAE, RMSE, and MAPE of BiLSTM are thus all the lowest, which shows that its prediction accuracy was better than that of the five other basic models. From Figure 13, we can also observe that the prediction curve obtained by BiLSTM was the closest to the actual wind power curve, with a high degree of overlap, which verifies the superiority of the BiLSTM model.
Although BiLSTM performed best among the six basic models, its overall prediction accuracy still cannot meet the high requirements of practical application. Therefore, it is necessary to improve the prediction performance through the following two rounds of method improvement. The basic model is general-purpose and is widely used in load forecasting, wind power forecasting, photovoltaic forecasting, and other forecasting fields. However, compared with load and photovoltaic power, which have strong daily regularity, wind power is highly uncertain. The basic model captures this uncertainty poorly and exhibits a certain delay effect. Especially at points with high wind power fluctuations, such as the peaks and valleys of the curve, the basic model is likely to miss the abrupt changes, which degrades the prediction performance. Extracting the characteristics of wind through feature engineering and constructing a prediction model better suited to wind power can improve this situation. Therefore, we combined feature engineering with the basic model in the second round of forecasting.
In addition, a single basic prediction model generalizes poorly across different data and cannot fully account for all relevant factors, so it cannot achieve high-precision prediction in certain application scenarios. Wind power is influenced by multiple factors that a single basic model has difficulty learning fully, which leads to unsatisfactory prediction results. Especially in extreme weather or when meteorological factors change sharply, the prediction shows significant deviation. The peaks and valleys of the wind power curve mostly correspond to large changes in wind speed, so the single basic model performs poorly there. A hybrid prediction method combines the respective advantages of different models, which can effectively overcome the inherent limitations of a single prediction model and significantly improve the accuracy of prediction. Therefore, we added the LightGBM correction model to form a hybrid prediction model in the third round of forecasting.

Forecasting after Similar Data Component Extraction
To verify the effectiveness of the feature analysis method used in this paper, it is compared with several other similar-data extraction methods, all using the same basic model. This round included six forecasting models. After wavelet decomposition of the original data, BiLSTM power forecasting was performed for each group, denoted as Wavelet-BiLSTM; after EMD decomposition of the original data, BiLSTM power forecasting was performed for each group, denoted as EMD-BiLSTM. Considering only wind speed division, the data were divided into three groups (zero-power, standard-power, and rated-power), and each group was forecasted by BiLSTM, denoted as BiLSTM (speed). Considering only wind direction clustering, the data were divided into four groups (north, east, south, and west), and each group was forecasted by BiLSTM, denoted as BiLSTM (direction). Considering both wind speed division and wind direction clustering, the six groups of data after classification were forecasted by LSTM and by BiLSTM, denoted as LSTM (wind) and BiLSTM (wind), respectively. The wavelet decomposition and EMD decomposition results are shown in Figures 14 and 15.
Table 8 shows the forecasting results of the above models. Figure 16 shows the forecasting results of the two wind farms. Because EMD uses spline interpolation for classification and screening, the shortcomings of separation and intermittency restricted the effectiveness of separating similar frequency components. Moreover, EMD was susceptible to noise interference, resulting in mode mixing and endpoint effects [51]. A large amount of wind power data inevitably contains noise, so EMD-BiLSTM performed the worst. At the same time, the excessive number of EMD components greatly increased the prediction time and computational cost of the prediction model.
We observed that BiLSTM (wind) had the best prediction performance among the six hybrid models. For wind farm A, compared with the BiLSTM model, the MAE, RMSE, and MAPE of BiLSTM (wind) were reduced by 0.5544 MW, 1.6256 MW, and 22.543%, respectively; compared with BiLSTM (speed), they were reduced by 0.4008 MW, 1.1404 MW, and 11.2261%, respectively; and compared with BiLSTM (direction), they were reduced by 0.0882 MW, 0.2903 MW, and 4.7308%, respectively. For wind farm B, compared with the BiLSTM model, the MAE, RMSE, and MAPE of BiLSTM (wind) were reduced by 0.7749 MW, 1.0830 MW, and 13.642%, respectively; compared with BiLSTM (speed), they were reduced by 0.4562 MW, 0.6644 MW, and 12.4738%, respectively; and compared with BiLSTM (direction), they were reduced by 0.0294 MW, 0.4316 MW, and 4.5341%, respectively. The MAE, RMSE, and MAPE of BiLSTM (wind) are thus lower than those of all forecasting methods in the previous round, and the accuracy of the methods proposed in this round has been greatly improved. The prediction curve fits the actual power curve closely. The prediction effect of BiLSTM (wind) was better than that of Wavelet-BiLSTM and EMD-BiLSTM, which verifies the effectiveness of the similar-data extraction method proposed in this paper. It was also better than that of BiLSTM (speed) and BiLSTM (direction), which in turn were better than BiLSTM, indicating that both wind speed division and wind direction clustering improve the prediction accuracy and that combining them strengthens this effect.

Error Correction of Forecasting Results
To rule out dependence on a particular time scale, this round verified the model effect by changing the time scale and carried out experiments with time scales of 24 h, 12 h, and 8 h. In this round, LightGBM was used to correct the prediction results of BiLSTM (wind), denoted as BiLSTM-LightGBM. The MIC correlation analysis between the various features and the preliminary prediction error is shown in Table 9. We observed that for wind farm A, under the three time scales, the MIC value between the BiLSTM (wind) forecasting power and the prediction error was the highest, followed in turn by wind speed, the power prediction error from the previous 10 min, and the power prediction error from the previous 20 min. Other features were relatively weakly correlated with the prediction error. The four features with the strongest correlation were selected as the input features for wind farm A, namely the BiLSTM (wind) forecasting power, wind speed, and the power prediction errors from the previous 10 min and 20 min. For wind farm B, under the three time scales, the MIC value between the BiLSTM (wind) forecasting power and the prediction error was also the highest, followed in turn by the power prediction error from the previous 10 min, wind speed, and the power prediction error from the previous 20 min. Other features were relatively weakly correlated with the prediction error. The four features with the strongest correlation were selected as the input features for wind farm B, and they were the same as those for wind farm A.
The error value was forecasted by LightGBM, so all preliminary power prediction results could be corrected 10 min in advance. Table 10 shows the results at different time scales and the corrected results of the BiLSTM-LightGBM model, and Figure 17 shows the correction effect more intuitively for the 24 h time scale. As the time scale decreased, the accuracy of the BiLSTM (wind) model gradually increased, and the accuracy after correction improved at each time scale. When the time scale was 24 h, for wind farm A, the MAE, RMSE, and MAPE after correction were reduced by 0.0411 MW, 0.0561 MW, and 0.2448%, respectively, compared with before correction; for wind farm B, they were reduced by 0.1767 MW, 0.1241 MW, and 0.3239%, respectively. The corrected power curves tracked the power changes better, which verifies the effectiveness of the error correction and further validates the method proposed in this paper.
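The four selected inputs can be arranged into training rows for the error model as in the sketch below. The function name `error_features` and the column order are assumptions for illustration; each row would be paired with the error at the same time step as the regression target of a model such as LightGBM, which is not called here.

```python
def error_features(forecast, wind_speed, errors):
    """Build rows [P_hat(t), v(t), e(t - T), e(t - 2T)] and targets e(t).

    The first two time steps are skipped because they lack two lagged errors.
    """
    rows, targets = [], []
    for t in range(2, len(forecast)):
        rows.append([forecast[t], wind_speed[t], errors[t - 1], errors[t - 2]])
        targets.append(errors[t])
    return rows, targets


# Illustrative values only: preliminary forecasts (MW), NWP wind speed (m/s),
# and preliminary prediction errors (MW) at consecutive 10 min steps.
forecast = [10.0, 11.0, 12.0, 13.0]
wind_speed = [5.0, 6.0, 7.0, 8.0]
errors = [0.5, -0.2, 0.1, 0.3]
rows, targets = error_features(forecast, wind_speed, errors)
```

Because the most recent lagged error is from 10 min earlier, each row can be assembled one time step before the forecast it corrects, which is consistent with the correction being available 10 min in advance.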

Conclusions
In this paper, a wind power forecasting model based on feature analysis and error correction has been verified by experiments, and the following conclusions can be drawn:
1. According to the MIC correlation analysis, the wind speed and wind direction provided by NWP play a leading role in the accuracy of the forecasting results. In addition to these two factors, temperature is also relatively strongly correlated with wind power, while the other meteorological factors have little correlation with wind power and are therefore not considered in the forecasting. MIC is thus used to reduce the data dimension by eliminating the features that are weakly correlated with wind power.

2. As can be seen from the first round of forecasting, for wind farm A, compared with the RBF model, which had a large error, the MAE, RMSE, and MAPE of BiLSTM are reduced by 0.9141 MW, 1.1711 MW, and 12.180%, respectively; for wind farm B, they are reduced by 0.6417 MW, 1.1023 MW, and 22.430%, respectively. The comparison with five other basic single forecasting models verifies that BiLSTM achieves a higher prediction accuracy because it makes full use of time series information, which reduces the prediction errors produced by the other basic models.

3. As can be seen from the second round of forecasting, compared with the BiLSTM method, which predicts directly without processing the data, the MAE, RMSE, and MAPE of BiLSTM (wind) are reduced by 0.5544 MW, 1.6256 MW, and 22.543%, respectively, for wind farm A, and by 0.7749 MW, 1.0830 MW, and 13.642%, respectively, for wind farm B. Feature engineering analysis is performed on the data from the perspectives of wind speed and wind direction. Division and clustering methods are used to mine the internal relationships of the data and to screen out data with similar features for component forecasting. Compared with direct prediction without data processing and with the forecasting methods based on wavelet decomposition and EMD, the prediction accuracy of this method is greatly improved.

4. As can be seen from the third round of forecasting, when the time scale is 24 h, compared with before correction, the MAE, RMSE, and MAPE of BiLSTM-LightGBM are reduced by 0.0411 MW, 0.0561 MW, and 0.2448%, respectively, for wind farm A, and by 0.1767 MW, 0.1241 MW, and 0.3239%, respectively, for wind farm B. As the forecasting time scale decreases, the forecasting accuracy gradually increases. LightGBM, an effective forecasting model used alongside deep learning, is applied for error correction in this paper and captures information omitted by BiLSTM. The two algorithms complement each other, which raises the forecasting accuracy.
The feasibility of the proposed method is verified by practical examples. In subsequent research, dynamic wind speed division and wind direction clustering will be carried out on the data, and further accuracy improvement through model parameter optimization will be considered. In addition, spatial and temporal downscaling of NWP data from multiple stations can be considered in order to eliminate the NWP prediction errors caused by differences in topographic location and to provide more timely prediction results. Moreover, probabilistic forecasting can be developed on the basis of single-point forecasting in order to describe wind energy uncertainty quantitatively and better serve the multi-aspect optimization decisions of the power system.

Figure 1. Relationship between wind speed and wind turbine output.

Figure 2. The cumulative distribution function of wind speed and the modification of partition nodes.

$\omega_{fx}$, $\omega_{ix}$, $\omega_{cx}$, and $\omega_{ox}$ are the weight matrices of $x_t$ for the forgetting gate, input gate, primitive input, and output gate, respectively; $\omega_{fh}$, $\omega_{ih}$, $\omega_{ch}$, and $\omega_{oh}$ are the weight matrices of $h_{t-1}$ for the forgetting gate, input gate, primitive input, and output gate, respectively; $b_f$, $b_i$, $b_c$, and $b_o$ are the bias vectors of the corresponding parts.

Figure 6. Wind power forecasting when the time scale is T'.

Figure 7 shows the wind speed frequency histograms of the two wind farms. The wind speed in wind farm A follows a bimodal distribution, so a mixed Weibull fit composed of two Weibull distributions was used; the wind speed in wind farm B follows a unimodal distribution, so a single Weibull fit was used. It can be seen that the wind speed distribution of wind farm A is relatively scattered, with similar frequencies across 3-8 m/s, whereas that of wind farm B is relatively concentrated, with a relatively high frequency at 6-8 m/s. Figure 8 shows the Weibull fitting results, where $c_1 = 2.946$, $c_2 = 5.721$, $k_1 = 2.295$, $k_2 = 2.632$, $\omega_1 = 0.208$, and $\omega_2 = 0.792$ in wind farm A, and the scale parameter $c = 2.165$ and shape parameter $k = 9.156$ in wind farm B.

According to the actual situation of the wind farms, $v_{in} = 3$ m/s, $v_{r} = 13$ m/s, and $v_{out} = 25$ m/s. Since none of the selected historical data exceeds the cut-out wind speed, only the cut-in wind speed and the rated wind speed were considered when dividing the wind speed. Figure 9 shows the fitted wind speed cumulative distribution function. According to the rules in Figure 2, the partition nodes are $v_{in,A}^{up} = 2.808$ m/s, $v_{r,A}^{up} = 12.149$ m/s, $v_{in,A}^{down} = 3.192$ m/s, and $v_{r,A}^{down} = 14.740$ m/s in wind farm A, and $v_{in,B}^{up} = 2.639$ m/s, $v_{r,B}^{up} = 12.551$ m/s, $v_{in,B}^{down} = 3.323$ m/s, and $v_{r,B}^{down} = 13.510$ m/s in wind farm B.

In wind farm A, when the wind speed shows an upward trend, 0 ≤ v < 2.808 m/s is the zero-power group, 2.808 m/s ≤ v < 12.149 m/s is the standard-power group, and v ≥ 12.149 m/s is the rated-power group; when the wind speed shows a downward trend, 0 ≤ v < 3.192 m/s is the zero-power group, 3.192 m/s ≤ v < 14.740 m/s is the standard-power group, and v ≥ 14.740 m/s is the rated-power group. In wind farm B, when the wind speed shows an upward trend, 0 ≤ v < 2.639 m/s is the zero-power group, 2.639 m/s ≤ v < 12.551 m/s is the standard-power group, and v ≥ 12.551 m/s is the rated-power group; when the wind speed shows a downward trend, 0 ≤ v < 3.323 m/s is the zero-power group, 3.323 m/s ≤ v < 13.510 m/s is the standard-power group, and v ≥ 13.510 m/s is the rated-power group.
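The trend-dependent grouping and the mixed Weibull fit can be sketched in a few lines. The partition nodes and the wind farm A mixture parameters are taken from the text; everything else is an assumption made for illustration: the function and variable names are hypothetical, the trend is passed in by the caller rather than detected from the series, the Weibull CDF is taken in its standard two-parameter form $F(v) = 1 - \exp(-(v/c)^k)$, and $c$ and $k$ in wind farm A are read as scale and shape, matching the roles stated for wind farm B.

```python
import math

# Partition nodes from the text, in m/s: (cut-in node, rated node) per trend.
NODES = {
    "A": {"up": (2.808, 12.149), "down": (3.192, 14.740)},
    "B": {"up": (2.639, 12.551), "down": (3.323, 13.510)},
}

def power_group(v, trend, farm):
    """Assign a wind speed to a group, given the trend ("up" or "down").
    How the trend itself is detected is outside this sketch."""
    if v < 0:
        raise ValueError("wind speed must be non-negative")
    v_in, v_r = NODES[farm][trend]
    if v < v_in:
        return "zero-power"
    if v < v_r:
        return "standard-power"
    return "rated-power"

def weibull_cdf(v, c, k):
    """Two-parameter Weibull CDF with scale c and shape k (assumed form)."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / c) ** k))

def mixed_weibull_cdf(v, components):
    """Weighted sum of Weibull CDFs; components are (weight, c, k) tuples."""
    return sum(w * weibull_cdf(v, c, k) for w, c, k in components)

# Wind farm A mixture from the text: (omega_i, c_i, k_i).
FARM_A = [(0.208, 2.946, 2.295), (0.792, 5.721, 2.632)]

# The same speed can land in different groups depending on the trend.
print(power_group(3.0, "up", "A"), power_group(3.0, "down", "A"))
print(mixed_weibull_cdf(8.0, FARM_A))
```

Because the upward and downward nodes differ, a speed such as 3.0 m/s in wind farm A falls in the standard-power group on a rising trend and in the zero-power group on a falling trend, which is the hysteresis the two node sets encode.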

Figure 9. Cumulative distribution function of wind speed. (a) Wind farm A. (b) Wind farm B.

Figure 10. Determination of the optimal K-value by the elbow method.

Figure 11. Relationship between wind direction sine, wind direction cosine, and power before clustering. (a) Wind farm A. (b) Wind farm B.

Figure 12. Relationship between wind direction sine, wind direction cosine, and power after clustering. (a) Wind farm A. (b) Wind farm B.

Figure 15. The EMD decomposition results. (a) Wind farm A. (b) Wind farm B.

Figure 16. Model forecasting results after similar data component extraction. (a) Wind farm A. (b) Wind farm B.

4.4.3. Error Correction of Forecasting Results

In order to eliminate the time dependence, this round verified the model effect by changing the time scale, with experiments carried out at time scales of 24 h, 12 h, and 8 h. At the same time, LightGBM was used in this round to correct the prediction results of BiLSTM (wind), denoted as BiLSTM-LightGBM. The MIC correlation

Figure 17. Comparison of results before and after modification when the time scale is 24 h. (a) Wind farm A. (b) Wind farm B.

Table 1. Comparison of wind power forecasting models.

Table 2. Comparison of multiple correlation analysis methods.

Table 3. MIC value between features and wind power.

Table 4. The initial clustering centers selected after FCM clustering.

Table 5. Model parameter setting of LSTM.

Table 6. Basic single model forecasting results.

Table 7. Prediction results of a single BiLSTM when changing parameters of LSTM.

Table 8. Model forecasting results after similar data component extraction.

Table 9. MIC value between features and the prediction error.

Table 10. Forecast results at different time scales and the modified results of the BiLSTM-LightGBM model.