Power Optimization for Wind Turbines Based on Stacking Model and Pitch Angle Adjustment

Abstract: Power optimization for wind turbines is of great significance in the area of wind power generation, since it allows wind resources to be used more efficiently, and wind power generation has become increasingly important. Generally speaking, many parameters can be optimized to enhance power output, including the blade pitch angle, which is usually ignored. In this article, a stacking model composed of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBOOST) and Light Gradient Boosting Machine (LGBM) is trained on historical data exported from the Supervisory Control and Data Acquisition (SCADA) system for output power prediction. Then, we carry out power optimization through pitch angle adjustment based on the obtained prediction model. Our results indicate that power output can be enhanced by adjusting the pitch angle appropriately.


Introduction
As a clean and renewable energy, wind energy has been widely used by humankind in many areas, including electricity generation [1,2]. According to the Global Wind Report 2019 by the Global Wind Energy Council (GWEC) [3], 60.4 GW of wind energy capacity was installed in 2019, a 19 percent increase over 2018 installations, and the total global wind energy capacity is now over 651 GW, an increase of 10 percent compared to 2018. These statistics imply that wind power generation is becoming more and more important.
To make full use of wind resources, power optimization for wind turbines is necessary, and it has become a hot topic recently. To date, many researchers have contributed to this area. For example, in [4], large-eddy simulations with extremum-seeking control for individual wind turbines were performed for power optimization; their study focused on how rated wind speeds are controlled by generator torque gain. In [5], the authors proposed a control framework to maximize wind turbine power generation through dynamic optimization of the drive train gear ratio. Furthermore, a lazy greedy algorithm for power optimization of wind turbine positioning on complex terrain was put forward in [6]. Other papers, such as [7][8][9][10][11][12], also presented optimization strategies to enhance the power output of wind turbines.
However, we notice that these methods have not involved any Machine Learning (ML) technology to date. In fact, ML has been applied in many areas such as image recognition, Natural Language Processing (NLP) and data mining [13], and it can also be applied to our topic. One major thing we can do with ML is obtain a power prediction model, which is a prerequisite for carrying out optimization. Many papers related to power prediction for wind turbines have been published, such as [14], where researchers created a turbine regression tree model to predict power output. In [15], the authors used deep neural networks with discrete target classes for probabilistic short-term wind power forecasts. In addition, BP and RBF neural networks were applied for short-term wind power prediction in [16]. These contributions are of great significance. Motivated by their approaches, in this article we chose a stacking method for prediction modeling. Stacking is one kind of Ensemble Learning; its main idea is to obtain a strong learner by combining many weak learners. We will introduce it in detail later.
Once we get the prediction model, we get the objective function for optimization problem so that the power optimization process could be started. In our work, blade pitch angle is the main parameter to be optimized. A completed algorithm flow showing how to optimize pitch angle will be presented in this article.
The rest of this paper is organized as follows: In Section 2, a brief introduction to three typical types of abnormal data in original datasets is given to indicate the necessity of data preprocessing. Section 3 shows the entire structure of our optimization strategy. In Section 4, we present the power prediction modeling process. The optimization process is shown in Section 5. We conclude our work in Section 6.

Background of Abnormal Data
As we know, a wind turbine system is complicated, composed of blades, drive system, yaw system, hydraulic system, braking system, control system, etc. Therefore, power output could be affected by many factors like wind speed, yawing error, temperature inside and outside cabin, pitch angle, etc. Among these factors, wind speed is the main factor. We used to use the wind power curve as a reference to describe general power output under different wind speeds. Figure 1 shows a wind power scatter diagram based on some original data. Three common types of abnormal data are shown in this figure as well.

Obviously, we can recognize a general wind power curve from this scatter diagram, which helps us identify the abnormal data:

•	The lower stacked abnormal data, recognizable at any wind speed by a power output of zero, are shown as Type 1. Causes of such abnormal data include damage to the power or wind speed measuring instruments, abnormal communication equipment, failure of the wind turbine, etc.;
•	The wind curtailment data form a horizontally dense data cluster, as shown in Type 2. For these abnormal data, the output power of the wind turbine does not change as the wind speed changes; one main reason is forced wind abandonment (curtailment) by the wind farm;


•	The decentralized abnormal data are randomly distributed around the wind power curve, as shown in Type 3. Their distribution is irregular and is usually caused by a decline in sensor accuracy, instrument failure, signal propagation noise, etc.
These abnormal data should be eliminated since they do not represent the normal working condition of the wind turbine; a model trained on them will suffer large prediction errors. For these reasons, data preprocessing is very necessary.

Framework of Power Optimization Approach
Note that in our work, output power is the only target value to be optimized. To solve this optimization problem, we first need an objective function that describes the mapping between the features and the output power; in other words, a power prediction model that predicts power output from the given features. Therefore, our approach is divided into two parts. In the first part, we build a power prediction model from historical data. In the second part, we use this model for optimization. To keep the method reasonable, prediction modeling and optimization are performed on each wind turbine individually. Figure 2 shows a brief overview of the entire framework.



Data Preprocessing
As discussed in Section 2, data preprocessing that eliminates the three typical types of abnormal data is needed first so that our model can learn better. We notice that the historical data contain a "state" field representing the current working state of the wind turbine, which is useful information. We first eliminate records with an abnormal "state" field. Records with a power output equal to or below 0 are also removed, because it is meaningless to perform prediction or optimization on them.
For the remaining abnormal data, we use the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. DBSCAN is a typical density-based clustering algorithm. Unlike partitioning methods and hierarchical clustering, it defines clusters as maximal sets of density-connected points, which allows it to divide sufficiently dense regions into clusters and to find clusters of arbitrary shape in a noisy space [17].
It is practicable and reasonable to apply the DBSCAN clustering algorithm here. As shown in Figure 1, normal data gather together, forming a high-density region shaped like the wind power curve. On the contrary, abnormal data are randomly distributed around the wind power curve, forming regions of low density. Therefore, the DBSCAN algorithm should be able to eliminate these abnormal data.
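As a sketch, the DBSCAN cleaning step described above might look as follows, assuming the SCADA records sit in a pandas DataFrame with hypothetical column names "wind_speed" and "power"; the eps and min_samples parameters would need tuning per turbine.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def clean_power_curve(df, eps=0.1, min_samples=20):
    """Keep only points in the dense band forming the wind power curve."""
    # Standardize so wind speed and power contribute comparably to distance.
    X = StandardScaler().fit_transform(df[["wind_speed", "power"]])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # DBSCAN assigns low-density outliers the label -1; drop those rows.
    return df[labels != -1]
```

Normal records along the power curve lie in a high-density band and survive; scattered Type 3 points fall below the density threshold and are discarded.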


Feature Selection
The main purpose of feature selection is to reduce the dimension of the feature vectors to avoid over-fitting, improve model generalization ability and make the model learn faster. There are three main types of feature selection algorithms: Filter, Wrapper and Embedding. The filter method scores each feature according to divergence or correlation, sets a threshold, and selects features accordingly. The wrapper method iteratively includes or excludes features according to the objective function until one best subset is selected. The embedding method trains a learning model on the original data to obtain the weight coefficient of each feature, then selects features in descending order of coefficient [18]. In our research, all three are tried, and we then weigh their outputs comprehensively to obtain the final feature selection.
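The filter branch of this procedure can be sketched briefly. This is an illustration only, assuming the features and target live in a pandas DataFrame with a hypothetical "power" target column; the paper's actual threshold and scoring choices are not specified here.

```python
import pandas as pd

def filter_select(df, target="power", k=12):
    """Score features by |Pearson correlation| with the target; keep top k."""
    scores = df.drop(columns=[target]).corrwith(df[target]).abs()
    return scores.sort_values(ascending=False).head(k).index.tolist()
```

The wrapper and embedding methods would replace the scoring line with a search over subsets or with coefficients read off a trained model, respectively.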

Stacking Model
In the Machine Learning area, Ensemble Learning has been popular in recent years. To date, there have been three main forms of ensemble learning: Bagging, Boosting and Stacking. Bagging was first proposed by Leo Breiman in 1996; one classical bagging model is RF. The idea of boosting was introduced by Michael Kearns; popular boosting models include GBDT, XGBOOST and LGBM. Stacking is a more recent ensemble learning method which has proved efficient and powerful in Kaggle competitions, sentiment classification and many other areas. In recent years, many researchers have chosen the stacking method for their prediction work, e.g., [19][20][21][22]. Bagging is a good way to reduce variance in the training process, since it uses repeated sampling to ensure that each sub-model covers the training sample space well. Boosting, on the other hand, mainly reduces the bias of the training process through iterative learning. As for stacking, its structure, which combines different learning models in a reasonable way, can reduce not only variance in the training process, but also bias. Therefore, in this article, we select the stacking method to construct the learning model. Figure 3 shows the two-layer stacking model we constructed.


This stacking model is constructed of two layers. The first layer combines four weak learners: RF, XGBOOST, LGBM and GBDT. Each learner goes through a five-fold cross-training process individually to generate one new training feature and one new test feature. These are combined into column vectors in the second layer as new training data and test data, which are used to train and test another learner, XGBOOST. Figure 4 shows the five-fold cross-training process in detail. Five-fold cross-training is an effective way to avoid over-fitting. In this process, the training data are divided into five folds. Each time, four folds are used to train a model, which then generates two predictions: one on the remaining fold and one on the test data. Repeating this five times yields five out-of-fold predictions, which together make up one new training feature; it also yields five predictions on the same test data, whose average forms one new test feature. These data are used to train and test the meta model in the second layer. Note that each base model in the first layer is then retrained on the whole training set to serve as a final base model.
Stacking provides a reasonable way to combine different learning models. Given new features, each base model in the first layer first gives its prediction individually; the meta model then combines these predictions, weighs them comprehensively, and gives the final prediction. In this way, prediction error and variance can decrease, and the model can achieve better generalization ability. We will demonstrate these advantages with the research results later.
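The two-layer scheme above can be sketched in code. For a self-contained example we use scikit-learn learners only; the paper's XGBOOST and LGBM models would slot into base_models in exactly the same way, and the function name fit_stacking is our own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import KFold

def fit_stacking(base_models, meta_model, X_train, y_train, X_test, n_splits=5):
    """Two-layer stacking with out-of-fold first-layer predictions."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    oof = np.zeros((len(X_train), len(base_models)))        # new training data
    test_meta = np.zeros((len(X_test), len(base_models)))   # new test data
    for j, model in enumerate(base_models):
        fold_preds = []
        for tr_idx, val_idx in kf.split(X_train):
            m = clone(model).fit(X_train[tr_idx], y_train[tr_idx])
            oof[val_idx, j] = m.predict(X_train[val_idx])   # out-of-fold prediction
            fold_preds.append(m.predict(X_test))            # prediction on test data
        test_meta[:, j] = np.mean(fold_preds, axis=0)       # average of five test predictions
    meta = clone(meta_model).fit(oof, y_train)              # second-layer meta model
    # Retrain each base model on the whole training set as the final base model.
    finals = [clone(m).fit(X_train, y_train) for m in base_models]
    return finals, meta, meta.predict(test_meta)
```

Using out-of-fold predictions (rather than predictions on data a base model has seen) is what keeps the meta model from simply memorizing the base models' training error.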


Pitch Angle Control Strategy
Before introducing a pitch angle control strategy, we want to introduce a general control strategy first. Suppose the regulation period is , is a parameter to be optimized. Then, our general control strategy could be described as follows. At the beginning of each period, , we solve the optimization problem (2) on a historical dataset of last period to get optimal the parameter for the current period , then, we set = . Note that there should not be The five-fold cross-training process is an effective way to avoid over-fitting. In this process, training data will be divided into five folds. Each time, we use four folds of them to train a model and use this model to generate two predictions based on rest fold and test data. We repeat this process five time to obtain five predictions from five folds, which make up one training datum. We obtain five predictions from the same test data as well, taking their average to obtain one new test datum. We use these data to train and test the meta model in the second layer. Note that each basic model in the first layer should be retrained once under the whole training data as a final basic model.
Stacking provides a reasonable way to mix different learning models together. Given the new features, each basic model in the first layer gives out its predictions individually first. The meta model combines these predictions, comprehensively considers them, and gives out final predictions. Through this way, prediction error and variance could decrease, and the model could obtain better generalization ability. We will prove these advantages by research results later.

Problem Formulation
Suppose our prediction model is f (L), where L represents all features. Then, we can state a basic optimization problem, formulated as

$$\max_{x \in \Phi} f(L_r, x), \quad (1)$$

where x represents the features to be optimized and L_r represents the rest of the features. Clearly, we have L_r ∪ x = L. Φ is the set of all possible values of x. Problem (1) seeks the optimal x that maximizes f (L_r, x).
If such an optimization is performed on a dataset D, then (1) should be rewritten as

$$\max_{x \in \Phi} \sum_{D} f(L_r, x), \quad (2)$$

where $\sum_{D} f(L_r, x)$ represents the power accumulation over dataset D. Problem (2) seeks the optimal x that maximizes $\sum_{D} f(L_r, x)$ over dataset D.

Pitch Angle Control Strategy
Before introducing the pitch angle control strategy, we want to introduce a general control strategy first. Suppose the regulation period is T and x is the parameter to be optimized. Our general control strategy is then as follows. At the beginning of each period T cur , we solve the optimization problem (2) on the historical dataset D last of the last period T last to get the optimal parameter x best for the current period T cur ; then, we set x cur = x best . Note that there should be no abnormal data in D last . In short, our strategy tracks the maximum power output. The complete form of our algorithm is given below (see Algorithm 1).

Algorithm 1: Power Optimization (Maximum Power Output Tracking)
Input: A trained model f (L), historical dataset D last in last period T last , a set composed of all possible values of feature x to be optimized C = {x 1 , x 2 , . . . , x n }.

1: initialize: set the maximum power accumulation S max = ∑_{D last} f (L r , x 1 ) and the optimal value x best = x 1
2: for i = 2 to n do
3:     compute S i = ∑_{D last} f (L r , x i )
4:     if S i > S max then
5:         S max = S i , x best = x i
6:     end if
7: end for
8: set x cur = x best for the current period T cur .
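Algorithm 1 can be sketched directly in code. Here predict_power is a hypothetical wrapper around the trained stacking model f (L), and D_last is a list of feature records from the last period; the names are ours, not the paper's.

```python
def track_optimal_pitch(predict_power, D_last, candidates):
    """Return the candidate x maximizing accumulated predicted power over D_last."""
    def accumulation(x):
        # Power accumulation sum_{D_last} f(L_r, x) for one candidate value.
        return sum(predict_power(record, x) for record in D_last)

    s_max, x_best = accumulation(candidates[0]), candidates[0]
    for x in candidates[1:]:
        s = accumulation(x)
        if s > s_max:               # track the maximum power accumulation
            s_max, x_best = s, x
    return x_best
```

At the start of each period T cur, x_best would be applied as the pitch angle setting until the next regulation point.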
The regulation period T is an important parameter which decides whether we can track the maximum power output in real time or not. Generally speaking, a long period T will decrease the effectiveness of tracking, and the optimization result may get worse. On the other hand, if T is small enough, the optimal parameter x best from the last period T last will remain optimal in the next period. In this way, we obtain the best power optimization result.
If the parameter x is the pitch angle, then the pitch angle control strategy follows from the general strategy. This strategy is described by the flowchart in Figure 5, where p represents the pitch angle.

Data Description
Our data are provided by Wuzhong Baita Wind Power Corporation Limited, which cooperated with us and selected two types of wind turbines in the wind farm as research objects. Each type contains four turbines, namely type A (#5, #6, #12, #13) and type B (#70, #71, #89, #91); our data come from these eight wind turbines. The original data are at 1-min intervals, from 1 January 2019 to 15 May 2020. Some wind turbine parameters differ between type A and type B; Table 1 shows some of them. In our work, we chose two wind turbines from each type as research objects: #6 and #12 from type A, #70 and #89 from type B. For each turbine, we need one dataset for prediction modeling and one for optimization. Table 2 describes these datasets in detail.

Data Preprocessing
Data preprocessing on the original dataset is needed before modeling. As mentioned, this process contains two steps. In step 1, we eliminate data with an abnormal "state" field or a power output no larger than 0. In step 2, we use DBSCAN to remove the remaining abnormal data. Here, we take #6 as an example and show the performance of DBSCAN in Figure 6.

As we can see from subgraph (a), the DBSCAN algorithm removes almost all abnormal data randomly distributed around the wind power curve. The remaining data are used as the final dataset for modeling.

Feature Selection
The original datasets contain 23 features, of which we chose only 12. The three feature selection algorithms, Filter, Wrapper and Embedding, give their selections as shown in Table 3. Note that in the Embedding method, we needed to choose one learning model to obtain the weight coefficient of each feature after training. There were many candidate models, such as RF, LGBM and XGBOOST, but their training durations differ. Considering that the XGBOOST model can be trained well within a short time, we selected it as the training model. Based on these selections, we make a comprehensive judgment. First, the Filter method chooses features based on their correlation with power output, so its selection has great reference value.

Model Construction
After preprocessing the original dataset, we divided it into training data and test data. Specifically, we selected one datapoint from every 10 datapoints as a test point; these test points were combined as the final test data, and the remaining datapoints were used as training data. We used the training data to train the models and evaluated them on the test data with three common evaluation indexes: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and the Coefficient of Determination (R 2 ), formulated as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \quad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$

where y_i is the actual power output, ŷ_i the predicted power output and ȳ the mean actual power output. In our work, six different models were trained in total: Neural Networks (NN), GBDT, LGBM, XGBOOST, RF and STACKING. Table 4 shows their performances on the test data of wind turbines #6 and #70; their performances on the other wind turbines are similar, so we omit them. The stacking model shows the best MAE, RMSE and R 2 of all the models, so we chose it as the final prediction model.
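The three evaluation indexes can be written out explicitly; here y and y_hat are NumPy arrays of actual and predicted power.

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Coefficient of Determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```

MAE and RMSE are in the same units as power, while R² is dimensionless, with 1 indicating a perfect fit.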

Power Prediction
In this section, we perform power prediction on the test data to evaluate whether the stacking model can predict power output accurately. First, we use the three indexes introduced above to describe the performance of the stacking model on the test data of the four wind turbines in Table 5. Note that the coefficient of determination (R 2 ) of each stacking model is generally around 0.97, which represents a good fit between actual and predicted power output. Therefore, these models can be used as power prediction models. Taking wind turbines #6 and #70 as examples, Figures 7 and 8 show the actual power, predicted power and absolute error intuitively. As shown in Figures 7 and 8, the fit between actual and predicted power is good. A common phenomenon is that the larger the actual power, the larger the absolute error; however, the relative error is generally around 10%.


Power Optimization
After obtaining the prediction models, the optimization process can start. Here, we chose the feature "Blade 1 Pitch Angle Feedback A", a parameter which represents the pitch angle of blade 1, as the parameter to be optimized; the pitch angle control strategy was introduced in Section 3.4.2. The original data of each turbine contain a series of possible pitch angle values; we checked these values and collected them into an optimization set C. Table 6 describes these sets in detail.

Table 6. Description of optimization parameter and optimization set.

Type Wind Turbine Optimization Parameter Optimization Set
The regulation period is optional; the minimum period is 1 min since the original data are at 1-min intervals. As noted before, abnormal data should not participate in the power optimization process. For the abnormal data in each period, the predicted power output under any pitch angle is set equal to the actual power output.

Power Optimization for #6
In this section, we carry out power optimization for wind turbine #6 over the period from 1 January 2019 to 31 December 2019. Figure 9 shows partial optimization results with regulation period T = 1 min.
Figure 9. Partial power optimization results on #6 with T = 1 min.
Subgraph 1 shows optimized power accumulation and power accumulation under different pitch angles in each period. It proves the effectiveness of our pitch angle control strategy in maximum power output tracking, since the optimized power accumulation is at a maximum in most cases. We can find that power accumulation is enhanced after the optimization process, as shown in Subgraph 2. Subgraph 3 shows the value of the pitch angle in each period after optimization. Note that the regulation period could be changed, and a different period will influence the optimization results. Therefore, we tried different periods and give statistical results in Table 7. We can see that the increase percentage decreases as the period gets longer. This supports our opinion that a short regulation period generally brings better optimization results.
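The effect of the regulation period on the increase percentage can be illustrated with a small sketch: with a longer period T, one pitch angle must be held fixed across each block of T minutes, so the choice is more constrained than picking a fresh angle every minute. The matrix layout below is an assumption for illustration, not the paper's data format.

```python
import numpy as np

def increase_percentage(pred_matrix, actual, period):
    """Power increase (%) for a given regulation period T in minutes.

    pred_matrix: shape (n_minutes, n_angles), predicted power for each
                 minute under each candidate pitch angle (assumed layout).
    actual:      shape (n_minutes,), actual power output.
    period:      one pitch angle is held fixed within each block of
                 `period` consecutive minutes.
    """
    pred_matrix = np.asarray(pred_matrix, dtype=float)
    actual = np.asarray(actual, dtype=float)
    n = (len(actual) // period) * period       # drop a ragged tail, if any
    blocks = pred_matrix[:n].reshape(-1, period, pred_matrix.shape[1])
    # For each block, choose the single angle maximizing the block's sum
    optimized = blocks.sum(axis=1).max(axis=1).sum()
    base = actual[:n].sum()
    return 100.0 * (optimized - base) / base
```

When the best angle varies from minute to minute, a coarser period forces a compromise angle within each block, which is consistent with the trend in Table 7 that the increase percentage shrinks as T grows.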

Power Optimization for #70
As with the optimization process for #6, we first set the regulation period T = 1 min. Figure 10 shows partial optimization results. Our pitch angle control strategy is proved to be effective in maximum power output tracking, as shown in Subgraph 1. Subgraph 2 shows that power accumulation is enhanced after the optimization process compared with the actual power accumulation. We use Table 8 to show optimization results under different regulation periods. As in Table 7, the increase percentage decreases as the period gets longer. Therefore, a short period would generally be better.

Power Optimization for Other Turbines
We also performed optimization for wind turbines #12 and #89. Table 9 shows the statistical results.

Conclusions
In this article, we proposed a new power optimization strategy to enhance the power output of wind turbines based on a stacking model and pitch angle adjustment. Our strategy includes two steps. In step one, we train a stacking model on historical data to obtain the power prediction model. In step two, we carry out optimization based on the prediction model and pitch angle adjustment. To prove the effectiveness of our strategy, we performed prediction modeling and the optimization process on four wind turbines.
Our strategy is proven to be reasonable and practical by the research results. First, we successfully obtained the power prediction model for each wind turbine by training a stacking model on the original datasets. Three indexes, MAE, RMSE and R2, are considered to evaluate the trained models, and they indicate a good-quality prediction model, as shown in Section 4. We also accomplished the optimization process successfully. Specifically, based on the prediction model, we performed the optimization process on four wind turbines by adjusting their pitch angles. From the research results in Section 5, we can see that power output is enhanced, which proves that our pitch angle control strategy is effective. According to the optimization results of wind turbines #6, #12, #70 and #89, the general power increase percentage is around 1% when the regulation period is set to 1 min, and the increase percentage generally decreases as the period gets longer. The improvement could be larger if the regulation period were smaller, because a short regulation period generally brings better optimization results, as we have emphasized. Finally, even though our optimization process was performed on historical data from the year 2019, it could be applied in real applications as well.
In fact, power optimization for wind turbines has been researched a lot in recent years. However, many traditional methods do not involve any Machine Learning (ML) technology. In this article, Machine Learning is applied to build a prediction model, which plays an important role in the optimization. Pitch angle adjustment is seldom considered in traditional designs; in other words, the pitch angle generally remains unchanged in these approaches. In our work, pitch angle adjustment is applied for optimization, and our research indicates that by adjusting the pitch angle appropriately, power output can be enhanced. Note that if the pitch angle changes too frequently, mechanical stress increases and the lifetime of the mechanical system may be reduced. Therefore, an appropriate regulation period is important in real applications.