Modified Approach of Manufacturer’s Power Curve Based on Improved Bins and K-Means++ Clustering

The ideal wind turbine power curve provided by the manufacturer cannot accurately monitor the practical performance of wind turbines in the engineering stage. In this paper, a modified approach to the wind turbine power curve is proposed based on improved Bins and K-means++ clustering. By analyzing the wind speed-power data collected by the supervisory control and data acquisition (SCADA) system, the relationship between wind speed and output is compared and elaborated on. On the basis of data preprocessing, an improved Bins method for equal-frequency division of data is proposed, and the data in each Bin are clustered through K-means++. Then, the wind turbine power curve correction is realized by data weighting and regression analysis. Finally, an example is given to show that the power curves of wind turbines of the same type installed in different locations differ from one another and from the manufacturer's power curve (MPC), and that the power curve obtained by this method reflects the output characteristics of a wind turbine operating in a complex environment more effectively.


Introduction
In order to cope with the severe global energy crisis and prominent environmental problems, the development of renewable energy has attracted the attention of governments and organizations all over the world. Wind power, one of the most cost-effective forms of energy today, is emerging in the global energy market. According to the statistics of the Global Wind Energy Council (GWEC), the newly installed capacity of wind power in 2021 exceeded 93.6 GW, and the cumulative installed capacity reached 837 GW [1]. With the increasing installed capacity, wind power companies pay more attention to the operation of wind turbines connected to the grid [2][3][4][5]. Among the many factors reflecting the operating conditions, the power curve plays an important role and is an effective representation of the operating characteristics of wind turbines [6][7][8][9][10].
The power curve of a wind turbine directly shows the input-output relationship between wind speed and power from a macro perspective. As an important indicator of generator performance, it directly affects the power generation capacity of the wind turbine. Therefore, although the wind turbine acceptance standards of wind power operation enterprises differ, the power curve is an item that all operation enterprises check for consistency. Wind power equipment manufacturers provide a standard power curve for each type of wind turbine when it leaves the factory, i.e., the manufacturer's ideal power curve (MPC) shown in Figure 1, which is measured in a "standard environment" [11] (15 °C, 101.3 kPa) and reflects the output capacity of the turbine. However, the practical engineering operating environment of wind turbines installed in wind farms is much more complex than the "standard environment", i.e., the output of wind turbines is affected by many factors, so that the output fluctuates over a large range (Figure 2) rather than strictly following the MPC [12][13][14]. Considering the open and complex operating environment, the wind turbine has a relatively high outage probability. The outages of wind turbines are taken into account in [11]: the relationship between the outage probability and wind speed is analyzed, and a model that includes the outage probability of wind turbines in simulating the power output of wind farms is proposed. Different from [13], the power probability distribution functions (PPDFs) of each wind turbine are analyzed without considering the outages. Assuming that the wind turbines obey a Poisson distribution in the statistical space of the wind farm, the wind speed-power characteristics of the wind farm are analyzed from the perspective of probability.
In order to study the relationship between wind turbine output and wind speed, an artificial neural network method is used in [15,16] to estimate the wind power output based on wind information from a meteorological tower, and the wind speed-power curve is then obtained. However, the model has six important parameters whose selection has a great influence on its accuracy, so the generalization ability of the model is weak. Considering the different models for monitoring wind farms, a nonlinear parametric model that uses the wind speed as input to monitor wind farm performance is proposed in [17]. However, from another point of view, it is unreasonable to use only a single wind speed-power curve to characterize the output characteristics of wind turbines. Starting with the uncertainties of wind power curves, a confidence band is presented under test conditions [17].
To address the problems mentioned above, a modified approach to the MPC based on improved Bins and K-means++ clustering is proposed in this paper, and the power curve obtained by this method can effectively reflect the output characteristics of wind turbines in complex environments. This paper is organized as follows. Section 2 reviews two representation methods of the wind speed-power relationship and the data preprocessing. Section 3 describes the proposed modified approach based on improved Bins and K-means++ clustering. Section 4 presents the performance comparison based on simulated and real data of a wind farm. Section 5 concludes the paper.

Two Representation Methods and Data Preprocessing
According to mathematical statistics, there are two kinds of relations between random variables: functional relation and correlation. Similarly, the relation between wind speed and output of the wind turbine is displayed in two ways. One is the functional relation expressed by a power curve, e.g., the MPC (Figure 1). The other is the correlation shown by a scatter plot (Figure 2).

Functional Relation-Power Curve
According to the standard IEC 61400-12 [18], the wind speed-power characteristics of wind turbines can be expressed by wind speed-power curves. The curve reflects the relations between wind speed and the mean value of output in the time scale of 10 min, i.e., the mean value in 10 min is used as the object of data analysis to express the relations between them. This is a functional relation that represents the correspondence between wind speed and output. The power curve is shown in Figure 1.
As can be seen from Figure 1, a definite power value corresponds to each wind speed, i.e., the wind speed has a one-to-one correspondence with the output of the wind turbine. The cut-in and cut-out wind speed are 4 and 25 m/s, respectively. When the wind speed is between the two values, the output power of the wind turbine can be obtained according to the corresponding relations.

Correlation-Scatter Plot
Compared with the power curve, it is more practical to demonstrate the wind speed-power relationship by the actual sampled data pairs $(v_i, P_i)$ of wind turbines. Figure 2 shows a wind speed-power scatter plot covering a time span of 3 months for a certain wind turbine at a sampling interval of 10 min.
From the sampled data, it can be seen that in actual engineering the wind speed-power relationship does not strictly follow the power curve provided by the manufacturer. At a given wind speed, the power value fluctuates over a wide range, i.e., the output is not unique, and the relationship is a correlation rather than a function.

Data Preprocessing
Because the data used to draw the scatter plot are sampled from grid-connected wind turbines, the scatter plot is more authentic than the MPC in representing the operating characteristics of wind turbines. Therefore, the method adopted in this paper is also based on the actual sampled data of wind turbines. In order to understand the relationship between wind speed and output more deeply, Copula theory is used. Nonparametric methods (the empirical distribution function (EDF) and the kernel estimation method (KEM)) are used to analyze the sampled data and then determine the overall distribution. The empirical distribution function diagram (EDFD) and kernel estimation diagram (KED) of wind speed and power are shown in Figure 3. It can be seen from Figure 3 that the EDFDs of wind speed and power almost coincide with the KEDs. Figure 4a shows the distribution density, and the binary normal Copula function, Equation (1), and the binary t-Copula function, Equation (2), are selected to describe the correlation structure of the original data.
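For reference, the binary normal Copula and the binary t-Copula are commonly written in the following standard form, which uses the symbols defined in the text that follows. The arguments $u$ and $v$ (the marginal distribution values of wind speed and power) and the integration vector $x = (x_1, x_2)^{T}$ are notation assumed here.

```latex
C_G(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)}
  \frac{1}{2\pi |\rho|^{1/2}}
  \exp\left( -\frac{1}{2} x^{T} \rho^{-1} x \right)
  \mathrm{d}x_2 \, \mathrm{d}x_1 \tag{1}

C_t(u, v; \rho, k) = \int_{-\infty}^{t_k^{-1}(u)} \int_{-\infty}^{t_k^{-1}(v)}
  \frac{1}{2\pi |\rho|^{1/2}}
  \left( 1 + \frac{1}{k} x^{T} \rho^{-1} x \right)^{-\frac{k+2}{2}}
  \mathrm{d}x_2 \, \mathrm{d}x_1 \tag{2}
```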
In Equations (1) and (2), $C_G$ is the binary normal Copula function and $C_t$ is the binary t-Copula function; $\Phi^{-1}$ is the inverse of the standard normal distribution function; $t_k^{-1}$ is the inverse of the one-dimensional t-distribution function with $k$ degrees of freedom; and $\rho$ is a second-order symmetric positive definite matrix with diagonal elements of 1.
In this paper, the rank correlation coefficients of Kendall and Spearman are used to study the correlation between wind speed and output. The correlation coefficients between wind speed and power calculated based on the Copula functions are shown in Table 1. It can be seen from the calculation results that the correlation coefficients (Kendall and Spearman) between wind speed and power of the original sampled data are all below 0.65, meaning the correlation is weak. On the other hand, not all sampling points in the original data set are reasonable and effective, such as the data points in the red circle in Figure 2. Such data points are regarded as sampling outliers and are cleaned up using the method proposed in [10]. Then, for the preprocessed data, Copula theory is used again to calculate the correlation coefficients. The calculation results are shown in Figure 4b and Table 1. It can be seen that after data preprocessing, the correlation of the measured data is improved. Compared with the original sample data, it is more reasonable to analyze the processed data, and the distribution density map shrinks to a smaller range, i.e., the data are more concentrated. Therefore, the preprocessed data are selected for analysis in this paper.
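As a rough illustration of the two rank correlation coefficients used above, the following sketch computes Kendall's tau and Spearman's rho directly from paired samples. It is a minimal pure-Python version that assumes no tied values and works on the raw pairs rather than on a fitted Copula function; the function names and the small data set are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no ties assumed)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def ranks(values):
    """Ranks 1..n in ascending order of value (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical wind speed (m/s) and power (kW) pairs, for illustration only.
speed = [3.1, 4.5, 6.2, 7.8, 9.0, 11.4]
power = [0.0, 60.0, 310.0, 280.0, 900.0, 1400.0]
tau = kendall_tau(speed, power)
rho = spearman_rho(speed, power)
```

Real SCADA data contain many tied values (for example, repeated rated-power readings), so a tie-corrected estimator would be needed in practice.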

K-Means++ Clustering
In solving practical engineering problems, cluster analysis is the process of grouping samples into several classes composed of similar objects. It classifies the research objects or indicators by considering various factors and the comprehensive properties of each sample. K-means++ [19][20][21] is a "hard clustering" algorithm that is widely used in cluster analysis. Its idea and process are as follows.
Algorithmic idea: Assuming that m (0 < m < K) initial cluster centers have been selected, a point farther away from the current m cluster centers has a higher probability of being selected as the (m + 1)th cluster center.
Algorithm process:
Step 1: A sample is selected randomly from the sampled data set as the initial cluster center $T_1$.
Step 2: The shortest distance $D(x)$ between each sample and the cluster centers selected so far (initially only $T_1$) is calculated.
Step 3: The probability $p(x)$ of each sample being selected as the next cluster center is calculated; it is proportional to $D(x)^2$, i.e., $p(x) = D(x)^2 / \sum_{x} D(x)^2$. The calculation information table of K-means++ is shown in Table 2.

Table 2. Calculation information table of K-means++ (column headings): Number of Samples | $D(x)$ | $D(x)^2$ | $p(x)$ | Sum

Step 4: The next cluster center is obtained according to the roulette wheel method.
Step 5: Repeat Step 2 to Step 4 until all K cluster centers are selected.
Step 6: Calculate the distance from each sample to the K cluster centers, and assign the sample to the class of the nearest cluster center.
Step 7: Recalculate the center of each class as the mean of the samples assigned to it.
Step 8: Repeat Step 6 and Step 7 until the class centers no longer change.

Improved Bins
As mentioned above, the MPC is obtained based on the Bins method provided by IEC 61400-12. The Bins method takes wind speed as the research object and divides the range of wind speed into equal intervals, each of which is called a Bin. In the standard, the dividing interval of wind speed is 0.5 m/s, i.e., the Bin size is 0.5 m/s, and the wind speed value of each interval is an integral multiple of 0.5 m/s. Accordingly, the sampled wind speed range can be divided into n Bins. On this basis, the averages of wind speed and power in each Bin are calculated by Equations (5) and (6):

$$v_{iav} = \frac{1}{n_i} \sum_{j=1}^{n_i} v_{ij} \qquad (5)$$

$$p_{iav} = \frac{1}{n_i} \sum_{j=1}^{n_i} p_{ij} \qquad (6)$$

where $v_{iav}$ and $p_{iav}$ denote the averages of wind speed and power in the ith Bin, respectively; $n_i$ is the number of data points in the ith Bin; and $(v_{ij}, p_{ij})$ represents the wind speed and power of the jth sampling point in the ith Bin. Using the measured data, a total of n two-dimensional data pairs $(v_{iav}, p_{iav})$, one for each Bin, are calculated by Equations (5) and (6). On this basis, the power curve based on the measured data and the Bins method is obtained by interpolation. The curve obtained by the above Bins method is limited by the Bin size: the result changes significantly when the Bin size changes, and individual Bins may contain no data, i.e., the Bin is empty. Therefore, when the curve obtained by this method is used to analyze the operating characteristics of the wind turbine, it will inevitably introduce large errors.
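The fixed-width Bins averaging can be sketched as follows. This is only an illustration: the function name is hypothetical, and placing the Bin edges at integer multiples of the Bin size is an assumption made for simplicity (the standard may centre the Bins differently).

```python
from collections import defaultdict


def bins_method(speeds, powers, bin_size=0.5):
    """Average wind speed and power inside fixed-width wind speed Bins.

    Returns one (v_av, p_av) pair per non-empty Bin, ordered by wind speed.
    """
    grouped = defaultdict(list)
    for v, p in zip(speeds, powers):
        grouped[int(v // bin_size)].append((v, p))  # Bin index of this sample
    curve = []
    for i in sorted(grouped):
        pts = grouped[i]
        n_i = len(pts)
        v_av = sum(v for v, _ in pts) / n_i
        p_av = sum(p for _, p in pts) / n_i
        curve.append((v_av, p_av))
    return curve


# Hypothetical samples: the Bins between 4.5 and 5.5 m/s receive no data.
curve = bins_method([4.1, 4.3, 5.6], [10.0, 30.0, 100.0])
```

In this small example only two of the four Bins between 4 and 6 m/s yield a point, which illustrates the empty-Bin problem described above.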
To solve the above problems, a correction method based on improved Bins and K-means++ clustering is proposed to modify the power curve. To ensure that there are sufficient sampled data in each Bin, the preprocessed data are divided along the wind speed axis into groups containing equal numbers of sampling points, i.e., equal-frequency division. Each wind speed interval is denoted as one Bin, the entire wind speed range is divided into n Bins, and each Bin contains the same number of sampled data points.
On this basis, cluster analysis is carried out for the sampled data in each Bin. In this paper, K-means++ clustering is adopted to process the data in the Bins, and then the weighted value of each category is calculated according to the weighting method given by Equations (7)-(10).
In Equations (7)-(10), $(v_{ij}, P_{ij})$ represents the average wind speed and power of class j in Bin i; $n_{ij}$ represents the number of data points of the jth class in the ith Bin; $(v_{ij}(n), P_{ij}(n))$ denotes the wind speed and power of the nth data point of the jth class in the ith Bin; $(v_i, P_i)$ represents the equivalent wind speed and power in the ith Bin; and K is the number of categories for cluster analysis. The data points $(v_i, P_i)$, one per Bin, that express the performance of the wind turbine are calculated by the above method. Then, the power curve is obtained by regression analysis. The curve optimization process is shown in Figure 5.
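The equal-frequency division and the per-Bin weighting can be sketched as below. This is a minimal illustration with hypothetical function names. In particular, the weight of each class is assumed to be its share of the samples in the Bin, $n_{ij} / n_i$, which may differ from the weighting defined in Equations (7)-(10); the classes themselves are taken as given (for example, from a K-means++ run on the Bin).

```python
def equal_frequency_bins(samples, n_bins):
    """Sort (v, P) samples by wind speed and split them into n_bins groups
    whose sizes differ by at most one sample."""
    ordered = sorted(samples)
    n = len(ordered)
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [ordered[edges[i]:edges[i + 1]] for i in range(n_bins)]


def weighted_bin_point(classes):
    """Combine per-class means into one equivalent (v_i, P_i) point for a Bin.

    `classes` is a list of lists of (v, P) samples, one list per class.
    Assumption: each class is weighted by its share of the Bin's samples.
    """
    n_i = sum(len(c) for c in classes)
    v_i = p_i = 0.0
    for c in classes:
        if not c:
            continue
        n_ij = len(c)
        v_ij = sum(v for v, _ in c) / n_ij  # class-average wind speed
        p_ij = sum(p for _, p in c) / n_ij  # class-average power
        weight = n_ij / n_i
        v_i += weight * v_ij
        p_i += weight * p_ij
    return v_i, p_i


# Hypothetical samples: power is simply 10 x wind speed, for illustration only.
samples = [(float(v), 10.0 * v) for v in [5, 1, 9, 3, 7, 11, 2, 4, 6, 8, 10, 12]]
bins = equal_frequency_bins(samples, 3)
point = weighted_bin_point([bins[0][:1], bins[0][1:]])
```

With size-proportional weights the equivalent point equals the plain Bin average; a different weighting scheme would change `weighted_bin_point` only.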

Sample Data
A wind turbine of a wind farm in Heilongjiang is selected randomly to verify the effectiveness of the proposed method, and the differences between the curve obtained by this method and the MPC are compared and analyzed. The data are sampled from the SCADA system from 1 January 2017 to 30 September 2017 at an interval of 10 min. The fluctuation and distribution of the sampled data are visualized in Figures 6 and 7.

Data Partitioning Based on Improved Bins
The raw data are preprocessed and the wind speed frequency is plotted in a histogram (Figure 8). The preprocessed data are then divided by equal frequency using the improved Bins method proposed above. The first-order differential of each Bin is drawn in Figure 9.
As can be seen from the frequency histogram (Figure 8), the amount of data in each equal-width Bin is different, especially in the Bins at both ends. The improved Bins use equal-frequency division, i.e., the amount of data in each Bin is equal. In Figure 9, the first-order differential at both ends is larger than that in the middle; this is caused by the different concentration of data in different wind speed sections. Moreover, from the viewpoint of mathematical statistics, it is more reasonable to study a problem using samples that contain equal amounts of data. Therefore, the improved Bins method proposed in this paper is more reasonable.

Curve Correction of MPC
After the equal-frequency division of the sampled data, the K-means++ clustering algorithm is used to cluster the data in each Bin. Then, the data in each Bin are weighted by Equations (7)-(10), and the results are shown in Table 3. A curve is fitted to these data and compared with the MPC in Figure 10.
As can be seen from Figure 10, the modified curve (MMPC) differs from the MPC. When the wind speed is within [5 m/s, 10 m/s], the MMPC is above the MPC; on the contrary, within [10 m/s, 15 m/s], the MMPC is below the MPC. Since the MMPC is based on the measured data of the wind turbine, this means that within [5 m/s, 10 m/s] the actual output is greater than the expected value, whereas within [10 m/s, 15 m/s] the actual output is less than the expected value. Because the results differ between intervals, the MMPC and MPC are quantitatively analyzed below to compare their ability to reflect the real situation.

Contrastive Analysis
In the intervals [5 m/s, 10 m/s] and [10 m/s, 15 m/s], the comparison results of MMPC and MPC are different (Figure 10), and the probabilities of the wind speed falling within the two intervals also differ, i.e., the data proportions are different. Under this scenario, the quantities of electricity calculated by MPC and MMPC are bound to be distinct. The two curves are compared and analyzed as follows.
MMPC and MPC are used to calculate the daily power generation over 31 days. The comparison of wind power output is shown in Figure 11. The quantity of electricity is given in Figure 12 and Table 4.
In Table 4, AV represents the actual value; MD denotes the deviation between the value calculated based on MPC and the actual value; and MMD denotes the deviation between the value calculated based on MMPC and the actual value. It can be seen from Table 4 that the deviation of the modified result is in all cases smaller than that of the value calculated by MPC, i.e., the modified result is more accurate and better reflects the actual operation of the wind turbine.
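A comparison of this kind can be sketched as follows: a power curve, given as a list of points, is evaluated at each 10 min wind speed sample and the results are summed into energy. This is an illustration under assumptions, not the calculation used for Table 4: linear interpolation between curve points, clamping outside the curve's range, the signed difference as the deviation, and all names and numbers are hypothetical.

```python
from bisect import bisect_right


def curve_power(curve, v):
    """Linearly interpolate power (kW) on a curve of sorted (speed, power) points.

    Assumption: outside the curve's speed range the nearest end value is used,
    so the curve should start at the cut-in point with zero power.
    """
    speeds = [s for s, _ in curve]
    if v <= speeds[0]:
        return curve[0][1]
    if v >= speeds[-1]:
        return curve[-1][1]
    k = bisect_right(speeds, v)
    (v0, p0), (v1, p1) = curve[k - 1], curve[k]
    return p0 + (p1 - p0) * (v - v0) / (v1 - v0)


def energy_kwh(curve, speeds_10min):
    """Energy from 10 min mean wind speeds: sum of power times 10/60 hours."""
    return sum(curve_power(curve, v) for v in speeds_10min) * 10 / 60


def deviation(calculated, actual):
    """Signed deviation of a calculated energy from the actual value."""
    return calculated - actual


# Hypothetical curve points and one hour of samples, for illustration only.
demo_curve = [(4.0, 0.0), (10.0, 1200.0), (12.0, 1500.0)]
hour_energy = energy_kwh(demo_curve, [7.0] * 6)
```

Applying `energy_kwh` once with the MPC points and once with the MMPC points, and passing each result to `deviation` together with the metered energy, gives quantities analogous to MD and MMD.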

Conclusions
Due to the complexity of the operating environment, the output of wind turbines is affected by many factors, e.g., the meteorological and environmental conditions at the installation site and the operating state and operating life of the wind turbines, so that the relationship between the output of wind turbines and wind speed does not strictly follow the MPC. To address this problem, based on the preprocessing of wind speed-power data measured on wind turbines in engineering practice, a modified method based on improved Bins and K-means++ clustering is established in this paper. The MMPC is obtained by this method, and the effectiveness of the proposed method is verified by an example analysis. The proposed method helps to express the output characteristics of grid-connected wind turbines effectively, and provides a reference for manufacturers and management enterprises to gain a deeper understanding of the operating characteristics of wind turbines. However, the method also has limitations: differences in the data sampling intervals of wind turbines have a certain impact on its use, which will be studied in future work.