Improved Weighted k -Nearest Neighbor Based on PSO for Wind Power System State Recognition

In this paper, we propose a particle swarm optimization (PSO) improved weighted k-nearest neighbor (PWKNN) method to diagnose failures of a wind power system. PWKNN adjusts the feature weights to correctly reflect the importance of each feature and uses a distance judgment strategy to resolve identical probabilities in multi-label classification. The PSO optimizes the weights and the parameter k of PWKNN. The testing is based on four conditions of a 300 W wind generator: healthy, loss of lubrication in the gearbox, angular misalignment of the rotor, and bearing fault. Current signals are used to measure these conditions. The testing establishes a feature database through feature extraction, which is used to train the classifiers. Without lowering the classification accuracy, correlation-coefficient-based feature selection is applied to eliminate irrelevant features and to reduce the runtime of the classifiers. A comparison with other traditional classifiers, i.e., the backpropagation neural network (BPNN), k-nearest neighbor (k-NN), and radial basis function network (RBFN), shows that PWKNN has a higher classification accuracy. Feature selection reduces the average number of features from 16 to 2.8 and reduces the runtime by 61%. The method classifies the four conditions accurately without being affected by noise and reaches an accuracy of 83% at a signal-to-noise ratio (SNR) of 20 dB. The results show that the PWKNN approach is capable of diagnosing failures of a wind power system.


Introduction
With the emergence of green energy, wind power plays a principal role in energy supply. However, wind power is a form of intermittent energy without a stable output. In addition, the malfunction of wind generators is a great concern: downtime leads to heavy cost losses in peak-wind seasons [1][2][3]. Furthermore, the large number of wind generators in wind farms increases the probability of failure. Without sufficient personnel, automatic inspection of wind generators is indispensable [4]. A central system is employed to monitor the operation of wind generators, which can effectively reduce personnel cost. According to [5,6], the most common fault condition is bearing damage, accounting for 45%; followed by stator failure, accounting for 35%; rotor failure, accounting for 10%; and other damage, accounting for the remaining 10%. When a motor fails, the running cost increases; therefore, fault identification has always been one of the key topics in industrial applications. Generally, an automatic identification system includes the following three parts: signal analysis, feature extraction, and condition classification. Signal analysis has been carefully researched in the past, and feature extraction and condition classification are the crucial factors [7,8]. Feature extraction extracts specific and representative features, providing compact data to the classifiers.
However, the extracted features are not always effective ones; valid features are likely to become invalid if the signals include noise. This research emphasizes how to obtain effective features and raise the classification accuracy [9]. Artificial neural networks (ANNs) have been widely used in different fields, such as machine operation control, automatic fault detection, and image recognition [6,[10][11][12][13][14]. Since the first neural network was proposed, different types of networks have been developed one after another, such as the feedforward neural network, the probabilistic neural network (PNN) [10,11], and the backpropagation neural network (BPNN) [6,[12][13][14]. Different classifiers exhibit different performance, architectures, and classification effects on different types of problems. BPNN is a supervised learning network, and its recognition ability is higher than that of unsupervised learning networks. Furthermore, classifiers offer a method of automatic classification: a well-organized classifier can provide higher classification accuracy, takes less computing time, increases computing efficiency, and suits large-scale, real-time systems.
There have been considerable studies on the signal analysis of rotating machines, of which vibration signal analysis has been the focus [1,15]. Vibration signals can clearly show the machine operation; however, measuring them requires an accelerometer, and one accelerometer can only measure the vibration at a single position. Recently, more and more studies have performed fault diagnosis using electric current and voltage signals [16][17][18]. Current- and voltage-based diagnosis can reduce the number and cost of sensors, the complexity of the measurement system, and the failure rate.
Signal analysis can detect unusual signals. The traditional tool is the Fourier transform (FT) [19], which cannot capture transient information, yet rotating machines often produce intermittent signals. A multi-resolution analysis (MRA) is a good method for analyzing such unusual signals [20]. Features extracted from signals for automatic classification can reduce the number of classifier inputs and the computing time. However, the extracted features may include invalid ones, and their number affects the computation; invalid features also lower the classification accuracy of classifiers [21][22][23].
Although the k-nearest neighbor (k-NN) is a simple and fast classifier, its procedure and structure need some improvement. In [24], a greedy method is used to adjust k-NN with feature weights. In [25], the parameter k is adjusted according to different categories of training sets. Nevertheless, the setting of the parameter k and the problem of identical classification probabilities remain to be solved.
In this paper, we propose an approach to recognize the operational conditions of power generators. The approach consists of the following parts: first, the feature extraction technique is used, where the wavelet transform extracts features from the 300 W power system; second, correlation coefficients are used to optimize the feature set; finally, the proposed method is used to recognize the operation of the wind power system. The proposed method, named PWKNN, combines weighted k-NN with PSO to improve accuracy. Experiments show that the average classification accuracy of PWKNN is higher than that of the other classifiers. This paper is organized as follows: in Section 2, we introduce the signal analysis and feature extraction approach; in Section 3, we present the procedure and structure of the improved weighted k-NN based on particle swarm optimization (PWKNN); in Section 4, we discuss the types of malfunctions of a wind power system and the feature selection; in Section 5, we present the simulation results; and in Section 6, we provide the conclusions.

Wavelet Transform
The French geophysicist Jean Morlet invented the wavelet transform for analyzing the local properties of seismic waves [26]. He found that the fast Fourier transform (FFT) could not meet his requirements; therefore, he introduced a wavelet into signal analysis to decompose the signals. The wavelet transform of a signal can be observed at different resolutions, so local properties can be examined effectively. It is a signal analysis between the time and frequency domains.

Continuous Wavelet Transform
The scale parameter and the translation parameter adjust the mother wavelet ψ(t), as shown in Equation (1), where a is the scale parameter and b is the translation parameter:

ψ_a,b(t) = (1/√a) ψ((t − b)/a) (1)
The continuous wavelet transform of a function f(t) is defined in Equation (2), where the asterisk represents the operation of complex conjugation:

CWT(a, b) = ∫ f(t) ψ*_a,b(t) dt (2)

Multi-Resolution Analysis
The function f(t) is decomposed by the scaling function and the mother wavelet function through the scaling change and wavelet transform. The scaling function acts as a low-pass filter and the mother wavelet function acts as a high-pass filter. The structure shown in Figure 1 decomposes the signal f(t) into high- and low-frequency components. The low-frequency component is decomposed again, and the decomposed signals d_1, ..., d_n and a_n are yielded.

The procedure described above is an MRA [26]. From the viewpoint of the spectrum, the MRA provides a detailed analysis of the low-frequency signals, as shown in Figure 2. Therefore, more decomposition levels imply higher resolution.
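The cascade described above can be sketched in a few lines of code. The following is an illustrative pure-Python example using the Haar wavelet (the paper does not state which mother wavelet was used): each level splits the signal into a low-pass approximation and a high-pass detail, and the approximation is decomposed again, as in Figure 1.

```python
# Minimal MRA sketch with the Haar wavelet (illustrative choice, not
# necessarily the wavelet used in the paper).
import math

def haar_step(signal):
    """One decomposition level: returns (approximation, detail)."""
    a = [(signal[2 * i] + signal[2 * i + 1]) / math.sqrt(2)
         for i in range(len(signal) // 2)]
    d = [(signal[2 * i] - signal[2 * i + 1]) / math.sqrt(2)
         for i in range(len(signal) // 2)]
    return a, d

def mra(signal, levels):
    """Decompose into ([d1, d2, ..., dn], an), mirroring Figure 1:
    only the low-frequency part is decomposed again at each level."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return details, approx

details, approx = mra([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], levels=2)
print(len(details[0]), len(details[1]), len(approx))  # 4 2 2
```

Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the original signal, which is the property exploited by the feature extraction in the next section.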


Feature Extraction
Scales of wavelet coefficients can represent the signals, as in Equation (3). By Parseval's theorem, Equation (3) can be rewritten as in Equation (4): the first term is the average power of the signal, and the second term is the sum of the average powers of the coefficients after the wavelet transformation. The simplified formula is given in Equation (5). The energy spectrum indicates the energy of each frequency band of the signal, and the different energy-spectrum distributions readily reveal the operational conditions of the wind generator. Hence, the energy spectra are taken as the features in this paper.
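The per-band energies can be computed directly from the MRA coefficients. Below is a hedged sketch in which each feature is the average power of one band; the exact normalization of the paper's Equation (5) is not recoverable from the text, so the mean of squared coefficients is used as an illustrative choice.

```python
# Energy-spectrum features: the average power of each wavelet band
# (d1 ... dn and an) is one feature. The per-band normalization here
# (mean of squared coefficients) is illustrative, not necessarily the
# paper's exact Equation (5).
def band_energy(coeffs):
    """Average power of one coefficient band."""
    return sum(c * c for c in coeffs) / len(coeffs)

def energy_spectrum(details, approx):
    """Feature vector E_f = [E_d1, ..., E_dn, E_an]."""
    return [band_energy(d) for d in details] + [band_energy(approx)]

features = energy_spectrum([[1.0, -1.0], [2.0]], [3.0])
print(features)  # [1.0, 4.0, 9.0]
```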


PWKNN
The k-NN is a supervised learning method and among the simplest machine learning algorithms [27]. The method classifies an unknown sample according to the nearest k samples of the training dataset. The nearest k samples, called the k-nearest neighbors, are obtained through distance computation, and the class with the largest number of members among them is assigned to the unknown sample. The classification decision is the class with the most votes among the k-nearest neighbors, C = argmax_c Σ_{i ∈ k-NN} I(y_i = c). For example, the left part of Figure 3 shows three classes of samples (square, circle, and triangle) and an unknown sample (rhombus). The nearest four samples are selected; the numbers indicate the order of distance, where 1 is the nearest and 4 is the farthest. The figure shows that Class A has the strongest probability; therefore, the unknown sample is classified into Class A.
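The majority-vote rule above can be sketched as a minimal k-NN classifier. This is a generic illustration of the algorithm, not the authors' implementation; the data and labels are invented for the example.

```python
# Minimal k-NN classifier: Euclidean distance, majority vote.
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train, labels, query, k):
    """Return the majority class among the k nearest training samples."""
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

train = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (1.1, 0.9)]
labels = ["A", "A", "B", "B"]
print(knn_predict(train, labels, (0.2, 0.0), k=3))  # A
```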

K-NN employs a distance measure, usually the Euclidean distance, to measure similarity. If X = (x_1, x_2, ..., x_n) and Y = (y_1, y_2, ..., y_n) are n-dimensional vectors, the Euclidean distance is computed as dist(X, Y) = sqrt(Σ_{i=1}^{n} (x_i − y_i)^2). However, k-NN has the following weaknesses: (1) All feature weights are equal. Not all features correlate positively with the classification result, and invalid features lead to erroneous classifications. (2) Identical classification probabilities can occur. As shown on the left side of Figure 3, if one more nearest neighbor is added to the classification decision, the probabilities of Class A, Class B, and Class C are 40%, 40%, and 20%, respectively, so Class A and Class B have the same probability.
Regarding weakness (1), the classification result of k-NN is sensitive to the features, and invalid features reduce the classification accuracy. The weighted k-NN (WKNN) is implemented to remedy this drawback by assigning different weights to different features. The weight vector W = [w_1, w_2, ..., w_n] is taken into account when computing the weighted distance: dist_W(X, Y) = sqrt(Σ_{i=1}^{n} w_i (x_i − y_i)^2). In Figure 3, the weight is W = [1, 1] in the left diagram and W = [1, 0.5] in the right diagram. The equation above shows that changing the weights can influence the classification result: although Class A initially has the higher probability, Class B becomes more probable than Class A after the weight change, because Feature 2 now has less effect on the classification result. Weight adjustment can thus alter the effect of different features on the classification result and suppress invalid features to raise the classification accuracy.
Concerning the identical-probability outcome of k-NN, this research proposes a distance judgment to decide the classification. In the left diagram of Figure 3, Class A and Class B have identical probability. The method computes, for each tied class, the distances from the test point to that class's members among the k-nearest neighbors: the distances for Class A are d1 and d3, and the distances for Class B are d4 and d5. If D is the decision result, the class with the smaller sum of these distances is taken as the predictor. This method overcomes identical probabilities and enhances the correctness of classification. PWKNN is a weighted k-nearest neighbor classifier upgraded with particle swarm optimization (PSO). The PSO optimizes the weights and the k value and estimates the predictive classification accuracy of PWKNN with leave-one-out cross-validation (LOOCV). In LOOCV, one sample serves as the test sample in each computation while the others are training samples; every sample is tested once, and the overall accuracy is computed as the predictive classification accuracy pca_CV. If N_Correctly is the number of correctly classified samples and N_Total is the total number of training samples, then pca_CV = N_Correctly / N_Total.
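The pieces described above can be combined into a short sketch: per-feature weights in the distance, a distance-sum tie-break between equally probable classes, and LOOCV accuracy. This is one reading of the described method, not the authors' code; the toy data are invented for the example.

```python
# Sketch of WKNN with weighted distance, distance-sum tie-break, and
# LOOCV accuracy pca_CV = N_correct / N_total. One interpretation of
# the method described in the text, not the authors' implementation.
import math
from collections import Counter, defaultdict

def wdist(x, y, w):
    """Weighted Euclidean distance with feature weights w."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def wknn_predict(train, labels, query, k, w):
    order = sorted(range(len(train)), key=lambda i: wdist(train[i], query, w))
    near = order[:k]
    votes = Counter(labels[i] for i in near)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    if len(tied) == 1:
        return tied[0]
    # Tie-break: the tied class whose neighbors are closer overall wins.
    dist_sum = defaultdict(float)
    for i in near:
        if labels[i] in tied:
            dist_sum[labels[i]] += wdist(train[i], query, w)
    return min(tied, key=lambda c: dist_sum[c])

def loocv_accuracy(data, labels, k, w):
    """Leave-one-out cross-validation accuracy (pca_CV)."""
    correct = 0
    for i in range(len(data)):
        tr = data[:i] + data[i + 1:]
        lb = labels[:i] + labels[i + 1:]
        correct += wknn_predict(tr, lb, data[i], k, w) == labels[i]
    return correct / len(data)

data = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["A", "A", "B", "B"]
print(loocv_accuracy(data, labels, k=2, w=[1.0, 1.0]))  # 1.0
```

With k = 2 each left-out sample sees one neighbor of each class, so the distance-sum tie-break decides every fold; the closer same-class neighbor always wins here.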

Particle Swarm Optimization (PSO)
PSO is a computational method for optimizing a problem, presented by R. Eberhart and J. Kennedy [28]. PSO has become a popular optimization algorithm and is widely used by researchers in practical problem solving [29][30][31]. PSO generates particles randomly in the search space.
Energies 2020, 13, 5520 6 of 16
The movement of a particle is influenced by its own best position and by the better positions found by other particles, which drives the search toward an optimal solution in the space. In the optimal result, all particles gather at the same position of the search space; this position is the best solution.
Unlike the genetic algorithm (GA), PSO works according to a few simple formulas and concepts [28,32]. The objective or fitness function Fit(·) is to be maximized. To avoid the solution converging toward a local best, a suitable number of particles is set and their positions are initialized randomly. The velocity of particle i is then determined by

v_i(t + 1) = w v_i(t) + ϕ1 r1 (pbest_i − x_i(t)) + ϕ2 r2 (gbest_i − x_i(t))

where r1 and r2 are random numbers in [0, 1]. The equation is influenced by the following three factors: the particle's previous velocity v_i(t), the particle's self-best position pbest_i, and the best position of all particles gbest_i. Next, the particle's position is updated as

x_i(t + 1) = x_i(t) + v_i(t + 1)

In addition, the right choice of inertial weight provides a balance between global exploration and local exploitation. In general, the inertial weight w changes with the current iteration number t as

w = w_max − (w_max − w_min) t / T (12)

Additionally, the algorithm judges whether the output of the fitness function improves on the particle's self-best position: if the new output is greater than the previous one, the new position replaces the particle's best position; otherwise, the best position remains unchanged. Moreover, the best position over all particles, i.e., the one with the greatest fitness output, is selected; this position is taken as the best solution. The particle-movement equations determine the search and convergence ability; the formulas above are the commonly used improved versions.
The algorithm should ensure exploration ability at the start and exploitation ability later in the search area; therefore, the inertia weight factor is adjusted to decrease linearly. The values w_max and w_min are the upper and lower limits of the inertial weight and are set to 0.9 and 0.4 in Equation (12), respectively. The maximum number of iterations T is set to 1000. Additionally, the learning coefficients ϕ1 and ϕ2, both set to 2.0, control how much the particle's self-best position and the best position of all particles influence particle movement. The random numbers increase the perturbation and prevent the solution from getting stuck in a local best during the optimization process [33].
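The update rules and the linearly decreasing inertia weight can be sketched as follows. This is a generic PSO illustration maximizing a toy one-dimensional fitness function; in the paper the particle instead encodes [w_1, ..., w_n, k] and the fitness is the PWKNN LOOCV accuracy. The particle count and iteration count below are chosen for the example, not taken from the paper.

```python
# Minimal PSO sketch with linearly decreasing inertia (0.9 -> 0.4) and
# learning coefficients phi1 = phi2 = 2.0, as described in the text.
import random

def pso_maximize(fit, dim, bounds, n_particles=20, T=200, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in x]
    pbest_fit = [fit(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    w_max, w_min, phi1, phi2 = 0.9, 0.4, 2.0, 2.0
    for t in range(T):
        w = w_max - (w_max - w_min) * t / T  # linearly decreasing inertia
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + phi1 * r1 * (pbest[i][d] - x[i][d])
                           + phi2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            f = fit(x[i])
            if f > pbest_fit[i]:          # update self-best position
                pbest[i], pbest_fit[i] = list(x[i]), f
                if f > gbest_fit:         # update best of all particles
                    gbest, gbest_fit = list(x[i]), f
    return gbest, gbest_fit

best, best_fit = pso_maximize(lambda p: -(p[0] - 3.0) ** 2,
                              dim=1, bounds=(-10.0, 10.0))
print(round(best[0], 1))  # close to 3.0
```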
In this paper, the weights W = [w_1, w_2, ..., w_n] and the k value are taken as the particle position x_i = [w_1, w_2, ..., w_n, k]. The predictive accuracy of PWKNN, pca_CV, is regarded as the fitness function, Fit(x_i) = pca_CV. The optimization process is shown in Figure 4. A set of optimum parameters [w_1, w_2, ..., w_n, k]_best is generated by the PSO optimization, which PWKNN then uses to classify categories with the highest classification accuracy.

Feature Selection
Wavelet coefficients of different scales can be taken as features of the generator operation and as the inputs of the classifier. In addition, correlation coefficients are used to select features, to optimize the feature set, and to reduce computation. The correlation coefficient between a feature x and the classification result y is computed as

r = Σ_l (x_l − x̄)(y_l − ȳ) / sqrt( Σ_l (x_l − x̄)^2 Σ_l (y_l − ȳ)^2 )

This paper performs feature selection according to the correlation coefficients. The predictive classification accuracy of PWKNN with all features selected serves as the stopping criterion, and the features with the smallest correlation coefficients are removed first, one at a time, until none can be removed. The flowchart of the feature selection is shown in Figure 5. The procedure of the feature selection is as follows:

Step (1) Suppose the number of all features is N. Select all features and use leave-one-out cross-validation (LOOCV) to compute the classification accuracy pca_CV^(max,N) of PWKNN, which is the criterion for judging the classification accuracy. W_N is the weight vector after the PWKNN optimization.

Step (2) Remove the feature with the lowest correlation coefficient, so the feature number becomes N − 1. Use W_N to recompute the classification accuracy pca_CV^(N−1).

Step (3) If pca_CV^(N−1) ≥ pca_CV^(max,N), retrain PWKNN on the reduced feature set to obtain the optimized weights W_(N−1) and accuracy pca_CV^(max,N−1); otherwise, end the procedure.

Step (4) If pca_CV^(max,N−1) ≥ pca_CV^(max,N), record W_(N−1), take this set of features as the optimum, and go back to Step (2); otherwise, end the procedure.
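The ranking part of the procedure, i.e., finding which feature has the lowest correlation with the classification result and is therefore removed first, can be sketched as follows. The Pearson correlation is used here; the sample data are invented for illustration.

```python
# Correlation-based feature ranking: compute the Pearson correlation
# between each feature column and the class label, then order features
# from lowest to highest |r| (the lowest is removed first in Step 2).
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(samples, labels):
    """Return feature indices sorted from least to most correlated."""
    n_feat = len(samples[0])
    scores = [abs(pearson([s[j] for s in samples], labels))
              for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: scores[j])

# Feature 0 tracks the label; feature 1 is pure noise w.r.t. the label.
samples = [(0.0, 5.0), (0.1, 4.0), (1.0, 5.0), (1.1, 4.0)]
labels = [0, 0, 1, 1]
print(rank_features(samples, labels))  # [1, 0]
```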

Types of Malfunctions of a Wind Power System
This testing employs a wind power simulation platform, a load, data acquisition (DAQ), and a personal computer, as shown in Figure 6, as well as a 300 W rated three-phase generator and a gearbox with a ratio of 1:10 driven by a drive motor to simulate wind power. The output of the wind power generator is connected to the load; the load, wye-connected and single-phase, is 20 Ω. One phase of the current signal is acquired through the NI-PXI 1033 chassis and NI-PXI 4071 digital multimeter (manufactured by National Instruments, Austin, Texas, USA) and sent to the personal computer. The acquired signal is segmented with a proper length; this research uses 0.5 s as the length of a segment, as shown in Figure 7. Finally, the computer analyzes and automatically detects the signals.


Loss of Lubrication in the Gearbox
Malfunction occurs when the gearbox leaks lubricant. Usually, the gearbox is filled to eight-tenths with lubricant. However, the vibration of power generators is severe in real scenarios, and vibration is likely to loosen the leak valve and cause lubricant leakage. A gearbox is damaged by deficient lubrication over long periods. This research examines the malfunction caused by leakage leaving the gearbox empty. The output current of the power generator is shown in Figure 7b.

Rotor Angular Misalignment
Rotor angular misalignment occurs when the rotor angles of the power generator and the gearbox are misaligned. Normally, the two rotor angles are at the same level, as shown in Figure 8. Nevertheless, a large wind turbine operates at high altitude and is easily influenced by external forces; the base can shift due to lengthy external vibration, indirectly resulting in rotor angular misalignment. In this research, rotor angular misalignment is produced by lowering the height of the gearbox, as shown in Figure 9: the thin plain washer is removed to lower the height. With both the thick and thin plain washers under the gearbox, its rotor angle is at the same level as that of the power generator. The thick plain washer is 2.96 mm high and the thin plain washer is 2.00 mm high. The output current of the power generator is shown in Figure 7c.

Bearing Fault
The ball bearing, shown in Figure 10a, consists of an inner race, an outer race, steel balls, and a retainer. Normally, the steel balls roll smoothly in the inner and outer raceways. However, external objects can damage the raceways or steel balls, and the ball bearing can also suffer abrasion after operating over a long period of time. This research takes damage to the inner and outer raceways as the malfunction condition. Since the bearing material is steel, electrical discharge machining (EDM) is employed to avoid excessive abrasion. Figure 10b shows the outer raceway after damage. Figure 10c shows the damaged bearing used as a sample in this testing. The apertures of the holes in the inner and outer raceways are 1 mm. The output current of the power generator is shown in Figure 7d.


Classified Dataset
According to the four operating conditions of the generator, data are measured and recorded sequentially. Each operating condition generates 200 samples; the first 160 samples are taken as the training set and the latter 40 samples as the testing set. The samples total 800, including a training set of 640 samples and testing data of 160 samples, as shown in Table 1. Each sample is a signal lasting 0.5 s. According to Equation (5), E_f is dimensionless, and the energy spectra of the four operational conditions for each feature are shown in Figure 11.

Classification can be influenced by different levels of noise; therefore, various white Gaussian noises are added in the testing. The signal-to-noise ratio (SNR) is used to measure the level of noise, and SNRs of 40, 30, and 20 dB are included for comparison. The SNR is defined as follows:

SNR = 10 log10(P_signal / P_noise) dB (17)

where P_signal is the power of the original signal and P_noise is the power of the noise.
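Equation (17) can be computed directly, with power taken as the mean squared amplitude. The signals below are invented for the example.

```python
# SNR in decibels per Equation (17): SNR = 10 log10(P_signal / P_noise),
# with power computed as the mean squared amplitude.
import math

def power(x):
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    return 10.0 * math.log10(power(signal) / power(noise))

sig = [10.0, -10.0, 10.0, -10.0]  # P_signal = 100
noise = [1.0, -1.0, 1.0, -1.0]    # P_noise = 1
print(snr_db(sig, noise))  # 20.0
```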

Simulation Results
This paper uses PWKNN as the classifier for malfunction diagnosis and compares it with the backpropagation neural network (BPNN), the k-NN, and the radial basis function network (RBFN). The iteration number of the PWKNN training is 400. The BPNN uses 40 neurons, with a learning rate of 0.02 and 400 iterations. The k parameter of k-NN is 3. The iteration number of the RBFN training is 400, and the spread is set to 0.1.
Without feature selection and noise disturbance, PWKNN, BPNN, and RBFN classify all samples correctly, whereas k-NN has 97.5% classification accuracy, as shown in Table 2.

In feature selection, the correlation coefficients between all features and the classification result must be computed first, and the features with the smallest correlation coefficients are the first to be removed. Table 3 shows the correlation coefficients for the classification of all features; d_15, d_14, and so forth are removed. PWKNN, k-NN, and RBFN use the predictive classification accuracy as the criterion of feature selection; BPNN uses the mean square error as the criterion. The classification results and feature numbers after feature selection are shown in Table 4. Most classifiers enhance their classification accuracy: when the SNR is 30 dB, RBFN raises the classification accuracy from 65% to 91%, a rise of about 40%. Feature selection increases the training time but reduces the output time. In Table 4, BPNN needs the longest training time, RBFN needs less training time than PWKNN, and k-NN needs the least. PWKNN, BPNN, k-NN, and RBFN improve the output time by 61%, 6%, 42%, and 17%, respectively. The feature number is the average over five repeated computations; the average feature numbers of PWKNN, BPNN, k-NN, and RBFN are 2.8, 10.1, 2.1, and 6.5, respectively, so PWKNN and k-NN have the highest efficiency of feature selection. The results show that the feature number has little influence on the output time of BPNN because BPNN is limited by its network structure and neuron numbers. PWKNN has more efficient feature selection and can improve the output time. Although PWKNN does not have the least training and output time, its average classification accuracy is higher than that of the other classifiers. BPNN has equivalent classification accuracy but needs a longer training time and the longest output time.
k-NN needs the least time; however, its classification accuracy is lower than that of PWKNN. In addition, the choice of features influences RBFN significantly, so feature selection is an important factor for it. Overall, PWKNN recognizes the operating conditions of wind generators efficiently, and with feature selection it can reduce the number of features effectively and shorten the output time of the classifier.
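The PSO that tunes PWKNN's feature weights follows the canonical velocity and position update, whose symbols (v_i, pbest_i, gbest, w, phi_1, phi_2, r_1, r_2) appear in the Nomenclature. A minimal single-step sketch, assuming standard coefficient values (the paper's actual settings, fitness function, and bounds are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, w=0.7, phi1=1.5, phi2=1.5):
    """One canonical PSO update:
        v_i <- w*v_i + phi1*r1*(pbest_i - x_i) + phi2*r2*(gbest - x_i)
        x_i <- x_i + v_i
    In PWKNN, each particle position x_i encodes the feature weights W
    (and the neighbor count k); the fitness Fit(.) would be the
    cross-validation classification accuracy."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + phi1 * r1 * (pbest - x) + phi2 * r2 * (gbest - x)
    return x + v, v
```

Iterating this update and refreshing pbest/gbest from the fitness evaluations drives the swarm toward the weight vector that maximizes the predictive accuracy.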

Conclusions
This paper proposes PWKNN to recognize the operational conditions of wind power generators. PWKNN resolves the identical-probability problem of multi-label classification. Through PSO parameter optimization, the adjusted weights properly reflect the influence of each feature on classification and effectively raise the classification accuracy. Feature selection removes redundant features and shortens the output time of classifiers without lowering the classification accuracy. The results show that, with or without noise, the average classification accuracy of PWKNN is higher than that of the other classifiers. At an SNR of 30 dB, PWKNN reaches 93% classification accuracy. With feature selection, the average number of features decreases from 16 to 2.8 and the output time drops by 61%. At an SNR of 20 dB, feature selection raises the classification accuracy of PWKNN from 72% to 82%, which is better than that of the other traditional classifiers. PWKNN's ability to eliminate redundant features efficiently and to classify robustly under noise significantly improves the diagnosis of wind turbine failures compared with the other methods. Besides diagnosing faults in wind power systems, PWKNN could also be applied to fault diagnosis of rotating machines.

Nomenclature
a              Scale parameter
a_n            Approximation coefficient
b              Translation parameter
CWT(a, b)      Continuous wavelet transform
dB             Decibel
d_1 ... d_n    Detail coefficients
d1 ... dn      Euclidean distances between the unknown point and each class
dist(X, Y)     Euclidean distance between n-dimensional vectors
D              Decision result
E_f            Energy spectrum
E_aj           Energy of the approximation coefficient
E_dj           Energy of the detail coefficients
f(t)           Original signal
Fit(.)         Fitness function
gbest_i        Best position of all particles
j              Scale
k              Number of nearest neighbors
l, z           Time
N              Number of all features
N_Correctly    Number of correct classifications
N_Total        Number of total training samples
P_CV           Predictive accuracy of PWKNN
pca_CV         Cross-validation predictive classification accuracy
pca_max,N_CV   Classification accuracy of PWKNN through weight optimization with all N features
pca_max,N-1_CV Classification accuracy through weight optimization after a feature is removed
P_signal       Power of the original signal
P_noise        Power of the noise
pbest_i        Best position of particle i
R              Correlation coefficient
r_1, r_2       Random numbers
t_j            One of the neighbors in the training set
x              Input
x_i            Current position of particle i
x-bar          Mean of the input
X, Y           n-dimensional vectors
y              Output
y-bar          Mean of the output
y(t_j, c_m)    Indicates whether t_j belongs to class c_m
v_i            Velocity of particle i
w              Inertia weight factor
W              Weight
W_N            Weight after PWKNN optimization
W_N-1          Optimum weight after feature removal
psi(t)         Mother wavelet
phi_1          Particle self-learning coefficient
phi_2          Particle swarm learning coefficient