Induction Motor Fault Classification Based on FCBF-PSO Feature Selection Method

Abstract: This study proposes a fast correlation-based filter with particle-swarm optimization (FCBF-PSO) method. In FCBF-PSO, the weights of the features selected by the fast correlation-based filter are optimized and combined with a backpropagation neural network as a classifier to identify the faults of induction motors. Three significant parts were applied to support the FCBF-PSO. First, Hilbert-Huang transforms were used to analyze the current signals of the normal motor, bearing damage, broken rotor bars and short circuits in stator windings. Second, three feature-selection methods, ReliefF, symmetrical uncertainty and FCBF, were applied to select the important features after feature extraction, and their accuracies were compared. Third, particle-swarm optimization (PSO) was combined to optimize the selected feature weights and obtain the best solution. The results showed excellent performance of the FCBF-PSO for induction motor fault classification, with fewer features and better identification ability. In addition, the analysis of the induction motor faults in this study was applied under different operating environments, namely, SNR = 40 dB, SNR = 30 dB and SNR = 20 dB. The FCBF-PSO proposed by this research also achieved higher accuracy than typical feature-selection methods such as ReliefF.


Introduction
Nowadays, automated production has become a trend. The number of unmanned factories is increasing, which means that the stability requirements of machinery and equipment are also increasing. Therefore, failure analysis of the motor and how to determine the type of failure have become important subjects. In general, motor failures are divided into two categories: electrical and mechanical failures. Most mechanical damage occurs in stators, rotors and bearings. Among them, bearing failure is the most common, as shown in Table 1 [1]. In this research, the current signal was measured and analyzed under different motor conditions. The current signal was selected for measurement because it is less affected by vibration. An intrinsic mode function (IMF) must satisfy two conditions:

1.
The total number of local maxima and local minima must equal the number of zero-crossings, or differ from it by at most one, which means that each extreme value must be followed by a zero-crossing point.

2.
At any time, the mean of the upper envelope defined by the local maxima and the lower envelope defined by the local minima must approach zero.

Empirical Mode Decomposition
EMD is the signal processing performed before HHT; it decomposes the signal into a combination of IMFs. Because of HHT's restrictions on the instantaneous frequency, a complete and correct instantaneous frequency cannot be obtained if the raw signal data are used directly. The steps of EMD for screening IMFs are as follows:
Step 1. Input the original signal x(t) and find the local maxima and local minima. Connect them to form the upper envelope H(t) and lower envelope L(t), respectively;
Step 2. Calculate the average of the upper envelope H(t) and the lower envelope L(t) to get the mean envelope m(t);
Step 3. Subtract the mean envelope m(t) from the original signal x(t) to get h(t);
Step 4. Check whether h(t) meets the conditions of an IMF. If not, go back to Step 1 with h(t) in place of x(t). Rescreen until h(t) meets the conditions, then store h(t) as the IMF component C_i;
Step 5. Subtract h(t) from the original signal x(t) to get the residue R(t);
Step 6. Check whether R(t) is a monotonic function. If yes, stop the decomposition. If not, repeat Step 1 to Step 5 on R(t).
Therefore, the original data can be decomposed into n IMFs and a trend function, and we can perform HHT on the IMF for signal analysis. The flowchart of the EMD is shown in Figure 1.
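Steps 1 to 3 of the sifting process can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: the function name `sift_once` is illustrative, and linear interpolation stands in for the cubic-spline envelopes normally used in EMD.

```python
import numpy as np

def sift_once(x, t):
    """One EMD sifting pass: h(t) = x(t) - m(t), where m(t) is the mean
    of the upper envelope H(t) and lower envelope L(t). Linear
    interpolation is used here for simplicity (cubic splines are
    standard in practice)."""
    # Step 1: locate local maxima and minima
    maxima = [i for i in range(1, len(x) - 1) if x[i] > x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] < x[i + 1]]
    # connect them into the upper envelope H(t) and lower envelope L(t)
    H = np.interp(t, t[maxima], x[maxima])
    L = np.interp(t, t[minima], x[minima])
    # Step 2: mean envelope m(t)
    m = (H + L) / 2.0
    # Step 3: candidate IMF h(t)
    return x - m

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * t   # an oscillation riding on a slow trend
h = sift_once(x, t)                        # the trend is largely removed
```

In a full EMD loop, `sift_once` would be applied repeatedly until `h` satisfies the IMF conditions, then subtracted from the signal to obtain the residue.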


Hilbert Transform (HT)
The Hilbert transform (HT) method overcomes the limitations of earlier approaches in analyzing nonlinear and nonstationary signals. For each IMF, HT is used to obtain the instantaneous amplitude and instantaneous frequency of the signal.
Through the transform, the instantaneous amplitude a_i(t) and instantaneous phase angle φ_i(t) can be obtained, as shown in (1) and (2). By differentiating the instantaneous phase φ_i(t) with respect to time, the instantaneous frequency ω_i(t) can be obtained, as shown in (3).
Then, through these calculations, the distribution of frequency, time and energy can be obtained using the instantaneous amplitude a_i(t) and the instantaneous frequency ω_i(t). This result is called the Hilbert spectrum.
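As a sketch of how a_i(t), φ_i(t) and ω_i(t) arise in practice, the analytic signal of an IMF can be computed with an FFT. This assumes an even-length input; the helper name `analytic_signal` is illustrative, not from the paper.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal z(t) = x(t) + j*H[x(t)] via the FFT: keep the DC
    and Nyquist bins, double the positive frequencies and zero the
    negative ones (assumes len(x) is even)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0
    h[1:N // 2] = 2.0
    return np.fft.ifft(X * h)

fs = 1000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 50 * t)                        # a 50 Hz test tone
z = analytic_signal(x)
inst_amp = np.abs(z)                                   # instantaneous amplitude a_i(t)
inst_phase = np.unwrap(np.angle(z))                    # instantaneous phase angle
inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)     # instantaneous frequency in Hz
```

For the pure 50 Hz tone, the instantaneous amplitude is constant at 1 and the instantaneous frequency sits at 50 Hz, which is the behavior Equations (1) to (3) describe.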


Neural Network (NN)
2.3.1. Architecture of NN
The neural network (NN) was proposed by McCulloch and Pitts in 1943. Modeled on the information-processing method of the biological nervous system, the network structure is built from several neurons. Through the interconnection of many neurons, information from the outside world is processed and memorized so that an appropriate response to the resulting changes can be produced. The structure includes an input layer, n hidden layers and an output layer.

Back Propagation Neural Network (BPNN)
The neural network (NN) is an operational model consisting of interconnected neurons. Each node represents a specific output function, called the activation function [16]. Each connection between two neurons carries a weight. In the initial stage, the weights and offsets of the NN are fixed and there is no learning ability. In 1986, the BPNN feedforward neural network model was proposed by Rumelhart et al. [17]. The network refers to the hierarchy of neurons; it consists of an input layer, hidden layers and an output layer, as shown in Figure 2. This research used the neural network toolbox (NNTOOL) in Matlab to create and train cascaded artificial neural networks. In the experiment, a three-layer feedforward neural network was trained using the scaled conjugate gradient (SCG) algorithm. The activation function at the hidden layer is a hyperbolic tangent sigmoid transfer function and the output layer uses a log-sigmoid transfer function. The number of hidden neurons applied in the verification is 10. As long as the weights, net input and transfer functions have derivative functions, the network can be trained. Moreover, SCG was chosen because it is based on supervised learning and is comparatively faster than the standard backpropagation model [18].
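The forward pass of such a three-layer network can be sketched as follows. This is only a structural sketch: the weights are random and untrained, and the SCG training that the paper performs in MATLAB's NNTOOL is not reproduced. The layer sizes (80 inputs, 10 hidden neurons, 4 output classes) follow the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def tansig(x):
    """Hyperbolic tangent sigmoid transfer function (hidden layer)."""
    return np.tanh(x)

def logsig(x):
    """Log-sigmoid transfer function (output layer)."""
    return 1.0 / (1.0 + np.exp(-x))

# 80 HHT features in, 10 hidden neurons, 4 motor conditions out
n_in, n_hidden, n_out = 80, 10, 4
W1 = rng.normal(size=(n_hidden, n_in)) * 0.1   # untrained, illustrative weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_out, n_hidden)) * 0.1
b2 = np.zeros(n_out)

def forward(x):
    """One forward pass: tansig hidden layer, logsig output layer."""
    h = tansig(W1 @ x + b1)
    return logsig(W2 @ h + b2)

y = forward(rng.random(n_in))   # four per-class scores in (0, 1)
```

Because both transfer functions are differentiable, the weights can be trained by gradient-based methods such as SCG, as the paper notes.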

Feature-Selection Method and Application
To achieve the best performance of the algorithm, the choice of features can be regarded as extremely important. Feature selection is closely related to feature extraction and feature construction. Yu and Liu [9] classified feature subsets into four categories: (a) completely irrelevant and noisy features, (b) weakly relevant and redundant features, (c) weakly relevant and nonredundant features and (d) strongly relevant features. An optimal subset mostly contains all the features in categories (c) and (d), as shown in Figure 3. In addition, feature selection involves two main objectives: to maximize the classification accuracy and to minimize the number of features [19]. Strongly relevant features are indispensable for the enhancement of discriminative power and prediction accuracy. Sometimes, weakly relevant features can be useful for improving prediction accuracy if a feature is nonredundant and compatible with the evaluation measures [20]. Moreover, the curse of dimensionality poses a severe challenge to many existing feature-selection methods with respect to efficiency and effectiveness [21,22]. The results of this research also compare various feature-selection methods to highlight the efficiency and effectiveness of FCBF [23,24]. In summary, the main purpose of the FCBF-PSO proposed in this study is to quickly screen out important features and improve accuracy.


ReliefF
ReliefF is derived from the feature-weighting algorithm Relief proposed by Kira in 1992. Because Relief is limited to two-class data, it was not widely applied; Kononenko extended it in 1994, and ReliefF feature selection was proposed to handle more complex multi-category situations [25]. The method is simple and relatively efficient in execution. It evaluates the correlation between each feature and each fault category: a sample Y_m is randomly selected, and K neighboring samples of the same category as Y_m are taken from the training set; these neighboring samples H are called near-hits. Similarly, K nearest-neighbor samples M from categories different from Y_m are found; these are called near-misses. This study sorts the feature weights from high to low to facilitate the research. However, a limitation of this algorithm is that it cannot effectively remove redundant features. The steps of ReliefF are as follows:
Step 1. Set the number of samplings Z, the number of neighbor samples K and the threshold r of feature weights; initialize the weight of each feature to zero;
Step 2. Randomly select any sample Y_m from all the sample types of Y;
Step 3. Extract the K near-hits, i.e., neighboring samples of the same category as Y_m;
Step 4. Find the K near-misses from the samples of categories different from Y_m;
Step 5. Calculate the weight of each feature using (4);
Step 6. Check whether the number of samplings Z has been reached. If not, return to Step 2 and repeat until the maximum number of samplings is reached;
Step 7. Sort the features by weight from large to small to indicate the importance of the selected features;
Step 8. Calculate the sum of the feature weights of the first n items (expressed as W_all);
Step 9. If W_all < r, set n = n + 1 and repeat Step 8 until W_all > r, where r is 90% of the sum of all feature weights;
Step 10. Output the features.
Among them, N is the number of features, K is the number of nearest-neighbor samples taken from each category, Y is the sample category and Z is the number of samplings; the k-nearest neighbors H_j (j = 1, 2, ..., K) in the same sample set as Y_m can be found, and diff(N, y1, y2) expresses the difference between samples y1 and y2 on feature N.
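The near-hit/near-miss weight update of Steps 1 to 6 can be sketched as follows. This is a simplified version assuming features are roughly scaled to [0, 1] (so the diff function is the absolute difference); the thresholding of Steps 8 and 9 is omitted, and the function name `relieff_weights` is illustrative.

```python
import numpy as np

def relieff_weights(X, y, Z=50, K=3, seed=0):
    """Simplified ReliefF: over Z randomly drawn samples, decrease each
    feature's weight by its mean difference to the K near-hits and
    increase it by its mean difference to the K near-misses."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(Z):
        m = rng.integers(n)                           # random sample Y_m
        dist = np.abs(X - X[m]).sum(axis=1)
        dist[m] = np.inf                              # exclude the sample itself
        same = np.where(y == y[m])[0]
        diff = np.where(y != y[m])[0]
        hits = same[np.argsort(dist[same])][:K]       # near-hits H
        miss = diff[np.argsort(dist[diff])][:K]       # near-misses M
        W -= np.abs(X[hits] - X[m]).mean(axis=0) / Z
        W += np.abs(X[miss] - X[m]).mean(axis=0) / Z
    return W

# Toy data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(1)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y + 0.05 * rng.standard_normal(40),  # informative
                     rng.random(40)])                     # irrelevant
W = relieff_weights(X, y)   # W[0] comes out much larger than W[1]
```

As the section notes, this weighting ranks relevant features highly but does nothing to detect that two relevant features may be redundant with each other.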

Information Entropy
Information entropy is the average amount of information contained in a message, which was proposed by Shannon in 1948 [26]. Among them, entropy is generally understood as a measure of uncertainty rather than certainty. Because the more random the message, the greater the entropy value, which also explains the probability distribution of the sample in information theory. The reason for using the logarithm of probability distribution as the information measure is that it is additive. The formula for entropy is shown in (5). Moreover, the information gain is shown in (6).
Among them, P is the probability mass function of x, k is a proportionality constant corresponding to the chosen unit of measure and I is the self-information of x.
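Equations (5) and (6) can be computed directly for discrete samples. The sketch below uses base-2 logarithms, so entropy is measured in bits; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(x, y):
    """IG = H(Y) - H(Y|X): the reduction in the entropy of the class
    labels y after observing the discrete feature x."""
    n = len(y)
    h_cond = 0.0
    for v in set(x):
        sub = [yi for xi, yi in zip(x, y) if xi == v]
        h_cond += len(sub) / n * entropy(sub)
    return entropy(y) - h_cond

# A fair coin is maximally random and carries 1 bit of entropy.
assert abs(entropy([0, 1, 0, 1]) - 1.0) < 1e-12
```

The more uniform (random) the distribution of the sample, the larger the entropy, which is exactly the property the text describes.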

Symmetrical Uncertainty Method
For variables, the correlation and degree of mutual influence are usually the most direct and fastest basis for judgment. By calculating a correlation coefficient, the correlation between two variables can usually be obtained quickly, but if such a correlation is used to select features, it tends to favor features with larger values. Therefore, this research uses the SU method to calculate the correlation between features and targets [27]. The calculation of SU is shown in (7).
It can also be seen from the definition that SU is a normalized form of information gain: a nonlinear correlation measure defined from information entropy and used to characterize the degree of correlation between nonlinear random variables. The symmetric uncertainty is calculated as the SU value, which effectively corrects the bias of information gain toward features with more values. After normalizing the information gain, the SU value lies between 0 and 1, which makes comparisons between different feature types relatively fair: SU(X, Y) = 1 means that X and Y are completely correlated, whereas SU(X, Y) = 0 means that X and Y are completely independent.
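The SU calculation of (7) can be sketched for discrete variables as follows (base-2 entropy; the function names are illustrative):

```python
import math
from collections import Counter

def H(v):
    """Shannon entropy of a discrete sample, in bits."""
    n = len(v)
    return -sum((c / n) * math.log2(c / n) for c in Counter(v).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), bounded in [0, 1]."""
    n = len(y)
    h_cond = sum(len(s) / n * H(s) for s in
                 ([yi for xi, yi in zip(x, y) if xi == v] for v in set(x)))
    ig = H(y) - h_cond                  # information gain
    return 2.0 * ig / (H(x) + H(y))

# Completely related variables give SU = 1; independent ones give SU = 0.
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
print(symmetrical_uncertainty([0, 1, 0, 1], [0, 0, 1, 1]))  # -> 0.0
```

The normalization by H(X) + H(Y) is what removes the bias toward many-valued features that raw information gain exhibits.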

Fast Correlation-Based Filter (FCBF)
This research uses the FCBF feature-selection method, which was proposed in 2004. It uses the SU value in place of the information gain to perform the selection calculation [28]. The method can be divided into two parts. First, the features are sorted by their SU value with the fault type, and a threshold is used to delete the less influential features. Then the features are compared with one another, as shown in Figure 4. Because the correlation between feature T1 and features T2 and T4 is higher than the correlation between T2, T4 and the category, T2 and T4 are considered redundant features with less correlation and are deleted. The advantage of this method is that the correlations between features can be compared and feature selection performed at the same time, with the more correlated features used to filter the features that have not yet been deleted. In this way, calculation and filtering proceed together, the computation is accelerated and the recognition rate is improved.
The steps of the FCBF feature-selection process are as follows:
Step 1. Set the data set with features t_i and category Y;
Step 2. Calculate the SU value between each feature t_i and category Y;
Step 3. Store the SU(t_i, Y) values of each feature and the category in descending order in the set S;
Step 4. Calculate the sum of the SU values of the first n items in S (expressed as SU_all);
Step 5. If SU_all < r, set n = n + 1 and repeat Step 4 until SU_all > r, where r is 90% of the sum over the set S;
Step 6. Remove the features after the nth from S (removing the features with less influence);
Step 7. Select the feature T1 with the largest SU(t_i, Y) value from S as the main feature for the comparison;
Step 8. Calculate, in order, the SU(t_i, T1) between each remaining feature and the main feature, and the SU(t_i, Y) between that feature and category Y;
Step 9. If SU(t_i, T1) is greater than SU(t_i, Y), the feature t_i is regarded as redundant and deleted from S;
Step 10. The main feature is stored in S' and deleted from S;
Step 11. Repeat Step 7 to Step 10 until S is the empty set;
Step 12. Output S' as the set of important features.
The process of FCBF feature screening is thus divided into two stages: first, the SU value is used to rank the features; second, the correlations between features are compared to determine which features are redundant. The flowchart of this method is shown in Figure 5.
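The two-stage screening can be sketched given precomputed SU values. The function and variable names below are illustrative, not the paper's code; the example data mimics the Figure 4 situation, where T2 and T4 are redundant with T1.

```python
def fcbf(su_with_class, su_between, r=0.9):
    """FCBF selection sketch. su_with_class[i] is SU(t_i, Y);
    su_between[i][j] is SU(t_i, t_j). Stage 1 keeps the top features
    covering a fraction r of the total SU mass; stage 2 drops a feature
    as redundant when its SU with an already-selected feature exceeds
    its SU with the class."""
    order = sorted(range(len(su_with_class)),
                   key=lambda i: su_with_class[i], reverse=True)
    # Stage 1: threshold on cumulative SU (r = 90% of the sum over S)
    total, acc, S = sum(su_with_class), 0.0, []
    for i in order:
        if acc >= r * total:
            break
        S.append(i)
        acc += su_with_class[i]
    # Stage 2: redundancy filtering
    selected = []
    while S:
        main = S.pop(0)                 # feature with largest SU(t_i, Y)
        selected.append(main)
        S = [i for i in S if su_between[main][i] < su_with_class[i]]
    return selected

su_with_class = [0.9, 0.5, 0.6, 0.4]    # SU(T1..T4, Y)
su_between = [[1.0, 0.8, 0.3, 0.7],     # T2, T4 correlate strongly with T1
              [0.8, 1.0, 0.2, 0.1],
              [0.3, 0.2, 1.0, 0.2],
              [0.7, 0.1, 0.2, 1.0]]
selected = fcbf(su_with_class, su_between)  # -> [0, 2], i.e. T1 and T3 survive
```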


Application of FCBF-PSO
PSO is a metaheuristic algorithm proposed by James Kennedy and Russell Eberhart in 1995, developed from observing the flocking and foraging behavior of birds. The principle is that multiple particles randomly distributed in the search space represent individuals in the bird population, and the position of each particle is a potential best solution to the optimization problem. Since each particle is a feasible solution, it acquires a fitness value after evaluation. Based on the particle's own experience and the group's experience, the flight speed and direction are updated, and the process is iterated until all particles converge to the best solution, as shown in (8) and (9) [29]. In this study, PSO is combined with the FCBF feature-selection method to optimize the weights of the selected features. The acceleration factors C1 and C2 are linearly reduced from 2.5 to 0.5, and the inertia weight ω is selected arbitrarily from 0.5 to 1, as in (10). The steps of FCBF-PSO are as follows:
Step 1. In the d-dimensional space, set the parameters, including the number of particles, the number of iterations T, the acceleration factors C1 and C2 and the inertia weight ω, to form a particle population;
Step 2. In the space, let the coordinates of each particle be X_i = (X_i1, X_i2, ..., X_ij) and the flying speed of each particle be V_i = (V_i1, V_i2, ..., V_ij);
Step 3. Use the FCBF feature-selection method to output the important features F_j;
Step 4. Combine the particle coordinates with the features, F_i = (X_i1 × F_1, X_i2 × F_2, ..., X_ij × F_j), to obtain the individual best solution P_best and the group best solution G_best;
Step 5. Use P_best and G_best to update the particle's flight speed V_i_new, as shown in (8);
Step 6. Correct the particle's position X_i_new with the updated flight speed V_i_new to find the new position and speed;
Step 7. If the set number of iterations T_Max is met, stop; otherwise repeat Step 3 to Step 5. The termination condition is usually reaching the best solution or the user-set number of iterations;
Step 8. All particles converge to obtain the best solution;
Step 9. Finally, after the optimization process, the set of optimal particle coordinates X_best is obtained, which is the optimized feature weight.
In summary, the FCBF-PSO feature-weight optimization method retains the important features from the FCBF feature-selection method and optimizes their weights. The flowchart of the proposed method is shown in Figure 6.
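The velocity and position updates described by (8) to (10) can be sketched as follows. This is a minimal sketch: the toy quadratic fitness stands in for the BPNN classification accuracy the paper would evaluate, and the function name `pso` is illustrative.

```python
import numpy as np

def pso(fitness, dim, n_particles=20, T=50, seed=0):
    """Minimal PSO sketch. Velocity update:
    v = w*v + c1*r1*(Pbest - x) + c2*r2*(Gbest - x),
    with c1 = c2 linearly decreasing from 2.5 to 0.5 and the inertia
    weight w drawn from [0.5, 1], following the section above."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_particles, dim))        # positions = candidate feature weights
    V = np.zeros((n_particles, dim))
    P = X.copy()                              # individual best positions P_best
    p_fit = np.array([fitness(x) for x in X])
    g = P[p_fit.argmax()].copy()              # group best position G_best
    for t in range(T):
        c = 2.5 - 2.0 * t / max(T - 1, 1)     # acceleration factors, 2.5 -> 0.5
        w = rng.uniform(0.5, 1.0)             # inertia weight
        r1, r2 = rng.random((2, n_particles, dim))
        V = w * V + c * r1 * (P - X) + c * r2 * (g - X)
        X = X + V
        f = np.array([fitness(x) for x in X])
        better = f > p_fit                    # update individual bests
        P[better], p_fit[better] = X[better], f[better]
        g = P[p_fit.argmax()].copy()          # update group best
    return g

# Toy fitness whose optimum is weights of 0.5 in every dimension.
best = pso(lambda x: -np.sum((x - 0.5) ** 2), dim=4)
```

In FCBF-PSO proper, each particle's coordinates multiply the FCBF-selected features and the fitness is the resulting classifier accuracy; the swarm mechanics are the same.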

Experiment Apparatus
The equipment used in this experiment was a three-phase squirrel-cage induction motor. Its specifications are shown in Table 2, and the types of induction motor failures used in this study are shown in Figure 7a-f. The measurement setup consisted of the power test platform shown in Figure 7g, an NI PXI-1033 signal acquisition device, digital electric meters and personal computers. The power test platform also includes servo motors, torque sensors and control panels. With the above equipment, the measurement and analysis of the motor current signal could be completed.
V_i_new = ω × V_i + C1 × r1 × (P_best − X_i) + C2 × r2 × (G_best − X_i)
Figure 6. Flowchart of FCBF-PSO method.


Experiment Process
First, the servo motor of the power platform was used as the load, and the AC motor was driven to make the motor run. Second, the motor current signal of one phase was captured for the four different fault conditions through the signal acquisition device; the sampling time of each data record was 100 s, the acquisition frequency was 1000 Hz and 100 signals were measured for each condition. Finally, MATLAB was used to apply HHT to the measured signals on the computer and to combine the various feature-screening methods, with BPNN used to compare their accuracy. The flowchart of the experimental architecture is shown in Figure 8a,b. We also repeated the calculation 200 times to obtain the average accuracy and performed the feature-selection methods multiple times to ensure the study was repeatable and stable.

Original Signal
The current signals of induction motors were analyzed using EMD. The signal measurement and decomposition result for the healthy motor are shown in Figure 9a,b. The waveform, vibration and frequency of each IMF were different.


HHT Feature Extraction
In this study, the current signal of the motor was first acquired and EMD was applied to extract IMF layers 1 to 8. Then, through HHT analysis, the instantaneous amplitude and instantaneous frequency of each layer were obtained. Next, we extracted the maximum, minimum, average, root mean square and standard deviation of the instantaneous amplitude and instantaneous frequency of each layer as the feature basis (10 features per layer) and normalized them so that the eigenvalues of the 4 motor types were distributed between 0 and 1 for easy comparison. Finally, F1, F2, F3, ..., F79, F80 could be obtained, for a total of 80 features; the method is shown in Figure 10.
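The per-layer statistics and the normalization can be sketched as follows. Random arrays stand in for the instantaneous amplitude and frequency of each IMF; the function names are illustrative.

```python
import numpy as np

def layer_features(inst_amp, inst_freq):
    """Ten statistics per IMF layer: max, min, average, root mean square
    and standard deviation of the instantaneous amplitude and of the
    instantaneous frequency."""
    feats = []
    for s in (inst_amp, inst_freq):
        feats += [s.max(), s.min(), s.mean(),
                  np.sqrt(np.mean(s ** 2)),   # RMS
                  s.std()]
    return feats

def normalize(F):
    """Scale each feature across samples into [0, 1] for comparison."""
    F = np.asarray(F, dtype=float)
    span = F.max(axis=0) - F.min(axis=0)
    return (F - F.min(axis=0)) / np.where(span == 0, 1, span)

# 8 IMF layers x 10 statistics = 80 features (F1 ... F80) per signal.
rng = np.random.default_rng(1)
sample = np.concatenate([layer_features(rng.random(1000), rng.random(1000))
                         for _ in range(8)])
norm = normalize(np.vstack([sample, 2 * sample]))   # two samples, scaled to [0, 1]
```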
According to the above extraction method, using HHT to extract the features, we obtained features F1, F2, F3, ..., F79, F80 for the normal motor and the three different fault conditions, retrieving 100 data records as the basis for discrimination. We then used Matlab software to draw the feature maps, where the vertical axis is the feature number and the horizontal axis is the sample number. The research also simulated the actual operation of the induction motor by adding white noise with SNR = 40 dB, SNR = 30 dB and SNR = 20 dB. From the feature diagrams of the HHT-extracted features, the results show that, compared to the other motor failures, the difference in bearing damage between features F40 to F47 was apparent, suggesting that these are important features that make the fault easy to identify. As the noise increased to SNR = 20 dB, the feature distributions of the normal motor, broken rotor bar and short circuit in the stator windings became more similar, which increased the difficulty of identification, as shown in Figure 11a,b.
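Adding white noise at a prescribed SNR can be sketched as follows (a Python analogue of what MATLAB's `awgn` does with measured signal power; the function name `add_white_noise` is illustrative):

```python
import numpy as np

def add_white_noise(x, snr_db, seed=0):
    """Add white Gaussian noise scaled so that
    10*log10(P_signal / P_noise) equals snr_db."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)

# A long 60 Hz current-like tone sampled at 1 kHz, corrupted at SNR = 20 dB.
t = np.arange(100_000) / 1000.0
clean = np.sin(2 * np.pi * 60 * t)
noisy = add_white_noise(clean, snr_db=20)
noise = noisy - clean
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))  # ~20 dB
```

Lowering `snr_db` to 30 or 20 reproduces the progressively harsher operating environments studied in the paper.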

Results of Induction Motor Fault Classification
First, the features of the measurement signal were extracted by HHT. Using all of these features gave the largest feature count, but many features could not clearly distinguish the type of failure, so the accuracy reached only 88.26%. Second, with the ReliefF feature-selection method, features with lower feature weights could be deleted, removing 72% of the total number of features. With FCBF, the screening could be divided into two stages. In the initial stage, comparing the SU values of features and fault types, most of the features with lower impact could be deleted, removing 76% of the total number of features. Afterwards, through the feature-to-feature correlation screening, FCBF could effectively delete redundant features; a total of 87.5% of the features could be deleted, as shown in Table 3. To show that FCBF also had a running-time advantage, this study measured the computation time (run 200 times to obtain each method's average running time). The result showed that FCBF achieved faster identification than ReliefF and SU, as shown in Table 4. In summary, FCBF deleted the most features after selection, the best result among the three feature-selection methods. In this study, BPNN was used to classify each fault condition of the motor, and white noise of 40 dB, 30 dB and 20 dB SNR was added for comparison. It can be observed from Table 5 that, in the absence of noise, the accuracy of HHT alone was 88.26%, while the accuracies of HHT combined with ReliefF, SU and FCBF were 90.05%, 90.02% and 90.75%, respectively. Among the three methods, FCBF obtained the best accuracy. Using HHT combined with FCBF-PSO to optimize the feature weights, the accuracy could be increased from 88.26% to 92.85%.
Therefore, it could be shown that the FCBF-PSO method proposed in this research could delete fewer important features and give the corresponding weights to the selected features after optimization. As seen in Table 6, a slight white noise SNR = 40 dB is added to the signal. The accuracy of HHT is 87.25% and after effectively removing unnecessary features through three feature-selection methods, the recognition results obtained are 88.35%, 85.87% and 88.86%. Explain that after screening, these three methods can also maintain the accuracy. Finally, using the method of FCBF-PSO proposed in this research to identify the fault condition, the results show that the accuracy can be increased from 87.25% to 91.76%. Second, as shown in Table 7, when the white noise increases to SNR = 30 dB, the method of combining HHT with SU value can maintain the accuracy of 81.08% when the number of features decreases. The feature-selection methods of ReliefF and FCBF can delete features and the accuracy is improved to 82.06% and 81.62%. With the FCBF-PSO method in this study, accuracy can reach 83.76%, which is the best of all methods. Finally-as shown in Table 8-in the case of severe noise with SNR = 20 dB, the classification accuracy of each feature-selection method is significantly reduced. The accuracy of ReliefF is 71.42%, the SU value is 70.35% and FCBF is 69.84%. The method of combining HHT with FCBF-PSO proposed in this study can improve the average accuracy to 72.68% when most features are deleted.

ReliefF Screening and Accuracy
This study uses ReliefF's feature-screening mechanism to select features after HHT analysis. This feature-selection method compares neighboring samples with each other and assigns each feature a weight according to its correlation with the class. It reduces the number of features by deleting unimportant ones, comparing samples in a short time and updating the weights in real time; a preset threshold and number of samplings balance the process and yield better recognition results. ReliefF reduced the 80 features originally obtained with HHT to 22 features (72.5% of the features were deleted), and the accuracy was calculated with BPNN. The results show that with only 5 features the accuracy is already close to that of the full feature set, and after recognition the accuracy increases from 88.69% to 90.05%, as shown in Figure 12. This shows that the ReliefF method quickly obtains highly related features and deletes those that harm classification, maintaining effective recognition. However, this method still cannot delete redundant features, so the complexity of the system operation is relatively increased.
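The ReliefF idea described above (reward features on which nearest samples of other classes differ, penalize features that vary among nearest samples of the same class) can be sketched as below. This is a simplified illustration, not the implementation used in the study; the Manhattan distance, k, sampling count and the synthetic data are illustrative choices.

```python
import numpy as np

def relieff_weights(X, y, n_samples=100, k=5, rng=None):
    """Simplified ReliefF: for each sampled instance, nearest hits
    (same class) lower a feature's weight, nearest misses raise it."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # scale features to [0, 1] so differences are comparable across features
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for _ in range(n_samples):
        i = rng.integers(n)
        xi, ci = Xs[i], y[i]
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[idx != i]
            # k nearest neighbours of xi within class c (Manhattan distance)
            dist = np.abs(Xs[idx] - xi).sum(axis=1)
            near = idx[np.argsort(dist)[:k]]
            diff = np.abs(Xs[near] - xi).mean(axis=0)
            if c == ci:          # nearest hits lower the weight
                w -= diff / n_samples
            else:                # nearest misses raise it, class-prior weighted
                w += priors[c] / (1 - priors[ci]) * diff / n_samples
    return w
```

Features whose weight falls below the preset threshold would then be deleted, as in the screening step described above.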

SU Value Screening and Accuracy
Second, this study uses the SU-value feature-selection method, which computes a correlation measure between each feature and the class. The features are then evaluated from high to low correlation. It is found that when the number of features reaches 7, the accuracy of this method becomes stable, and the feature set can be reduced to 19 features (76.25% of the features can be deleted). After all features are evaluated, the recognition result increases from 88.83% to 90.2%, as shown in Figure 13. However, like ReliefF, this method cannot remove redundant features, so a few influential redundant features still affect the classification accuracy.
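The SU value used above is the symmetrical uncertainty SU(x, y) = 2 · IG(x; y) / (H(x) + H(y)), where IG is the information gain (mutual information) and H is Shannon entropy. A minimal sketch follows; the bin count and the assumption that class labels are small non-negative integers are illustrative choices, not taken from the paper.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label sequence."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y, bins=10):
    """SU = 2 * IG(x; y) / (H(x) + H(y)); continuous x is binned first.
    y is assumed to hold small non-negative integer class labels."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    hx, hy = entropy(xd), entropy(y)
    if hx + hy == 0:
        return 0.0
    # joint entropy via unique paired codes
    joint = xd.astype(np.int64) * (np.max(y) + 1) + y
    ig = hx + hy - entropy(joint)  # mutual information
    return 2 * ig / (hx + hy)
```

SU is 1 when the feature fully determines the class and near 0 when they are independent, which is why the features can be ranked from high to low SU.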


FCBF Screening and Accuracy
Finally, this study uses the FCBF feature-selection mechanism to select features after analysis.
The results indicate that this method not only effectively deletes features with little influence, but also compares features with extremely high mutual correlation. A two-stage screening process thus reduces the number of features: redundant features are deleted in a short time, balance is reached quickly and better recognition results are obtained. For the FCBF method, the selected features are evaluated from important to relatively unimportant. The number of features was reduced from the 80 originally obtained with HHT to 10 (87.5% of the features were deleted). When the classifier calculates the accuracy, it becomes stable once the number of features reaches 4; when screening is completed, the accuracy over all features effectively increases from 88.72% to 90.75%, as shown in Figure 14. This shows that, compared with the first two methods, this screening method obtains the important features and deletes the unnecessary ones that harm classification, maintaining effective identification.
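The two-stage screening described above can be sketched as follows: stage 1 keeps features whose SU with the class exceeds a threshold, sorted descending; stage 2 walks that list and drops any later feature that correlates more with an already-kept feature than with the class. This is a minimal sketch assuming the features have already been discretized to small non-negative integers; the threshold `delta` and the test data are assumptions.

```python
import numpy as np

def _entropy(v):
    _, c = np.unique(v, return_counts=True)
    p = c / c.sum()
    return -np.sum(p * np.log2(p))

def _su(a, b):
    """Symmetrical uncertainty between two discrete non-negative int vectors."""
    ha, hb = _entropy(a), _entropy(b)
    if ha + hb == 0:
        return 0.0
    joint = _entropy(a.astype(np.int64) * (b.max() + 1) + b)
    return 2 * (ha + hb - joint) / (ha + hb)

def fcbf(X, y, delta=0.0):
    """Two-stage FCBF on discretized features X (n x d) and labels y."""
    d = X.shape[1]
    su_c = np.array([_su(X[:, j], y) for j in range(d)])
    # Stage 1: keep features with SU(f, y) > delta, most relevant first
    order = [j for j in np.argsort(-su_c) if su_c[j] > delta]
    selected = []
    while order:
        fi = order.pop(0)
        selected.append(fi)
        # Stage 2: drop later features more correlated with fi than with y
        order = [fj for fj in order if _su(X[:, fj], X[:, fi]) < su_c[fj]]
    return selected
```

Stage 2 is what removes the redundant features that ReliefF and the plain SU ranking cannot, which matches the behavior reported above.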

Influence of Various Characteristics on Accuracy
This section discusses the important features selected by each feature-selection method. Because each method screens on a different basis, the number and identity of the important features differ. Therefore, this study ranks the importance of the features selected by the three methods. The importance and number of features also vary with the amount of noise, as shown in Table 9. Comparing the features in Table 9 with Figures 13 and 14 shows that the SU-value algorithm retains redundant features compared with FCBF; the redundant features are marked in bold.
In the FCBF-PSO method proposed by this research, PSO assigns weights to the important features after screening. The method retains all features selected by FCBF and uses PSO to optimize their weights. Compared with pre-screening, the accuracy increases by 4.59% while 87.5% of the total features can be deleted. To simulate the many influencing factors of the motor's actual operating environment, this study also added white noise of 40 dB, 30 dB and 20 dB to the initial measurement signal. Experimental results show that the method deletes features effectively under 40-dB and 30-dB noise conditions, obtaining 11 and 13 important features with accuracies of 91.76% and 83.76%, respectively. When the noise reaches 20 dB, the number of features after screening is 15 and the recognition result drops significantly, but an accuracy of 72.68% can still be maintained. Comparing the classifiers and feature counts shows that the FCBF-PSO method proposed in this study is the best, as shown in Table 10.
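The PSO weight-optimization step described above can be sketched with a generic particle-swarm loop that searches weight vectors in [0, 1] per feature. In the study the fitness would be BPNN classification accuracy on the weighted features; here a toy quadratic fitness stands in so the sketch is self-contained, and the swarm size, iteration count and inertia/acceleration coefficients are assumptions rather than the paper's settings.

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, rng=0):
    """Minimal PSO maximizing `fitness` over weight vectors in [0, 1]^dim."""
    rng = np.random.default_rng(rng)
    pos = rng.random((n_particles, dim))          # particle positions (weights)
    vel = np.zeros((n_particles, dim))
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    g = np.argmax(pbest_val)
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + cognitive + social terms
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)        # keep weights in [0, 1]
        vals = np.array([fitness(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        g = np.argmax(pbest_val)
        if pbest_val[g] > gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Toy stand-in fitness: true optimum is a weight of 0.3 on each feature.
best_w, best_val = pso(lambda p: -np.sum((p - 0.3) ** 2), dim=3)
```

In the FCBF-PSO pipeline, `dim` would equal the number of FCBF-selected features and the returned `gbest` would scale those features before they are fed to the BPNN classifier.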


Conclusions
This study classifies the current signals of normal operation, bearing damage, broken rotor bars and short circuits in stator windings of AC induction motors. A signal extractor captured the current signal of the motor, and HHT extracted its maximum, minimum, average, root-mean-square and standard-deviation features. In addition, we compared feature-selection methods: ReliefF, which assigns feature weights; the SU value, which indicates the degree of feature influence; and FCBF, which deletes redundant features. Finally, we combined PSO to optimize the weights of the important features and tested the system's noise immunity by adding white noise. We repeated the calculations 200 times to obtain an average accuracy and ran each feature-selection method multiple times; the results showed stable numbers of both features and accuracy. Moreover, using FCBF-PSO to optimize the feature weights gave better fault-identification results. The main research results of this study are as follows:
1. Through the comparison of feature-selection methods, ReliefF, the SU value and FCBF were used to remove invalid or poor features before classification. Without noise, the number of features decreased by 72.5%, 76.25% and 87.5%, respectively. In classification accuracy, only the SU method decreased slightly, by 1.16%; the other two screening methods improved it. When severe noise (SNR = 20 dB) was added, all three screening methods still reduced the number of features, and apart from the clear improvement of FCBF, the other two methods still achieved more than 70% classification accuracy.

2. This study also used a PSO optimization model to optimize the feature weights of the three-phase induction-motor signals obtained by the FCBF feature-selection method, preserving all of the FCBF-selected features. Without noise, its classification accuracy through BPNN reached 92.85%, superior to the other feature-selection methods and 4.59% higher than HHT alone. When white noise of SNR = 40 dB, SNR = 30 dB and SNR = 20 dB was added, the accuracy still increased by 4.51%, 3.12% and 3.02%, respectively. This shows that the method obtains a higher classification accuracy.