Establishing an Induction Motor Fault Diagnosis System Based on Feature Selection Approaches with MRA

Abstract: This paper proposes a feature selection (FS) approach, namely, correlation and fitness value-based feature selection (CFFS). CFFS is an improved feature selection approach based on correlation-based feature selection (CFS) for the common failure cases of the induction motor. CFFS establishes the induction motor fault detection (FD) system with an artificial neural network (ANN). This study analyzes the current signal of the induction motor with multiresolution analysis (MRA), extracts the features, and uses feature selection approaches (ReliefF, CFS, and CFFS) to reduce the number of features while maintaining the accuracy of the induction motor fault detection system. Finally, the induction motor fault detection system is trained with the features selected by the feature selection approaches. The best induction motor fault detection system is established through a comparison of the efficiency of these FS approaches.

where a is the scaling parameter and b is the displacement (translation) parameter.

Multiresolution Analysis (MRA)
This study uses multiresolution analysis (MRA) for signal processing and selects time domain and frequency domain features of the signals reconstructed from the wavelet coefficients. Multiresolution analysis was developed by Mallat and is shown in (4). MRA decomposes the original signal f(t) into detail coefficients dj and approximation coefficients aj with the scaling function φ(t) and the wavelet function ψ(t), as shown in (5) and (6),
where g0 and h0 are the filter coefficients. The schematic diagram of MRA is shown in Figure 1.
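Mallat's cascade in (5) and (6) can be sketched as follows, a minimal NumPy implementation that assumes Haar filters for g0 and h0 (an illustrative choice; the paper does not state which wavelet family it uses):

```python
import numpy as np

# Haar analysis filters (an illustrative assumption)
g0 = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass (scaling) filter
h0 = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass (wavelet) filter

def dwt_step(a):
    """One level of Mallat's algorithm: filter a_j with g0 and h0,
    then downsample by 2 to get a_{j+1} and d_{j+1}."""
    approx = np.convolve(a, g0[::-1])[1::2]
    detail = np.convolve(a, h0[::-1])[1::2]
    return approx, detail

def mra(signal, levels=5):
    """Decompose a signal into details [d1, ..., d_levels] and the
    final approximation a_levels, as in Figure 1."""
    details, approx = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    return details, approx
```

With a 2000-point current signal and five levels, this yields the bands d1-d5 and a5 used later in the signal analysis.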

Artificial Neural Network Training (ANN)
This study uses artificial neural network training to classify and recognize the four induction motor failure cases with the features selected by the feature selection approaches. The structure of the neural network was developed by McCulloch and Pitts, as shown in Figure 2. The network is composed of an input layer (j), hidden layers (k, l), and an output layer (z) with neurons. The input layer feeds the data into the network, the hidden layers process the inputs with weights, biases, and activation functions, and the result is obtained from the output layer. Backpropagation [26] is a combination of a multi-layer feed-forward neural network [27] and error back propagation [28]. This artificial neural network was modeled on the nervous system and uses stochastic gradient descent to adjust the coefficients of the network to improve the classification system. The feedforward propagation is shown in Figure 3a: the output is calculated through the input layer, hidden layers, and output layer. Each neuron of the network has a weight wjk; all of the input data multiplied by the weights are summed, the bias bk1 of the neuron is added, and the result zk1 is transformed with the activation function f(x). The back propagation is shown in Figure 3b: the error between the output of feedforward propagation and the target output is calculated, and a learning rate is selected to adjust the weights and biases of the network by stochastic gradient descent on the error, yielding the new weights and biases for the neural network. The structure of the ANN of this study is shown in Figure 2. The steps are listed as follows: Step 1. 
Set the input data (zj), target output (t), number of features (f), number of neurons (c), weights (wjk) between the input layer and hidden layer (k), biases (bk) of hidden layer (k), activation function (fk(x)) of hidden layer (k), weights (wkl) between hidden layer (k) and hidden layer (l), biases (bl) of hidden layer (l), activation function (fl(x)) of hidden layer (l), learning rate (ln), times of training (Times), and times of iteration (T). This study sets c = f + 2, ln = 0.007, and T = 50; all the weights and biases are random numbers between 0 and 1; the activation function (fk(x)) is the hyperbolic tangent sigmoid transfer function [29], and the activation function (fl(x)) is the softmax function [30].
The hyperbolic tangent sigmoid transfer function, shown as (7), limits the output to between −1 and 1, as shown in Figure 4. The softmax function, shown as (8), transforms the input data so that the outputs sum to 1, as shown in Figure 5, and the maximum term is regarded as the classification result.
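In standard form (the paper's Equations (7) and (8) are not reproduced in this excerpt), these two activation functions are:

```latex
% hyperbolic tangent sigmoid transfer function, output in (-1, 1)
f_k(x) = \frac{2}{1 + e^{-2x}} - 1 = \tanh(x)

% softmax function over m output neurons, outputs sum to 1
f_l(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{m} e^{x_j}}
```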
Step 3. Calculate the error (E) between the output (zl) and the target output (t) by cross entropy, as shown in (13).
Step 5. Use the stochastic gradient descent (SGD) algorithm to update the weights (wjk) and biases (bk); the process is shown in (14)-(17).
Step 6. If Times is not equal to T, update Times as in (18), Times = Times + 1, and go back to Step 2.
Step 7. The ANN is completely established. The flowchart of the ANN is shown in Figure 6.

Figure 6. Flowchart of the ANN.
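The training loop above can be sketched as follows: a minimal NumPy implementation assuming one tanh-sigmoid hidden layer feeding a softmax output, with the paper's settings c = f + 2, ln = 0.007, and T = 50. The symbol names mirror the paper's, but the implementation details (full-batch gradient step, fixed seed) are our own simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (our choice)

def train_ann(zj, t, c=None, ln=0.007, T=50):
    """Sketch of the ANN above. zj: (samples, f) inputs,
    t: (samples, classes) one-hot targets. Returns trained
    weights/biases w_jk, b_k, w_kl, b_l."""
    f, n_cls = zj.shape[1], t.shape[1]
    c = c if c is not None else f + 2
    # Step 1: weights and biases are random numbers between 0 and 1
    w_jk, b_k = rng.random((f, c)), rng.random(c)
    w_kl, b_l = rng.random((c, n_cls)), rng.random(n_cls)
    for _ in range(T):
        # Step 2: feed-forward propagation
        zk = np.tanh(zj @ w_jk + b_k)             # hidden layer, tanh sigmoid
        e = np.exp(zk @ w_kl + b_l)
        zl = e / e.sum(axis=1, keepdims=True)     # output layer, softmax
        # Steps 3-5: back-propagate the cross-entropy error and update
        # (a full-batch gradient step here, for simplicity)
        d_l = (zl - t) / len(zj)                  # softmax + cross-entropy gradient
        d_k = (d_l @ w_kl.T) * (1.0 - zk ** 2)    # tanh derivative
        w_kl -= ln * zk.T @ d_l; b_l -= ln * d_l.sum(axis=0)
        w_jk -= ln * zj.T @ d_k; b_k -= ln * d_k.sum(axis=0)
    return w_jk, b_k, w_kl, b_l
```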

Correlation Analysis Algorithm
This chapter introduces the correlation analysis algorithms used in this research: Relief and ReliefF. Additionally, it compares three feature selection approaches: ReliefF, CFS, and correlation and fitness value-based feature selection (CFFS). Not all of the features extracted from the current signal of the induction motor are useful for the classifier: some features do not affect the recognition result, and some reduce the recognition performance. This study uses feature selection approaches to remove the useless features, increasing the recognition rate of the classifier and reducing the computational cost.

Relief
This study uses Relief to calculate the correlation between features. Relief was proposed by K. Kira and L. A. Rendell and was designed for binary classification problems. Relief calculates the correlation between a feature (Fh) and a target (Fm): it selects one stochastic value (fh) from the feature (Fh), then chooses the nearest value (fnh, near-hit) in feature (Fh) and the nearest value (fnm, near-miss) in the target (Fm). The steps of Relief are listed as follows, and the flowchart of Relief is shown in Figure 7.
Step 1. Set the set of features (Fi) and the maximum times of sampling (k). Initialize the correlation (Wi) to zero and the times of sampling (i) to one. The set of features contains all of the feature samples.
Step 2. Choose two features (Fh, Fm) from the set of features.
Step 3. Select one value fh from one of the features (Fh).
Step 4. Select the nearest value fnh to fh from Fh (near-hit), and the nearest value fnm to fh from Fm (near-miss).
Step 5. Calculate the correlation with Formula (19), where diff(fh, fnh) is the distance between fh and fnh, and diff(fh, fnm) is the distance between fh and fnm.
Step 6. If i < k, update i with Formula (20) and Wi with Formula (21), and go back to Step 3.
Step 7. Get the correlation (Rf) between feature Fh and feature Fm.

Figure 7. Flowchart of Relief.
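Under the interpretation above, the Relief procedure can be sketched as follows. Since Formulas (19)-(21) are not reproduced in this excerpt, the accumulation rule below (subtract the near-hit distance, add the near-miss distance, average over k samples) is an assumption patterned on Kira and Rendell's weight update:

```python
import numpy as np

def relief_correlation(Fh, Fm, k=20, rng=None):
    """Sketch of Relief between two features Fh and Fm (1-D arrays):
    sample a value fh from Fh, find its nearest neighbour in Fh
    (near-hit) and in Fm (near-miss), and accumulate
    -diff(fh, fnh) + diff(fh, fnm) over k samples."""
    rng = rng or np.random.default_rng(0)
    Fh, Fm = np.asarray(Fh, float), np.asarray(Fm, float)
    W = 0.0
    for _ in range(k):
        i = rng.integers(len(Fh))
        fh = Fh[i]
        # nearest value in Fh other than fh itself (near-hit)
        others = np.delete(Fh, i)
        fnh = others[np.argmin(np.abs(others - fh))]
        # nearest value in Fm (near-miss)
        fnm = Fm[np.argmin(np.abs(Fm - fh))]
        W += -abs(fh - fnh) + abs(fh - fnm)
    return W / k
```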

ReliefF
This study uses ReliefF to calculate the correlation between feature and classification. ReliefF was proposed by I. Kononenko and is designed for multiclass classification. ReliefF calculates the correlation between a feature (Fh) and the classification (C): it selects one stochastic value (fh) from the feature, then chooses the nearest value (fsh, near-hit) of the same classification as fh, and the nearest value (fsm, near-miss) of each different classification. The steps of ReliefF are listed as follows, and the flowchart of ReliefF is shown in Figure 8.
Step 1. Set the set of features (Fi), the maximum times of sampling (k), the classification of sampling (Ca,b), and the maximum sample number of the nearest values (s). Initialize the correlation (Wi) to zero, the times of sampling (i) to one, and the sample number of the nearest value (j) to one. The set of features contains all of the feature samples; the classification of sampling means that all of the feature samples are classified into m classes.
Step 2. Choose one feature (Fh) from the set of features and select a value fh from Fh; fh belongs to class n of all of the classes (m).
Step 3. Select the nearest value (fnh) to fh from class n of Fh (near-hit), and the nearest values (fnmb) to fh from each different class b (near-miss).
Step 4. Calculate the correlation (RfF) with Formula (22), where diff(fh, fnh) is the distance between fh and fnh, and diff(fh, fnmb) is the summation of the distances between fh and fnmb.
Step 5. If j < s, update j with Formula (23) and Wi with Formula (24), and go back to Step 3.
Step 6. If i < k, update i with Formula (25) and Wi with Formula (24), and go back to Step 2.
Step 7. Get the correlation (RfF) between feature Fh and the classification of sampling (Ca,b).

Figure 8. Flowchart of ReliefF.
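The ReliefF steps above can be sketched for a single feature column as follows. Formulas (22)-(25) are not reproduced in this excerpt, so the near-hit/near-miss update below follows Kononenko's standard form (subtract the mean distance to the s nearest same-class values, add the class-prior-weighted mean distance to the s nearest values of each other class), which is an assumption:

```python
import numpy as np

def relieff_weight(x, y, k=20, s=3, rng=None):
    """Sketch of ReliefF for one feature column x (1-D array) with
    class labels y, averaged over k sampled values."""
    rng = rng or np.random.default_rng(0)
    x, y = np.asarray(x, float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / len(y)))
    W = 0.0
    for _ in range(k):
        i = rng.integers(len(x))
        # near-hits: s nearest same-class values (excluding x[i] itself)
        hits = np.sort(np.abs(x[y == y[i]] - x[i]))[1:s + 1]
        d_hit = hits.mean() if hits.size else 0.0
        # near-misses: s nearest values of each different class, prior-weighted
        d_miss = 0.0
        for c in classes:
            if c == y[i]:
                continue
            d = np.sort(np.abs(x[y == c] - x[i]))[:s]
            d_miss += prior[c] / (1 - prior[y[i]]) * d.mean()
        W += d_miss - d_hit
    return W / k
```

A feature whose values separate the classes well receives a large positive weight; an uninformative feature stays near zero.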

CFS
This study uses CFS to select features. CFS was proposed by M. A. Hall. CFS calculates the merit value based on the correlation between features (Rf) and the correlation between feature and classification (RfF). CFS selects the useful features to increase accuracy and reduces the number of features and the runtime. In M. A. Hall's paper, CFS was designed for binary classification problems and used Relief to calculate Rf and RfF. As this paper discusses multiclass classification, we therefore use Relief to calculate Rf and ReliefF to calculate RfF. The steps of CFS are listed as follows, and the flowchart of CFS is shown in Figure 9.
Step 1. Set the target set of features T = { }, the set of features Fi = {F1, F2, ⋯, Fn}, the correlation between features (Rf), and the correlation between feature and classification (RfF), where Rfij is the correlation between feature Fi and feature Fj, and RfFi is the correlation between feature Fi and the classification.
Step 2. Calculate the merit value (FMi) of each feature based on Rf and RfF with Formula (26), where k is the number of features (at this step, k = 1), R̄fij is the average of Rfij, and R̄fFi is the average of RfFi (at this step, R̄fFi = RfFi).
Step 3. Choose the feature (Fj) that has the maximum merit value, add it to T (T = {Fj}), and remove it from Fi.
Step 4. Calculate the merit value between T and Fi with Formula (26).
Step 5. If Fi ≠ ∅, go back to Step 3.
Step 6. Get the merit value of each number of features.

Figure 9. Flowchart of CFS.
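The forward search above can be sketched as follows. The paper's Formula (26) is not reproduced in this excerpt, so the merit expression below is Hall's standard CFS merit, Merit_S = k·r̄cf / sqrt(k + k(k−1)·r̄ff), which is an assumption:

```python
import numpy as np

def cfs(R_f, R_fF):
    """Sketch of the CFS forward search. R_f: n x n feature-feature
    correlation matrix; R_fF: length-n feature-class correlations.
    Returns the selection order T and the merit at each subset size."""
    n = len(R_fF)
    T, remaining, merits = [], list(range(n)), []
    while remaining:
        best, best_merit = None, -np.inf
        for j in remaining:
            S = T + [j]
            k = len(S)
            r_cf = np.mean([R_fF[i] for i in S])
            # average pairwise feature-feature correlation within S
            r_ff = (np.mean([R_f[a][b] for a in S for b in S if a != b])
                    if k > 1 else 0.0)
            merit = k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
            if merit > best_merit:
                best, best_merit = j, merit
        T.append(best); remaining.remove(best); merits.append(best_merit)
    return T, merits
```

The search greedily adds the feature that maximizes the merit, so a feature highly correlated with the class but redundant with already-selected features is penalized.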

CFFS
CFFS is a feature selection approach improved from CFS. This approach selects the critical features from all of the features; this study uses CFFS to reduce the number of features of the classification system. CFFS calculates the merit value based on the correlation between features (Rf) and the correlation between feature and classification (RfF), the fitness value Wfi of each feature, and then the merit_new value with Formula (27).

Merit_new = Merit × Wfi (27)
The fitness value is calculated by particle swarm optimization (PSO). PSO is an optimization algorithm that can optimize coefficients to establish the best neural networks. This study uses PSO to optimize the weights of the features [31,32] and records the swarm's best-known solution of PSO. The number of features and the weights of the features are chosen after training the ANN. The steps of PSO are listed as follows, and the flowchart of PSO is shown in Figure 10a.
The particle's position in space represents the weights of the features, Xp = (Xp1, Xp2, ⋯, Xpj), and the particle's velocity is Vp = (Vp1, Vp2, ⋯, Vpj), where j is the number of features.
Step 2. Calculate the fitness value (Wfi) of each feature. The fitness value is the particle's ANN accuracy.
Step 4. If Gbest_new is better than Gbest_old, replace Gbest_old with Gbest_new as in (29); if not, go to Step 5. Gbest_old is the swarm's best-known solution before the update, Gbest_new is the swarm's best-known solution after the update, and the swarm's best-known solution is the best of the particles' best-known solutions.
Step 7. If k ≠ kmax, update k as in (34) and go back to Step 2.
Step 8. Get the swarm's best-known solution for the ANN.
Formula (27) calculates the merit_new value using the fitness value from PSO to select the features that are useful to the classification system and obtain the best accuracy. The steps of CFFS are listed as follows, and the flowchart of CFFS is shown in Figure 10b.
Step 1. Set the target set of features T = { }, the set of features Fi = {F1, F2, ⋯, Fn}, the correlation between features (Rf), the correlation between feature and classification (RfF), the number of all features (n), and the number of features k = 1.
Step 9. If k ≠ n, go back to Step 6.
Step 10. Get the merit_new value of each combination of the number of features.

Figure 10. Flowcharts of (a) PSO and (b) CFFS.
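The PSO steps above can be sketched as follows. The inertia w and acceleration constants c1, c2 below are illustrative values (the paper does not state the ones it used), and the `fitness` callable stands in for the ANN accuracy obtained with a given feature-weight vector:

```python
import numpy as np

def pso(fitness, dim, n_particles=10, k_max=20, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimal PSO sketch for optimizing feature weights.
    fitness: maps a weight vector to a score to maximize
    (in the paper, the particle's ANN accuracy)."""
    rng = rng or np.random.default_rng(0)
    X = rng.random((n_particles, dim))           # particle positions (weights)
    V = np.zeros((n_particles, dim))             # particle velocities
    P_best = X.copy()                            # particles' best-known solutions
    P_fit = np.array([fitness(x) for x in X])
    g = np.argmax(P_fit)
    G_best, G_fit = P_best[g].copy(), P_fit[g]   # swarm's best-known solution
    for _ in range(k_max):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity and position update (Steps 5-6 of a standard PSO)
        V = w * V + c1 * r1 * (P_best - X) + c2 * r2 * (G_best - X)
        X = X + V
        for p in range(n_particles):
            fit = fitness(X[p])
            if fit > P_fit[p]:                   # update particle best
                P_fit[p], P_best[p] = fit, X[p].copy()
                if fit > G_fit:                  # update swarm best (Step 4)
                    G_fit, G_best = fit, X[p].copy()
    return G_best, G_fit
```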

Measurement and Analysis of the Current Signal of the Induction Motor
The specification of the induction motor is shown in Table 1. This study powered the induction motor with a three-phase, 220 V alternating current (AC) supply, in which the current reverses periodically. A control panel sets the AC servo motor to simulate the load on the induction motor. Finally, we used an NI PXI-1033 to capture the current signal of the induction motor at one of the three phases, with a sampling time of 2 s and a sampling frequency of 1000 Hz for each observation, so that each observation has 2000 data points. In addition, the observation database is publicly posted at this link (https://reurl.cc/gmmWgN) in the form of a Matlab file. The equipment layout is shown in Figure 12. The steps of the experiment are listed as follows:
Step 1. Prepare all the samples of the motor. This study prepares four classes of motor: normal, damage of the shaft output of the bearing, layer short, and broken rotor bar.
Step 2. Connect the three phases R, S, and T of the motor to the AC power supply. R, S, and T carry the same AC power but at different phase angles.
Step 3. Set up the NI PXI-1033 and computer and choose one of the three phases to measure the current signal.
Step 4. Use the AC power supply to power the motor.
Step 5. Use the control panel to set the torque of the AC servo motor to simulate the load on the motor. This study selected half load for the motor.
Step 6. Use LabVIEW to record data from the NI PXI-1033.
Step 7. Use Matlab to analyze the current signal of each motor.
The current signal of the induction motor is processed with MRA and standardization. d1, d2, d3, d4, d5, and a5 were decomposed from the current signal of the normal motor, as shown in Figure 13a. Compared with the normal motor: (1) Damage of the shaft output of the bearing: d1 and d4 are smaller than those of the normal motor; d2, d3, d5, and a5 are similar to the normal motor. (2) Layer short: d1 is smaller than that of the normal motor; d2, d3, d4, d5, and a5 are similar to the normal motor. (3) Broken rotor bar: d1 is smaller than that of the normal motor; d2 is larger than that of the normal motor; d3, d4, d5, and a5 are similar to the normal motor.

Accuracy of Classifier
The feature selection approaches ReliefF, CFS, and CFFS have different selection criteria, which lead to different outcomes. ReliefF calculates the correlation between features and classes and selects features in correlation numerical order. CFS calculates the merit value with Relief and ReliefF and selects features in merit numerical order. CFFS calculates the merit_new value with the merit value and weights and selects features in merit_new numerical order. The proposed approach selects the critical features and the feature combination with the smallest number of features, and the accuracy can be improved by using the proposed feature selection method.
(1) ReliefF: According to Table 3, the results of ReliefF indicate that the critical features at 38, 39, 40, and 43 features increase the accuracy with a sharp rise: between 37 and 38 features, a 9.04% increase (from 26.81% to 35.85%) can be observed at F13; between 38 and 39 features, an 8.12% increase (from 35.85% to 43.97%) can be observed at F29; between 39 and 40 features, a 19.38% increase (from 43.97% to 63.35%) can be observed at F9; between 42 and 43 features, a 6.01% increase (from 65.44% to 71.45%) can be observed at F3.
(2) CFS: The features selected by CFS are F1, F2, F8, F10, F11, F14, F15, F17, …, F20, F25, F26, F29, F31, F32, F37, …, F50, F52, F53, F55, F56, and F59.
(3) CFFS: According to Table 5, the results of CFFS indicate that the critical features at 1, 2, 3, and 5 features increase the accuracy with a sharp rise: F6 has 39.5% accuracy with 1 feature; between 1 and 2 features, a 16% increase (from 39.5% to 55.5%) can be observed at F34; between 2 and 3 features, a 28.75% increase (from 55.5% to 84.25%) can be observed at F57; between 4 and 5 features, a 5.75% increase (from 84.25% to 90%) can be observed at F35.
The accuracies of ReliefF, CFS, and CFFS are shown in Figure 15. The ReliefF accuracy curve is the black line, with the number of features with the best effect at the red star mark (★) and the number of features with the best accuracy at the red square mark (■). The CFS accuracy curve is the blue line, with the best effect at the purple star mark (★) and the best accuracy at the purple square mark (■). The CFFS accuracy curve is the green line, with the best effect at the yellow star mark (★) and the best accuracy at the yellow square mark (■). The ReliefF accuracy curve becomes stable at 52 features (accuracy: 79.27%) and reaches its maximum at 60 features (accuracy: 83.03%). The CFS accuracy curve becomes stable at 38 features (accuracy: 81.21%) and reaches its maximum at 56 features (accuracy: 82.31%). The CFFS accuracy curve becomes stable at 5 features (accuracy: 90%) and reaches its maximum at 12 features (accuracy: 92.25%).

Conclusions
This study uses MRA to analyze the current signal and extract features, and uses feature selection approaches to select the critical features for induction motor classification, establishing the fault detection system with an ANN. ReliefF, CFS, and CFFS are different feature selection approaches, and all of them can reduce the number of features and the system operation cost while maintaining good accuracy. The results of the study show that ReliefF, CFS, and CFFS are more efficient than using no feature selection approach. Compared with no feature selection (number of features: 60, accuracy: 83.03%), the accuracy of ReliefF (number of features: 52, accuracy: 79.27%) is 3.76% lower with 13.3% fewer features; the accuracy of CFS (number of features: 38, accuracy: 81.21%) is 1.82% lower with 36.7% fewer features; and the accuracy of CFFS (number of features: 5, accuracy: 90%) is 6.97% higher with 91.7% fewer features. Therefore, CFFS has the best efficiency of the three feature selection approaches, as it removes the most features while also improving the accuracy of the induction motor fault detection system.