Broken Rotor Bar Fault Detection and Classification Using Wavelet Packet Signature Analysis Based on Fourier Transform and Multi-Layer Perceptron Neural Network

As machine capabilities increase in modern manufacturing, machines run continuously for hours. Early fault detection is therefore required to reduce maintenance expenses and to avoid costly, unscheduled downtime. Fault diagnosis systems that provide feature extraction and pattern classification are able to detect and classify failures in machines. Most related works that have reported a procedure for detecting rotor bar breakage have applied motor current signal analysis using the discrete wavelet transform. In this paper, the most appropriate features are extracted from the coefficients of a wavelet packet transform after a fast Fourier transform of the current signal. The aim of this study is to develop an effective and sensitive method for fault detection under low load conditions. By combining the strengths of time-scale and frequency-domain analysis techniques, a unified wavelet packet signature analysis pinpoints the fault signature in the specific fault-oriented frequency bands. The wavelet analysis combined with a feed-forward neural network classifier provides an intelligent methodology for the automatic diagnosis of fault severity while the motor is running. Fault severity is considered as one, two, or three broken rotor bars. The results confirm that the proposed method is effective for diagnosing rotor bar breakage in an induction motor and for classifying fault severity.


Introduction
As a basic component of present-day industrial plants, electrical machines expedite industrial tasks and production. Owing to their robust, well-constructed, and simple design, induction motors (IMs) are widely used in manufacturing processes. Nevertheless, three-phase induction machines are subject to unavoidable electrical, mechanical, and environmental stresses. Among the various failures that occur in different parts of a machine, the broken rotor bar (BRB) is significant because of the serious malfunctions it causes. The presence of a broken rotor bar leads to torque reduction, improper motor operation, and safety concerns [1]. To ensure the availability of industrial systems and the safety of goods and persons on site, the monitoring and diagnosis of rotor faults are of prime importance. Rotor irregularity in the machine can be diagnosed at an early stage by the processing
Reference | Fault | Signal processing | Feature | Mother wavelet | Classifier | Limitations
... | ... | ... | ... | ... | ... | Only one feature is extracted; disordering of sub-band frequency
[21] | BRB | WPD | Statistical feature of wavelet packet | 1-D Haar | Adaptive neuro-fuzzy inference system | Only one mother wavelet is studied; lack of BRB fault severity detection
[22] | Rotor | WPD, empirical mode decomposition | Energy moment of IMFs | Db 5 | Multi-layer feed-forward neural network | Only one mother wavelet is studied; only one feature is extracted
WPD, wavelet packet decomposition; BRB, broken rotor bar; IMF, intrinsic mode function.
Almost all methods for fault classification are sophisticated. However, a trade-off exists: as complexity increases, fault detection capability increases together with computational cost [23]. Therefore, the contribution of this paper consists of three points. In the signal processing step, exactly localized fault frequency sub-bands are determined under arbitrary load conditions based on a combination of WPD and FFT, named wavelet packet signature analysis (WPSA). In the feature extraction step, statistical parameters of the wavelet packet coefficients are obtained from the stator current and used as the input vector to the neural network (NN). In the classification step, a straightforward, small-sized, low-cost multi-layer perceptron neural network (MLP-NN) is used to obtain an intelligent, reliable, and noninvasive classifier.

Broken Rotor Bar in IMs
The squirrel-cage rotor is the inner part of the motor; it is rotated by the electromagnetic field induced in its conductors by the stator field. The rotor then applies the rotational force to the external equipment. Depending on the construction of the cage, squirrel-cage rotors are divided into cast and fabricated rotors. Cast cages are made of aluminum and are generally used in small motors, whereas fabricated cages are made of copper and are used in high-power motors. Although the squirrel-cage rotor is rugged, rotor defects such as a broken rotor bar, a cracked end-ring, or a bent shaft do occur. The percentage of motor failures attributed to rotor problems is not large, but such problems can cause extensive damage to the motor if left undetected [24].
Broken bar faults may occur for a variety of reasons, such as mechanical, thermal, or magnetic stresses; environmental stresses during motor operation; and defects in the design of the motor structure and its manufacturing [1]. Among the different types of rotor fault, broken bars and end-rings are mainly caused by manufacturing defects and by excessive start-stop cycles or frequent speed changes. Motors of low and medium power generally have cast rotor bars. Small defects that may occur during the casting process can cause significant failures in the bar; other causes are mentioned in [25]. In the rotor of a high-power motor, copper bars are generally connected to the end-rings by welding, and if this procedure is not performed carefully, defects are generated [26]. Rotor design plays a key role in the severity of rotor irregularity. If the rotor has a closed bar design, the fault severity is expected to be low because the rotor iron holds the asymmetrical bar in place. If the rotor has an open bar design, however, the severity of the asymmetry increases significantly [24].
Rotor bar failures bring about secondary failures in other parts of electrical machines. These secondary failures cause severe malfunctions of the motor and reduce its efficiency, which increases operational costs. For example, the current in bars adjacent to the broken one increases up to 50% of rated current [27], causing unbalanced currents and torque pulsation, which decrease the average torque [28]. When the distribution of rotor current changes, the bars adjacent to the broken one overheat, which causes other irregularities [29] and the breakage of several other bars [30]. Uneven heating around the bars can also bow the rotor and thereby generate eccentricity. Rotor eccentricity causes basic rotor unbalance and a greater unbalanced magnetic pull [31]. Moreover, if a broken rotor bar rises out of its slot due to centrifugal force, the bar will contact the stator winding and damage it. Small pieces that come off a broken rotor bar also damage the stator windings and laminations during operation [25]. In addition to the secondary failures mentioned above, a broken rotor bar leads to shaft vibration and thus air-gap eccentricity [9]. Besides vibration, a broken rotor bar also causes sparking and noise during motor start-up and normal operation [25,32], which threatens operational safety. During start-up, the vibration is more excessive, the sparking more destructive, and the noise louder.
As explained above, the effects of a broken bar significantly reduce the efficiency and performance of electrical machines, so early detection of this failure is essential [33]. Detecting a broken rotor bar at an early stage not only secures motor performance but also reduces the risk of other types of failure. When the failure is at an early stage, however, the symptoms are small and the motor appears to operate normally, so the fault cannot readily be detected [10]. A relatively large number of studies have been performed on the early detection of rotor faults, especially broken rotor bars, in squirrel-cage induction machines [1].

Effect of Broken Rotor Bar Fault on Rotor Magneto-Motive Force
The effect of a broken rotor bar on the rotor magneto-motive force (MMF), and consequently its impact on the stator current waveform, is discussed in this section. In the healthy case, the rotor MMF consists of a forward component rotating at synchronous speed, $\omega_{syn}$, with respect to the stator (or $s\omega_{syn}$ with respect to the rotor). However, when a rotor bar breaks, no current can flow through the bar, and hence no magnetic flux is generated around it. The absence of flux around a bar produces a non-zero backward rotating field and thus an asymmetry in the rotor MMF. For a symmetrical rotor with no broken bar, the resultant backward rotating field is zero. The non-zero backward rotating component can be regarded as generated by the virtual presence of a bar carrying a current equal and opposite to that of the original bar, that is, the bar in the healthy rotor that is subject to breakage.
This non-zero backward MMF due to the broken bar rotates at the slip speed, $s\omega_{syn}$, with respect to the rotor, in the direction opposite to the rotor rotation, and induces harmonic currents in the stator windings, which are superimposed on the stator winding currents. Since the rotor itself rotates at $(1 - s)\,\omega_{syn}$, the speed of the non-zero backward MMF with respect to the stator can be calculated as follows:

$$\omega_{b} = (1 - s)\,\omega_{syn} - s\,\omega_{syn} = (1 - 2s)\,\omega_{syn}$$

These superimposed components are used as signatures for the detection of broken rotor bars in MCSA techniques [33].

Effect of Broken Rotor Bar in Frequency Domain
The non-zero backward MMF induces electromotive forces (EMFs) in the stator windings at a frequency equal to $f_{brb} = (1 - 2s) f_s$, where $f_s$ is the fundamental frequency and $s$ is the slip; more generally, sideband components appear at $(1 \pm 2ks) f_s$ with $k = 1, 2, 3, \ldots$ The fundamental frequency is defined as $f_s = (p/4\pi)\,\omega_{syn}$, where $p$ is the number of poles and $\omega_{syn}$ is the synchronous speed. This sideband component appears in the frequency spectrum of the stator current around the fundamental frequency in the presence of a cracked or broken rotor bar. It has been indicated that the amplitude of this sideband component is proportional to the number of broken rotor bars present in the electrical machine [34]. Another parameter that can affect the magnitude of these sidebands is the motor load level [1,35].
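As a worked illustration of the sideband formula, the following minimal sketch computes the slip and the first pair of sidebands at the rated operating point. It assumes a 50 Hz supply (not stated explicitly in this section, but consistent with a six-pole machine rated at 915 rpm); the function names are hypothetical and chosen only for this example.

```python
# Minimal sketch: slip and broken-rotor-bar sidebands at the rated point.
# Assumption: 50 Hz supply, six poles, 915 rpm rated speed.

def synchronous_speed_rpm(supply_hz: float, poles: int) -> float:
    """Synchronous speed in rpm; equivalent to f_s = (p / 4*pi) * w_syn."""
    return 120.0 * supply_hz / poles

def slip(sync_rpm: float, rotor_rpm: float) -> float:
    """Per-unit slip s = (n_syn - n_r) / n_syn."""
    return (sync_rpm - rotor_rpm) / sync_rpm

def brb_sidebands(supply_hz: float, s: float, k: int = 1) -> tuple:
    """Lower and upper sidebands (1 -/+ 2ks) * f_s in Hz."""
    return (1 - 2 * k * s) * supply_hz, (1 + 2 * k * s) * supply_hz

n_sync = synchronous_speed_rpm(50.0, 6)   # 1000 rpm
s_rated = slip(n_sync, 915.0)             # 0.085
lower, upper = brb_sidebands(50.0, s_rated)
print(f"slip = {s_rated:.3f}, lower = {lower:.1f} Hz, upper = {upper:.1f} Hz")
# slip = 0.085, lower = 41.5 Hz, upper = 58.5 Hz
```

At lighter loads the slip is smaller, so the lower sideband moves closer to the fundamental, which is why low-load detection needs finer frequency resolution.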

The Construction of Wavelet Packet Signature Analysis
Mallat introduced a recursive algorithm known as the pyramid algorithm in 1989 [36]. This algorithm, an important method for computing the DWT coefficients, consists of a conjugate pair of low-pass (H) and high-pass (G) filters followed by downsampling, giving two sets of coefficients, namely approximations and details. At the next decomposition levels, the filters are applied only to the approximations. In contrast to the DWT, in which only the approximation coefficients are further decomposed at each level, WPD further decomposes both the approximation and the detail coefficients at each level, which offers a richer signal analysis [37,38]. Wavelet packet atoms are waveforms with three naturally interpreted indices:

$$\psi^{i}_{j,k}(t) = 2^{-j/2}\,\psi^{i}\left(2^{-j}t - k\right), \quad i = 0, 1, 2, \ldots$$

where the integer $j$ determines the dilation, and $k$ and $i$ are called the time-localization and modulation parameters of the wavelet packet, respectively. Wavelet packet atoms are defined by the following sequence of functions:

$$\psi^{2i}(t) = \sqrt{2}\sum_{k} h(k)\,\psi^{i}(2t - k), \qquad \psi^{2i+1}(t) = \sqrt{2}\sum_{k} g(k)\,\psi^{i}(2t - k)$$

The conjugate mirror filters $h(k)$ and $g(k)$, with finite impulse responses (FIR), define the fast binary WPD algorithm of the signal. The first two WP functions, $\psi^{0}_{0,0}(t) = \phi(t)$ and $\psi^{1}_{0,0}(t) = \psi(t)$, are the scaling function and the wavelet function, respectively [36,39-41]. For an input of length $N$, each output of the filter consists of $N/2$ wavelet coefficients. The wavelet packet coefficients (WPC) of a signal $f(t)$ are calculated by taking the inner product of the signal and the basis function:

$$C^{i}_{j}(k) = \left\langle f, \psi^{i}_{j,k} \right\rangle = \int_{-\infty}^{\infty} f(t)\,\psi^{i}_{j,k}(t)\,dt$$

Since the wavelet coefficients highlight changes in signals, features based on them are well suited to early, high-sensitivity fault detection. Wavelet packet coefficient features have been broadly used for characterizing machine faults. Although these coefficients are associated with frequency components, they are localized in the time domain (each coefficient, $C_i$, corresponds to a time range). Each set of coefficients $C^{i}_{j}(k)$ measures the content of a specific frequency sub-band. For a discrete signal sampled at $F_s$, the frequency bandwidth of the WPT coefficients at level $j$ can be calculated by [42,43]:

$$\Delta f_{j} = \frac{F_s}{2^{j+1}}$$
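The filter-bank recursion described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the two-tap Haar filter pair for brevity (the study itself reports a different mother wavelet), and the helper names `haar_split` and `wavelet_packet` are hypothetical. Unlike the DWT, every node, approximation or detail, is split again at each level.

```python
import math

def haar_split(x):
    """One stage: low-pass (H) and high-pass (G) Haar filters, then downsample by 2."""
    r = 1.0 / math.sqrt(2.0)
    approx = [(x[n] + x[n + 1]) * r for n in range(0, len(x) - 1, 2)]
    detail = [(x[n] - x[n + 1]) * r for n in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(x, levels):
    """Full wavelet packet tree; tree[j] lists the 2**j nodes of level j
    in natural (filter-bank) order: node 2i is low-pass, node 2i+1 high-pass."""
    tree = [[list(x)]]
    for _ in range(levels):
        next_level = []
        for node in tree[-1]:
            approx, detail = haar_split(node)
            next_level.extend([approx, detail])
        tree.append(next_level)
    return tree

# Synthetic 50 Hz tone sampled at 2000 Hz (illustrative signal, 64 samples).
signal = [math.sin(2 * math.pi * 50 * n / 2000) for n in range(64)]
tree = wavelet_packet(signal, 3)
print(len(tree[3]), len(tree[3][0]))  # 8 8
```

Because the Haar pair is orthonormal, the total energy of the coefficients at any level equals the energy of the input signal, which is a convenient sanity check for an implementation.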
The wavelet transform is commonly used in the time domain, which is why the frequency localization of wavelet packets is more complicated to analyze. The popularity of wavelets is due to their dilation and translation properties. The dilation property is used to adjust the width of the frequency band along with the location of its center frequency, and the translation property can be used to zoom in and out automatically in order to locate the positions of high-frequency and low-frequency changes. As mentioned above, the parameter $i$ can be interpreted as an oscillation parameter. The basic idea of wavelet packets is that, for fixed values of $j$ and $k$, $\psi^{i}_{j,k}(t)$ analyzes the fluctuations of the signal roughly around the position $2^{j}k$, at the scale $2^{j}$, and at various frequencies for the different admissible values of the last parameter $i$. Note that the decomposition described above does not produce a WPT tree ordered by increasing frequency. The delicate point is that the natural order, $i = 0, 1, 2, 3, 4, 5, 6, 7$, does not exactly match the order defined by the number of oscillations in the wavelet packet decomposition. The frequency order is $i = 0, 1, 3, 2, 6, 7, 5, 4$, as highlighted in Figure 1 by dashed lines. On the other hand, because of the trade-off between time localization and frequency resolution, the resolutions in the time and frequency domains cannot reach their highest levels concurrently. Therefore, the signal processing tool is required to have the frequency resolution power of the Fourier transform and the time resolution power of the wavelet transform. To obtain the frequency characteristics associated with BRB and to extend the wavelet transform analysis, further FFT analysis can enhance the overall effectiveness of feature extraction. In general, spectral analysis and the wavelet packet transform represent a signal from different perspectives.
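The frequency ordering quoted above is the Gray-code permutation of the natural node indices, which the following sketch reproduces. The function names are hypothetical, and the band edges assume ideal half-band filters; real wavelet filters overlap, so the edges are nominal only.

```python
def frequency_order(level):
    """Natural node indices listed in increasing-frequency order (Gray code)."""
    return [p ^ (p >> 1) for p in range(2 ** level)]

def node_band(level, natural_index, sampling_hz):
    """Nominal (low, high) band in Hz of a node given by its natural index."""
    position = frequency_order(level).index(natural_index)
    width = sampling_hz / 2 ** (level + 1)
    return position * width, (position + 1) * width

print(frequency_order(3))        # [0, 1, 3, 2, 6, 7, 5, 4]
print(node_band(3, 2, 2000.0))   # (375.0, 500.0)
```

With a 2000 Hz sampling rate, each level-3 node nominally spans 125 Hz, and the node with natural index 2 is the fourth band in frequency order.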

Proposed Fault Diagnosis Algorithm
To assess the proposed intelligent diagnosis system, experiments were performed on a three-phase induction machine with the following main characteristics: rated voltage 415 V, rated power 750 W, six poles, primary rated current 2.1 A, and rated speed 915 rpm. As shown in Figure 2a, the test bench consists of squirrel-cage induction motors, an AC generator, and a resistive bank, which provide four different loadings (10%, 35%, 50%, and 80% of full load) under both healthy and faulty conditions, in order to study the effect of load on the fault identification procedure. For each motor condition, 20 sets of stator current signals were collected, before and after the defects were introduced, sampled at 2 kHz with N = 2000 samples during steady-state operation of the motor. In total, 320 experiments were performed: 20 sets of motor current from each of four motors with different fault severity, operating at four different load conditions. Figure 2b shows the healthy rotor and a rotor with three broken bars. The motor condition cannot be determined by inspecting the measured current signal. In other words, the raw current signal does not reveal the difference between healthy and faulty conditions, because the amplitude of the fault-related frequency component is much smaller than that of the fundamental.
As an example, typical stator current signals for a healthy motor and for a motor with one broken bar under 35% of full load are illustrated in Figure 3. There is no observable difference between these two signals that can be used for broken rotor bar detection. Therefore, a signal processing method is used to extract the fault-related features for fault detection.
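The point that a small sideband is invisible in the raw waveform but clear in the spectrum can be illustrated with synthetic data. This is only a sketch: the signals are idealized sinusoids, the 1% sideband amplitude and the 42 Hz sideband frequency are hypothetical values chosen so that the component falls exactly on a DFT bin for N = 2000 samples at 2 kHz, and `bin_amplitude` is a hypothetical helper that evaluates a single DFT bin.

```python
import cmath
import math

FS = 2000   # sampling rate in Hz, as in the experiments
N = 2000    # number of samples (1 s record, 1 Hz resolution)

def synth_current(sideband_amp):
    """Unit 50 Hz fundamental plus an assumed 42 Hz sideband (synthetic)."""
    return [math.sin(2 * math.pi * 50 * n / FS)
            + sideband_amp * math.sin(2 * math.pi * 42 * n / FS)
            for n in range(N)]

def bin_amplitude(x, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)

healthy = synth_current(0.0)
faulty = synth_current(0.01)

# Time domain: the peak values differ by at most the sideband amplitude.
peak_healthy = max(abs(v) for v in healthy)
peak_faulty = max(abs(v) for v in faulty)

# Frequency domain: the 42 Hz bin separates the two cases clearly.
side_healthy = bin_amplitude(healthy, 42)
side_faulty = bin_amplitude(faulty, 42)
print(f"peak difference = {abs(peak_faulty - peak_healthy):.4f}")
print(f"42 Hz bin: healthy = {side_healthy:.4f}, faulty = {side_faulty:.4f}")
```

In measured data the sideband rarely falls on an exact bin and leakage from the fundamental must be handled, which is one motivation for isolating the band with wavelet packets before the FFT.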
Appl. Sci. 2017, 7, x FOR PEER REVIEW
The architecture of the intelligent diagnosis system, which combines progressive signal processing and pattern recognition techniques, is given in Figure 4. This method not only improves the signal processing algorithm to find the exact fault-oriented sub-bands, but also implements an NN-based classifier to assess fault severity.
In the first step, wavelet packet decomposition is applied to the monitored signal in the MATLAB® (2008) environment to extract features at more concentrated fault-related depths and nodes. In the next step, the reconstructed signals are transformed to the frequency domain using FFT to find the optimum level and nodes of the wavelet packet tree for detecting the fault under various loads, in particular under the no-load condition. In addition, the statistical features extracted from the wavelet packet coefficients are selected to form the classifier's input vector. In the last step, the feed-forward neural network (NN) is employed to identify the BRB severity.
In this research, to illustrate the merits of using FFT to follow the bands that include the fundamental frequency, the stator current signals are decomposed into approximation and detail signals using WPD in the MATLAB® environment. Then, for every selected node at successive levels, the wavelet packet reconstructed signals are transformed to the frequency domain using FFT to find the sub-band with the maximum value. The FFT is used to find the optimum level and the nodes that contain the fundamental frequency as well as the fault-related frequency band, in order to detect fault severity under various loads according to Figure 5 [15,16,44]. Consequently, given the closeness of the broken bar fault frequency to the fundamental frequency, especially under the no-load condition, the exact fault-related sub-band is highlighted with high reliability.
Figure 6 shows the wavelet packet tree, in which the red lines highlight the path of the frequency bands containing 50 Hz at deeper levels when the sampling rate is 2000 Hz. Fourier analysis of the reconstructed signal was performed using the MATLAB® command line to find the sub-band with the maximum value. Therefore, various features can be extracted exactly from the fault-related frequency sub-band of the WPT-based signal decomposition. Finally, the statistical parameters of the wavelet packet coefficients together with the slip speed form the input vector to the classifier, which makes the decision and allows the effect of load variations and fault severity to be examined.
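The path through the tree that follows a given frequency can also be computed directly. The sketch below is not the FFT-based search used in the study; it assumes ideal half-band filters and the Gray-code relation between frequency position and natural node index, and `band_path` is a hypothetical helper name.

```python
def band_path(target_hz, sampling_hz, max_level):
    """For each level, locate the nominal sub-band containing target_hz.

    Returns tuples (level, frequency_position, natural_index, low_hz, high_hz).
    """
    path = []
    for level in range(1, max_level + 1):
        width = sampling_hz / 2 ** (level + 1)      # nominal bandwidth
        position = int(target_hz // width)          # index in frequency order
        natural = position ^ (position >> 1)        # Gray code: natural index
        path.append((level, position, natural,
                     position * width, (position + 1) * width))
    return path

for row in band_path(50.0, 2000.0, 9):
    print(row)
```

Under these idealized assumptions, the 50 Hz band has natural indices 5, 10, and 21 at levels 7, 8, and 9, which agrees with the node numbers quoted for those levels in the Results if those numbers are given in natural order; that reading is an assumption, not something the text states.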
To evaluate the effectiveness of the proposed method, an NN-based approach is developed in the MATLAB environment. The algorithm is designed to configure the best decision-making solution. WPSA is used for feature extraction, giving distinguishable signatures from the stator current signal in a specific frequency band. In the experimental setup, the left sideband frequency, $f_{brb} = (1 - 2s) f_s$, is used to discern healthy and faulty conditions and the fault severity. The depths and nodes of the WP tree that make up the most appropriate frequency range are arranged to cover the various loads, as illustrated in Table 2. The feature vector, one of the most significant elements in designing an appropriate neural network, was built from the statistical parameters of the WPC. To check the learning ability and classification accuracy of the MLP-NN classifier, 75% of the data (60 of the 80 samples in each load condition) are tagged as training data. In this study, the leave-one-out (LOO) method is applied to the available training data: the algorithm trains the network 60 times, and each time one of the 60 available inputs is excluded and used to test the performance of the constructed MLP-based classifier. The trained NN is later validated on the second dataset (the remaining 25%, i.e., 20 samples in each load condition), which is reserved for final testing to ensure a completely independent test set. Figure 7 schematically shows the wavelet packet coefficients of the raw signal and their standard deviation, which is one element of the feature vector of the neural network.
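The data partitioning described above can be sketched as follows. This shows only the index bookkeeping, not the network training; the shuffling with a fixed seed and the helper names are assumptions made for this example.

```python
import random

def holdout_split(n_samples, n_train, seed=0):
    """Shuffle sample indices and reserve the tail as an independent test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

def leave_one_out(train_indices):
    """Yield (training indices, held-out index) once per training sample."""
    for position, held_out in enumerate(train_indices):
        yield train_indices[:position] + train_indices[position + 1:], held_out

# 80 samples per load condition: 60 for LOO training, 20 for final testing.
train, final_test = holdout_split(80, 60)
folds = list(leave_one_out(train))
print(len(train), len(final_test), len(folds))  # 60 20 60
```

Each of the 60 folds trains on 59 samples and validates on the one left out, while the 20 reserved samples are never seen until the final test.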
As can be seen, four three-layer MLP networks are dedicated to the four load ranges (80%, 50%, 35%, and 10% of full load), each with 13 neurons in its hidden layer. The MLP network structure, which is a feed-forward neural network (FFNN), has gained great popularity and has been frequently used in machine condition monitoring applications. Furthermore, to reduce the training time and accelerate the convergence of the neural network in real-world applications, specifically fault detection and classification, it is preferable to use a small, fixed-dimension vector for training. Therefore, in this study, fixed-dimension input vectors were prepared from all classes under investigation, and the classification approach with the FFNN structure was employed as the basis of severity assessment. The mean sum of squares of the network errors is a typical performance function for feed-forward networks, defined as:

$$mse = \frac{1}{N}\sum_{i=1}^{N} e_i^{2} = \frac{1}{N}\sum_{i=1}^{N} (t_i - y_i)^{2}$$

where $e_i = t_i - y_i$ is the error, that is, the desired output $t_i$ minus the actual output $y_i$. To overcome overfitting and improve generalization, particularly with a small dataset, the performance function can be modified to $msereg = q\,mse + (1 - q)\,msw$, where $q$ is the performance ratio and $msw$ is the mean of the sum of squares of the network weights and biases. As mentioned in the MATLAB® help, using this performance function causes the network to have smaller weights and biases, which makes the network response smoother and less likely to overfit. Another way to evaluate how well the neural network model performs is the cross-validation (CV) procedure. It is also very useful for small datasets, since it allows the entire dataset to be used for training and testing. It was found that four fully connected networks each constitutes a separate category of fault number with four output nodes give
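The regularized performance function above is simple to evaluate by hand. The sketch below is a plain re-implementation of the stated formula, not the MATLAB routine itself; the target, output, and weight values and the performance ratio q = 0.9 are hypothetical numbers chosen for the example.

```python
def mse(targets, outputs):
    """Mean of squared errors e_i = t_i - y_i."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)

def msw(weights):
    """Mean of squared network weights and biases."""
    return sum(w * w for w in weights) / len(weights)

def msereg(targets, outputs, weights, q):
    """Regularized performance: q * mse + (1 - q) * msw."""
    return q * mse(targets, outputs) + (1 - q) * msw(weights)

targets = [1.0, 0.0, 0.0, 0.0]    # one-hot target for a four-output network
outputs = [0.9, 0.1, 0.0, 0.0]    # hypothetical network response
weights = [0.5, -0.5]             # hypothetical weights and biases
value = msereg(targets, outputs, weights, q=0.9)
print(f"{value:.4f}")  # 0.0295
```

Here the error term contributes 0.9 x 0.005 and the weight term 0.1 x 0.25, so large weights are penalized even when the fit is good.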

Results and Discussion
After wavelet packet decomposition to eleven levels of resolution using the selected mother wavelet "db44", coefficient-related features were extracted and calculated using the 14 statistical parameters listed in Table 3, which gives the average of 20 samples for each rotor condition in the high-load case. In Tables 4-6, the features with an increasing trend, such as root-mean-square (RMS), root-sum-square (RSSQ), energy, and standard deviation (StD), are tabulated for the medium-load, low-load, and no-load cases, respectively. In the tables, bold numbers indicate that the values of a feature follow an increasing or decreasing trend with respect to fault severity.
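The four trend-following features named above can be computed from a coefficient vector as in the following sketch. The definitions are the usual ones and are assumptions here (energy as the sum of squares, StD as the sample standard deviation); the paper's exact definitions are in its Table 3, which is not reproduced in this text. The function name `wpc_features` and the coefficient values are hypothetical.

```python
import math
import statistics

def wpc_features(coeffs):
    """RMS, RSSQ, energy, and standard deviation of a coefficient vector."""
    energy = sum(c * c for c in coeffs)            # sum of squares (assumed)
    return {
        "rms": math.sqrt(energy / len(coeffs)),    # root-mean-square
        "rssq": math.sqrt(energy),                 # root-sum-square
        "energy": energy,
        "std": statistics.stdev(coeffs),           # sample StD (n - 1)
    }

features = wpc_features([1.0, -1.0, 1.0, -1.0])
print(features["rms"], features["rssq"], features["energy"])  # 1.0 2.0 4.0
```

RMS, RSSQ, and energy are monotonic functions of one another for a fixed coefficient count, so they carry closely related information.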
The distances between healthy and faulty conditions indicate that the RMS, RSSQ, energy, and StD features of the wavelet packet coefficients are the most effective. Therefore, these indices are compared for (Level 6-Depth 3) under high load, (Level 7-Depth 5) under half of full load, (Level 8-Depth 10) under low load, and (Level 9-Depth 21) under the no-load condition to define the most appropriate frequency band for representing the frequency components caused by the BRB fault in the induction motor. The root-mean-square of the wavelet packet coefficients is depicted in Figure 8.
The stability and convergence of the network depend directly on a distinctive feature vector with enough samples. To obtain adequate data without redundancy, remove extraneous information, and reduce the burden on the classification system, several combinations of features are tested to select the best features. To arrange the input vector for training, a simple sequential floating forward selection is used.
As a result, three features are selected from the 15 available features and fed to the network for training, and the observations are recorded. In the next step, the selection is repeated for the remaining 12 features, and two more selected features are added to the first three. The resulting five features are also used to train and test the network. Following this procedure, the number of inputs is gradually increased to 7, 9, 11, 13, and finally all 15 features, and the network performance is observed in terms of cross-validation (CV) classification accuracy and testing mean squared error (MSE). As can be observed from Figure 9, under the high-load condition the combination of five features yields a network with 98.39% average classification accuracy during training and an average MSE of 0.005 on the testing data. For medium load, selecting nine features as inputs gives the minimum testing MSE (0.003) and the maximum CV classification accuracy (98.37%), according to Figure 10. Figure 11 shows the low-load condition, in which five features are used as the input vector to obtain 100% classification accuracy; the MSE for the testing dataset is 0.003. As can be seen from Figure 12, under no load, when seven features are combined as the input vector, the testing MSE is 0.004 and the CV classification accuracy is 96.61%.
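The incremental selection described above can be illustrated with a simplified sketch. It is not the original procedure: it implements plain greedy forward selection without the floating (backward) step, and a nearest-centroid classifier scored by leave-one-out accuracy stands in for the MLP network. All function names are hypothetical.

```python
import math


def nearest_centroid_loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier on selected columns."""
    correct = 0
    for held in range(len(rows)):
        # Accumulate per-class sums from every sample except the held-out one.
        sums, counts = {}, {}
        for i, (row, lab) in enumerate(zip(rows, labels)):
            if i == held:
                continue
            vec = [row[j] for j in feature_idx]
            acc = sums.setdefault(lab, [0.0] * len(vec))
            for k, v in enumerate(vec):
                acc[k] += v
            counts[lab] = counts.get(lab, 0) + 1
        probe = [rows[held][j] for j in feature_idx]
        # Predict the class whose centroid is closest to the held-out sample.
        predicted = min(
            sums,
            key=lambda lab: math.dist(probe, [s / counts[lab] for s in sums[lab]]),
        )
        correct += predicted == labels[held]
    return correct / len(rows)


def forward_select(rows, labels, n_select):
    """Greedily add the feature that gives the highest leave-one-out accuracy."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < n_select:
        # Ties are broken in favour of the lowest column index.
        scored = [
            (nearest_centroid_loo_accuracy(rows, labels, selected + [j]), -j)
            for j in remaining
        ]
        _, neg_j = max(scored)
        selected.append(-neg_j)
        remaining.remove(-neg_j)
    return selected
```

Because the greedy search is nested, calling `forward_select` with `n_select` set to 3, 5, 7, and so on returns subsets in which each larger set extends the previous one, which mirrors the way the input vector is grown in the text.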
As observed from the training results, the measured properties of the classifier performance on the CV data, tabulated in Table 7, indicate the classification capability of the network, with a correct rate close to one (CorrectRate = Correctly Classified Samples/Classified Samples). The averaged MSE, minimum observed MSE, root mean squared error (RMSE), and correct classification accuracy of the MLP-NN-based classifier on the test data are listed in Table 8. The RMSE is known as the fit standard error; a lower RMSE indicates that the NN is closer to its convergence value. The RMSE values on the testing samples are very small, which means that the NN classifies and estimates correctly. In addition, the average classification accuracy on the testing instances is 98.80%, indicating reasonable classification. To show how close the actual output is to the desired output, the targets and outputs of the back propagation (BP) neural network in all load conditions are compared in Table 9.
The corresponding figures show the difference between the desired output and the real output for the test dataset at 80%, 50%, 35%, and 10% of full load, where MSE and RMSE are calculated by MATLAB® code from the plot results function. The whole procedure was repeated ten times and the performance measure for the average output was calculated. Finally, the corresponding value of the correct classification function was calculated by MATLAB® code. The whole NN testing procedure was repeated five times and the performance measure for the average output was calculated. As can be seen, the target lies between 0.1 and 0.9, as mentioned before, and the output is close to the target. The maximum error, highlighted by dashed yellow lines, is around 0.15. However, among all samples, the computed maximum MSE is 0.0048855, which is small enough.
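The performance measures used here (MSE, RMSE, maximum error, and CorrectRate) can be sketched as follows. The sketch assumes a single output neuron whose value is decoded to the nearest class target. The target values in `CLASS_TARGETS` are hypothetical placeholders inside the stated 0.1 to 0.9 range, since the actual encoding is not given in this section, and the function name is illustrative.

```python
import math

# Hypothetical class targets (healthy, one, two, three broken bars).
CLASS_TARGETS = (0.1, 0.3, 0.6, 0.9)


def evaluate_outputs(outputs, targets, class_targets=CLASS_TARGETS):
    """Compute MSE, RMSE, maximum absolute error and CorrectRate for a test set."""
    errors = [o - t for o, t in zip(outputs, targets)]
    mse = sum(e * e for e in errors) / len(errors)
    # Decode each network output to the nearest class target.
    decoded = [min(class_targets, key=lambda c: abs(o - c)) for o in outputs]
    n_correct = sum(d == t for d, t in zip(decoded, targets))
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "max_abs_error": max(abs(e) for e in errors),
        # CorrectRate = Correctly Classified Samples / Classified Samples
        "correct_rate": n_correct / len(outputs),
    }
```

Under this decoding, an output that deviates from its target by less than half the spacing to the neighbouring target is still assigned to the correct class, which is one way a visible output error can coexist with a high correct rate.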

Testing on Test Data
It has been found that the presented network detects faults in the induction motor and assigns them to the corresponding severity level, or class, with an average classification accuracy of 98.80%. Once the NN has been trained and carefully tested on different datasets, namely the cross-validation and testing datasets, for the various performance measures, it is ready for use in real-world applications.

Conclusions
In this study, an advanced wavelet-based signal processing method has been applied to extract the required information from the monitored signal. To confirm that the sub-band selected after WPD is a fault-oriented sub-band, the combination of WPT and FFT, named WPSA, is used to improve the reliability of BRB fault diagnosis. The WPSA extracts the superimposed spectral harmonics of the wavelet-packet-reconstructed signal using FFT. As a result, the overall effectiveness of defect-dependent feature extraction is enhanced. The defect frequency band in wavelet packet decomposition is directly related to the sampling frequency and follows the frequency order, which is not the same as the node order. Daubechies 44 showed the most similarity across faulty rotor bar conditions for no load and for low, medium, and high load of the rated torque.
The proposed method is able to identify a sub-band that contains the fault characteristics. Using wavelet statistical parameters, it is shown that the RMS, RSSQ, StD, and energy values of the wavelet packet coefficients are appropriate features for BRB detection, even in the no-load condition. The WP-based statistical parameters are found to be fast, accurate, and easy to implement. The present approach, based on statistical features of WPSA, is strengthened by a computationally more efficient and intelligent decision-making technique. To generalize the designed network, especially for small datasets, the leave-one-out (LOO) cross-validation (CV) technique is applied to multiple MLP-based NNs on the available training data. Stability and convergence of the network depend directly on a distinctive feature vector with enough samples. To obtain adequate data without redundancy, remove extraneous information, and reduce the burden on the classification system, the best features are selected by simple sequential floating forward selection. The RMSE values on the testing samples are very small, which means that the NN classifies and estimates correctly; a lower RMSE indicates that the NN is closer to its convergence value. It has been found that the network detects rotor faults with an average classification accuracy of 98.80% on the testing data, identifying rotor bar breakage severity in four load ranges (80%, 50%, 35%, and 10% of full load).
Most NN-based fault classification schemes use a complex hybrid network, whereas a simple, small-sized multilayer perceptron network is shown here to work as an effective fault classifier for intelligent condition-based monitoring of three-phase induction motors. Compared with existing structures, the suggested scheme is simple, cost-effective, reliable, and accurate.
