Multi-Fault Diagnosis in Three-Phase Induction Motors Using Data Optimization and Machine Learning Techniques

Induction motors are very robust, with low operating and maintenance costs, and are therefore widely used in industry. They are, however, not fault-free, with bearings and rotor bars accounting for about 50% of all failures. This work presents a two-stage approach for three-phase induction motor diagnosis based on mutual information measures of the current signals, principal component analysis, and intelligent systems. In the first stage, the fault is identified; in the second stage, the severity of the defect is diagnosed. A case study is presented in which different severities of bearing wear and bar breakage are analyzed. To test the robustness of the proposed method, voltage imbalances and load torque variations are considered. The results reveal the promising performance of the proposal, with overall accuracies above 90% in all cases, and in many scenarios 100% of the cases are correctly classified. This work also evaluates different strategies for extracting the signals, showing that the amount of information needed can be reduced. The results show a satisfactory trade-off between efficiency and computational cost: accuracy decreases by less than 4% while the amount of data is reduced by more than 90%, facilitating the efficient use of this method in embedded systems.


Introduction
The fast development and modernization of industrial processes have been accompanied by an increase in the complexity of structures and equipment. As a result, organizations seek to ensure equipment reliability against unexpected failures in industrial processes, which can result in economic losses [1].
Due to the high cost of periodic maintenance of industrial equipment, companies are looking for other maintenance strategies to ensure its reliability [2]. Condition-based monitoring, using vibration, temperature, current, and voltage signals, among other quantities, is one of the most effective preventive maintenance methodologies, as it provides information on the operating condition of the machines. These methodologies thus allow the operator to schedule maintenance at the best time, increasing the reliability and efficiency of the systems [3]. Among industrial equipment, three-phase induction motors (TIMs) play an important role: they are estimated to consume up to 50% of all the electricity generated in industrialized countries [4], and their market value reached US$17.5 billion in 2020 [5].
The main TIM faults are related to bearings, stator, and rotor. This work addresses bearing and rotor failures, which together account for approximately 50% of the defects in TIMs [6].
Bearing failures usually result from contamination, corrosion, inadequate lubrication, and installation problems, such as misalignment and overload [7]. Broken rotor bars, on the other hand, are associated with the manufacturing process, operating overloads, and mechanical cracks [8]. Early-stage fault diagnosis is therefore essential to maintain the continuous operation of industrial processes.
Researchers have developed methodologies for fault diagnosis in TIMs that are generally based on three approaches: (i) feature extraction-based, (ii) model-based, and (iii) knowledge-based approaches [3,9]. The first approach extracts and selects the most relevant characteristics of the signals to assist in fault detection. These fault signatures are generally obtained by spectral decomposition of the signals using conventional methods, such as the Fourier Transform (FT) [10], modifications of the FT such as the Sliding Discrete Fourier Transform [11], the Wavelet Transform (WT) [3,9], and the Hilbert Transform (HT) [12,13], as well as more recent approaches, such as Orthogonal Matching Pursuit (OMP) [3,14].
In Ali et al. [3] and Ali et al. [9], the OMP and Discrete WT techniques were evaluated for feature extraction from current and vibration signals to identify multiple faults in TIMs fed directly by the electrical network and by frequency inverters. The patterns extracted from these signals were analyzed by several classifiers available in the MATLAB Classification Learner toolbox. Under various operating conditions, such as variations in load torque and supply voltage levels, classification rates above 90% were achieved.
Principal Component Analysis (PCA) has been used in several approaches to reduce the dimensionality of the data. In Stief et al. [15], a Bayesian methodology combined with PCA was developed to diagnose stator, rotor, and bearing failures in line-connected TIMs. PCA is used to remove the correlations among the characteristics extracted from the voltage, current, vibration, and acoustic signals of the machine and to reduce the influence of the different load conditions under which the machine operates. Experimental results with accuracy above 94% demonstrate the satisfactory performance of the methodology in diagnosing these defects. More recently, Juez-Gil et al. [16] presented a new multi-sensor technique for the incipient diagnosis of multiple failures in TIMs fed by frequency inverters, using PCA to reduce the dimensionality and extract the most significant fault characteristics from the collected vibration, current, voltage, and speed signals. In the pattern classification stage, the researchers used decision trees. The experimental results showed the promising performance of the proposed approach and its insensitivity to several levels of load torque.
Ewert et al. [17] studied bearing failures in TIMs using envelope and spectral analyses of the vibration signals, employing an artificial neural network for fault classification. More than 90% of the experimental samples were correctly classified, even under different operating conditions, such as variations in the motor load torque.
In addition to obtaining characteristic fault parameters in the frequency domain, recent studies have adopted feature extraction methods based on time-domain signals [18,19]. In Contreras-Hernandez et al. [18], the authors used the Quaternion Signal Analysis (QSA) method to model the behavior of current and vibration signals from TIMs. For pattern classification, a decision tree was used to detect bearing failures and rotor unbalance, achieving accuracies of 99%.
Alternatively, Jiang et al. [19] proposed an approach for diagnosing bearing, stator, and rotor faults in grid-fed TIMs, which extracts patterns from the current and acoustic signals using Singular Value Decomposition (SVD) and Feature Incremental Broad Learning (FIBL). Even with the machines subjected to load variations and supply voltage unbalance, the methodology achieved classification rates above 92%, showing promise in diagnosing these defects.
Model-based approaches seek to predict the behavior of faulty conditions through mathematical modeling of the machine signals. Their main disadvantage is related to the machine's natural wear: as its components degrade, the difference between the machine and its mathematical model increases. Moreover, the machine parameters must be available, which is not always the case; they must then be estimated for accurate modeling, making the diagnosis more difficult [3,20].
In Sabouri et al. [20], a flexible analytical model of the TIM was recently used, which made it possible to include broken rotor bar and stator short-circuit faults. In addition to the TIM model, the proposed non-invasive technique is based on the Particle Swarm Optimization (PSO) algorithm and on the Pendulous Oscillation Phenomenon (POP) of the rotating magnetic field produced by the faults. Although discrepancies were observed between the simulation and experimental results, the methodology showed adequate performance under the several operating conditions of the industrial environment.
Knowledge-based approaches, on the other hand, require neither TIM models nor the characteristics of the motor and of the load coupled to the machine. Moreover, these methods have been widely applied to fault diagnosis in complex nonlinear and time-varying systems and have proven promising in identifying defects in TIMs [3,9]. The most widely used machine learning tools include Artificial Neural Networks (ANN), Naive Bayes (NB), Support Vector Machines (SVM), the k-Nearest Neighbor (k-NN) algorithm, Fuzzy Logic, and hybrid methods [3,6,9,13,16,17,21,22]. Deep Learning (DL) architectures, mainly those based on Convolutional Neural Networks (CNN), have recently attracted the attention of several researchers. These algorithms can extract the fault characteristics of signals and eliminate the need for a priori knowledge, allowing efficient, independent fault diagnosis in electrical machines [13,21,23-27].
In the work of Wang et al. [23], a one-dimensional CNN was used to extract the most appropriate characteristics of the vibration and current signals to diagnose problems related to the rotor, bearings, and eccentricity in TIMs fed by frequency inverters. Because the characteristic patterns of different faults generally require different time windows for correct diagnosis, a multi-level information fusion model was designed. This methodology, called the Multi-Resolution and Multi-Sensor Fusion Network (MRSFN), provides a proper representation at different resolutions, assisting in the fault detection process. Recently, Xue et al. [13] employed a deep learning model, the Deep Convolutional Neural Network (DCNN), to automatically extract the fault characteristics from the envelope spectra of the vibration signals. The patterns extracted by the DCNN were used as inputs to an SVM classifier for the detection of bearing and rotor faults. In the experimental results, classification rates above 98.0% were observed.
Alternatively, Abid et al. [21] presented an approach to diagnose broken rotor bars and bearing faults in TIMs connected directly to the grid, employing a SincNet deep architecture. This methodology learns the characteristic fault patterns automatically, directly from the machine current signals. Classification accuracies above 99.0% demonstrated its ability to diagnose defects, even when the machines are subject to variations in the load torque.
Kumar and Hati [25] used an adaptive gradient optimizer-based deep convolutional neural network (ADG-dCNN) to diagnose multiple bearing faults and broken rotor bars in line-connected TIMs. This methodology automatically extracts the fault characteristics present in the vibration signals, minimizing the need for human knowledge and reducing human intervention. Classification accuracies above 99% demonstrate its ability to diagnose multiple TIM failures, and the experimental results showed the promising performance of the approach and its insensitivity to several levels of load torque.
Recently, Piedad et al. [27] presented a convolutional neural network based on frequency occurrence plots for the diagnosis of several types of TIM faults. First, the current signals were collected and processed into frequency occurrence plots (FOPs). Then, a CNN was employed to recognize the patterns obtained in the feature extraction stage. Experimental accuracies above 92.4% demonstrated the capability of the technique for induction motor fault diagnosis, even under different operating conditions, such as several levels of mechanical load torque.
Accordingly, this work presents an alternative two-stage approach that uses time-domain signal processing tools and intelligent systems to identify multiple faults and their respective severities in TIMs. The first stage distinguishes between multiple faults in bearings and rotor bars; the second stage identifies the severity of these defects. The diagnosis thus reflects the actual condition of the machine, helping the operator decide on the best maintenance schedule for the equipment. For the extraction and selection of the relevant characteristics of the signals, the approach combines a technique based on information theory measures with PCA. The extracted patterns are then used as the input matrix of a multilayer perceptron (MLP) ANN. Tests were carried out experimentally on two motors, one of 0.74 kW and another of 1.48 kW, operating in steady state. These motors were connected to sinusoidal power supplies and subjected to several conditions, such as voltage unbalances and coupled mechanical loads ranging from 60% to 120% of the nominal load torque T_n, which brings this study closer to real operating conditions. For each operating condition, severity levels of 15, 30, and 60 min of abrasive wear on the bearings, and 1, 2, 2-2, and 4 broken rotor bars were evaluated.

Theoretical Background of the Tools Applied in the Diagnosis of Multiple Faults in TIMs
This work presents a methodology that detects multiple failures (excessive bearing wear and broken rotor bars) and diagnoses their severity in a line-connected TIM subjected to different operating conditions related to the industrial process. The next subsections address the general aspects of the tools used in signal processing and pattern classification.

Mutual Information
This tool extracts relevant information from two random variables by quantifying the stochastic dependence between them, that is, the reduction in the uncertainty about one variable provided by knowledge of the other [28]. This measure is defined by Equation (1):

$$ I(x;y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (1) $$

where x and y are random variables; p(x) and p(y) are the marginal probability density functions (pdf) of x and y, respectively; and p(x, y) is the joint pdf between these two variables x and y.
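The paper does not state how the pdfs in Equation (1) were estimated. As a minimal sketch, assuming a plug-in estimator in which p(x, y) is approximated by a normalised 2-D histogram (the function name `mutual_information` and the choice of 32 bins are illustrative assumptions), the measure can be computed with NumPy; the result is in bits because base-2 logarithms are used:

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Plug-in estimate of I(X;Y), in bits, from paired samples x and y.

    Assumption: the pdfs are approximated by normalised histograms, p(x, y)
    from a 2-D histogram and p(x), p(y) as its marginal sums.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                 # joint probability per cell
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of x (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of y (row vector)
    nz = p_xy > 0                              # empty cells contribute nothing
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
mi_self = mutual_information(x, x)                          # fully dependent
mi_indep = mutual_information(x, rng.normal(size=10_000))   # independent
print(f"I(x;x) = {mi_self:.2f} bits, I(x;noise) = {mi_indep:.3f} bits")
```

A signal shares a large amount of information with itself, while the estimate for independent noise stays close to zero (histogram estimators carry a small positive bias that grows with the number of bins).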
In this work, the stator line current signals of the TIMs are used; these are 120 degrees out of phase with each other. It is therefore necessary to employ a variation of MI, known as Shifted Mutual Information (SMI), which estimates the dependence between random variables as a function of a time displacement τ [29]. This measure is defined by Equation (2):

$$ \mathrm{SMI}(\tau) = \iint p\big(x(t),\,y(t+\tau)\big)\,\log\frac{p\big(x(t),\,y(t+\tau)\big)}{p\big(x(t)\big)\,p\big(y(t+\tau)\big)}\,dx\,dy \qquad (2) $$

This measure has been widely used in feature extraction to help solve pattern recognition problems, as in the studies of Romero-Troncoso et al. [30], Li et al. [31], and Leiva-Murillo and Artés-Rodríguez [32]. The next subsection presents the concepts of the PCA tool, used to select the most relevant information collected through the SMI.

Principal Component Analysis
PCA is a mathematical tool that reduces the dimensionality of a data set. It is a multivariate analysis that uses an orthogonal transformation to convert a set of possibly correlated observed variables into a set of linearly uncorrelated variables known as principal components. The dimensionality reduction does not occur by selecting the most relevant original variables, but by generating new attributes as linear combinations of the original set.
As described in Jolliffe [33], the first principal component is the linear combination of the original variables with the largest variance. The second is the linear combination of the original variables with the second-largest variance, orthogonal to the first principal component, and so on for the subsequent components. Since the first few principal components account for most of the variance of the original data, the remaining components can be disregarded with minimal loss of variance, reducing the number of elements.
The dimensionality reduction with PCA proceeds as follows. Consider a set of observations A_1, A_2, ..., A_n, each characterized by a vector of length m, so that the data set is represented by the m × n matrix of Equation (3):

$$ A = \begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix} \in \mathbb{R}^{m\times n} \qquad (3) $$

The average observation is given by Equation (4):

$$ \bar{A} = \frac{1}{n}\sum_{i=1}^{n} A_i \qquad (4) $$

and the deviation of each observation from the average by Equation (5):

$$ \phi_i = A_i - \bar{A} \qquad (5) $$

The covariance matrix of the data is then given by Equation (6):

$$ C = \frac{1}{n}\sum_{i=1}^{n} \phi_i\,\phi_i^{T} \qquad (6) $$

where φ_i^T is the transpose of φ_i. The covariance quantifies the linear dependence between two random variables: large positive values indicate a positive correlation, large negative values indicate a negative (inverse) correlation, and values close to zero indicate weak linear dependence. According to Jolliffe [33], the absolute magnitude of the covariance indicates the degree of redundancy between the variables and gives a first view of the spread of the data set.
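The computations of Equations (3)-(6) map directly onto matrix operations. The sketch below assumes the 1/n normalisation of the covariance (some texts use 1/(n − 1)); the function name `covariance_from_observations` is hypothetical, and the result is cross-checked against NumPy's own `np.cov` with `bias=True`, which uses the same normalisation:

```python
import numpy as np

def covariance_from_observations(A):
    """Average observation, deviations, and covariance for A of shape (m, n).

    Each of the n columns of A is one observation A_i of length m.
    Assumption: the covariance is normalised by 1/n.
    """
    n = A.shape[1]
    mean_obs = A.mean(axis=1, keepdims=True)   # average observation (m x 1)
    Phi = A - mean_obs                         # deviations phi_i stored as columns
    C = (Phi @ Phi.T) / n                      # (1/n) * sum_i phi_i phi_i^T  (m x m)
    return mean_obs, Phi, C

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 200))                  # 200 observations of length 5
mean_obs, Phi, C = covariance_from_observations(A)
print(C.shape)                                 # (5, 5)
```

Writing the sum of outer products as a single product `Phi @ Phi.T` avoids an explicit loop over the n observations.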
Furthermore, Jolliffe [33] describes the criterion for dimensionality reduction using PCA, in which the eigenvalues and corresponding eigenvectors of the sample covariance matrix are obtained by Singular Value Decomposition (SVD).
Let (λ_1, v_1), ..., (λ_g, v_g), ..., (λ_m, v_m) be the m eigenvalue-eigenvector pairs of the sample covariance matrix, sorted so that λ_1 ≥ λ_2 ≥ ... ≥ λ_m. The g eigenvectors associated with the largest eigenvalues are selected, where g is the intrinsic dimensionality of the subspace that governs the signal; the remaining m − g dimensions usually correspond to noise [33]. The dimensionality g of the subspace is obtained from Equation (7):

$$ S = \frac{\sum_{i=1}^{g}\lambda_i}{\sum_{i=1}^{m}\lambda_i} \qquad (7) $$

where S is the ratio of the variance in the subspace to the total variance in the original space. A matrix U of size m × g is then formed, whose columns are the g selected eigenvectors. Finally, the data are projected onto the g-dimensional subspace, yielding their representation in terms of principal components, as defined in Equation (8):

$$ w_i = U^{T}\phi_i, \qquad i = 1, \ldots, n \qquad (8) $$
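The selection of g and the projection can be sketched as follows, assuming that g is chosen as the smallest value whose variance ratio S reaches a target (95% here, an illustrative value not taken from the paper) and that the eigenpairs are obtained from the SVD of the centred data, whose left singular vectors are the eigenvectors of the covariance matrix. The function name `pca_project` and the synthetic rank-2 data are hypothetical:

```python
import numpy as np

def pca_project(A, S_target=0.95):
    """Project the n columns of A (m x n) onto the g leading principal directions.

    Returns the g x n scores, g, and the variance ratio S actually retained.
    """
    n = A.shape[1]
    Phi = A - A.mean(axis=1, keepdims=True)            # centred observations
    # SVD of the centred data: the left singular vectors are the eigenvectors of
    # C = Phi Phi^T / n, and s**2 / n are its eigenvalues, already sorted descending.
    W, s, _ = np.linalg.svd(Phi, full_matrices=False)
    lam = s**2 / n
    S = np.cumsum(lam) / lam.sum()                     # S for g = 1, 2, ...
    g = min(int(np.searchsorted(S, S_target)) + 1, len(lam))  # smallest g with S >= target
    U = W[:, :g]                                       # m x g matrix of eigenvectors
    return U.T @ Phi, g, float(S[g - 1])               # projection onto the subspace

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))      # two orthonormal directions in R^10
Z = rng.normal(size=(2, 300)) * np.array([[3.0], [2.0]])
A = basis @ Z + 0.01 * rng.normal(size=(10, 300))      # rank-2 signal plus small noise
Y, g, S_kept = pca_project(A)
print(g, Y.shape)
```

For this synthetic data, most of the variance lies in two directions, so two components are kept and the 10-dimensional observations are reduced to 2 uncorrelated scores each.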
In this work, PCA is employed to reduce the size of the input matrix of the MLP ANNs, since this tool has proven important in pattern recognition problems, as seen in Zhao and Wang [34] and Stief et al. [15]. The relevant elements of the data matrix are selected using a combinatorial optimization algorithm, reducing the input matrix and, consequently, the computational time of the strategy.
In this work, PCA was specifically configured to exclude the components that represent less than 1% of the total variance. With PCA, the classifier input matrix was reduced to fewer elements (the exact number depends on the analyzed data) than the original data matrix contains.
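The 1% rule described above differs from a cumulative-variance criterion: here each component is judged by its own share of the total variance. A minimal sketch of that reading, with hypothetical eigenvalues and a hypothetical helper `components_to_keep`:

```python
import numpy as np

def components_to_keep(eigvals, min_share=0.01):
    """Indices of the principal components whose own share of the total
    variance is at least min_share (1% by default); all others are excluded."""
    eigvals = np.asarray(eigvals, dtype=float)
    share = eigvals / eigvals.sum()
    return np.flatnonzero(share >= min_share)

# Hypothetical eigenvalues of a covariance matrix (sum = 100 for readability)
eigvals = [50.0, 30.0, 15.0, 4.0, 0.6, 0.4]
keep = components_to_keep(eigvals)
print(keep)   # [0 1 2 3]
```

The last two components explain 0.6% and 0.4% of the variance, so they are dropped and six components are reduced to four.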
Thus, the proposed methodology can follow two alternatives: (i) process the input matrix with all SMI elements, or (ii) employ PCA to reduce the number of elements in the input matrix while maintaining promising classification accuracy. Both alternatives are processed by the MLP ANN to obtain the experimental results. The next subsection presents the aspects of this pattern classifier, used to identify multiple faults and to analyze the severity of bearing failures and broken rotor bars in grid-fed TIMs.

Artificial Neural Networks
This work employs MLP ANNs to diagnose multiple bearing and rotor faults in TIMs and to determine their respective severities. The training algorithm has two basic steps. The first, called propagation, applies values to the ANN inputs and checks the response.
The output of the output layer is compared with the desired output. The second step proceeds in reverse, from the output layer to the input: the error produced at the network output is used to adjust its internal parameters (weights and thresholds), as shown in [35]. The fundamental element of an ANN is the artificial neuron, also known as the processing element; Figure 1 illustrates an example. The artificial neuron model in Figure 1 can be described mathematically as follows:

$$ v_j(k) = \sum_{i=1}^{n} w_{ji}\,x_i(k) + b_j $$

$$ y_j(k) = \varphi_j\big(v_j(k)\big) $$

where n is the number of input signals of the neuron, x_i(k) is the i-th input signal, w_ji is the weight associated with the i-th input signal, b_j is the threshold of the neuron, v_j(k) is the weighted response of the j-th neuron at instant k, φ_j(·) is the activation function of the j-th neuron, and y_j(k) is the output signal of the j-th neuron at instant k.
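The neuron model above amounts to a weighted sum followed by an activation. A minimal sketch, assuming a logistic (sigmoid) activation and illustrative weights; the function name `neuron` is hypothetical:

```python
import numpy as np

def neuron(x, w, b):
    """Artificial neuron with a logistic activation (assumed).

    v = sum_i w_i * x_i + b          (weighted response)
    y = phi(v) = 1 / (1 + exp(-v))   (output signal)
    """
    v = float(np.dot(w, x) + b)
    y = float(1.0 / (1.0 + np.exp(-v)))
    return v, y

v, y = neuron(x=[1.0, 0.5], w=[0.4, -0.2], b=0.1)
print(round(v, 6), round(y, 4))   # 0.4 0.5987
```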
Each artificial neuron computes its output signal from its input signals. The activation function used to compute the output signal is typically nonlinear; for ANNs that process analog data, as in this application, the sigmoid or hyperbolic tangent function is commonly used. The weights w_j associated with the j-th output neuron are adjusted by computing the error signal at the k-th iteration (that is, for the k-th input vector):

$$ e_j(k) = d_j(k) - y_j(k) $$

where d_j(k) is the desired response of the j-th output neuron. Summing the squared errors produced by all output neurons at the k-th iteration gives

$$ E(k) = \frac{1}{2}\sum_{j=1}^{p} e_j^{2}(k) $$

where p is the number of output neurons. The optimal weight setting is sought by minimizing E(k) through adjustment of the synaptic weights w_ji. The weights of the output layer are updated using the following expression:

$$ w_{ji}(k+1) = w_{ji}(k) - \eta\,\frac{\partial E(k)}{\partial w_{ji}(k)} $$

where w_ji is the weight connecting the j-th neuron of the output layer to the i-th neuron of the previous layer, and η is the learning rate of the backpropagation algorithm. The weights of the hidden layers are adjusted analogously; the steps are detailed in [35]. This work employs ANNs for multiple-fault diagnosis in TIMs because they are widely used in this type of application, as demonstrated in Piedad et al. [27], Appana et al. [36], Garcia-Bracamonte et al. [37], Glowacz and Glowacz [38], and Drakaki et al. [39]. Next, the general aspects of the experimental tests are presented, together with the generation of the data set used in this study.
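The output-layer update can be sketched for a single training pattern, assuming sigmoid output neurons (so that φ'(v) = y(1 − y)) and the learning rate η = 0.3 used later in the experiments; the names `output_layer_step`, `W`, `b`, and `h` are hypothetical. Repeating the step on the same pattern should make E(k) decrease:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def output_layer_step(W, b, h, d, eta=0.3):
    """One gradient-descent update of a sigmoid output layer for a single pattern.

    W: (p, q) weights w_ji, b: (p,) thresholds, h: (q,) outputs of the previous
    layer, d: (p,) desired responses. Returns updated W, b and E(k) before the update.
    """
    y = sigmoid(W @ h + b)                # outputs y_j(k)
    e = d - y                             # error signals e_j(k) = d_j(k) - y_j(k)
    E = 0.5 * np.sum(e ** 2)              # E(k) = 1/2 * sum_j e_j(k)^2
    delta = e * y * (1.0 - y)             # -dE/dv_j, using phi'(v) = y(1 - y)
    W = W + eta * np.outer(delta, h)      # w_ji <- w_ji - eta * dE/dw_ji
    b = b + eta * delta
    return W, b, float(E)

rng = np.random.default_rng(0)
W, b = rng.uniform(-0.5, 0.5, size=(2, 3)), np.zeros(2)
h, d = np.array([0.2, 0.7, 0.5]), np.array([1.0, 0.0])
errors = []
for _ in range(200):
    W, b, E = output_layer_step(W, b, h, d)
    errors.append(E)
print(f"E: {errors[0]:.4f} -> {errors[-1]:.4f}")
```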

Aspects of the Experimental Architecture
In this work, an experimental data set was generated by acquiring current signals from two TIMs operating without faults, with excessive bearing wear, and with broken rotor bars. These experiments aim to reproduce the unfavorable conditions to which these machines are subjected in an industrial process.
The bearing defects were reproduced to emulate the wear caused by excessive use, lack of lubrication, and high shaft load. They were produced in three stages: (i) first, the bearings were cleaned and their original lubricant was replaced with an abrasive slurry; (ii) the motors were then operated to degrade the bearings for predefined times of 15, 30, and 60 min; (iii) finally, all the abrasive slurry was removed and the bearings were relubricated before the experimental data set was collected.
Rotor faults were produced by drilling the cage with a bit of larger diameter than the bar width. The bars were drilled according to the fault severity levels: 1, 2, 2-2 (diametrically opposite), and 4 broken rotor bars. The holes were drilled in sequence along the rotor circumference, resulting in defects of one to four broken rotor bars.
In this work, an experimental data set of 450 trials was created using two motors, one of 0.74 kW (Motor 1) and another of 1.48 kW (Motor 2), operating in steady state and fed by a line-connected sinusoidal power supply. These machines were also subjected to operating conditions common in industrial production processes, such as variations in load and supply voltage levels. For Motor 1, the coupled load torque was varied from 2.5 N·m to 4.5 N·m in steps of 0.5 N·m. The voltage unbalance levels were varied from 0 to +4% in steps of +2% in phases A and B, and from 0 to −4% in steps of −2% in phase C. For Motor 2, the load torque was varied from 5 N·m to 9 N·m in steps of 1 N·m, and the same voltage unbalance levels as for Motor 1 were applied. Thus, 225 experimental tests were carried out for each motor, resulting in 450 data samples. These operating conditions were applied for each fault type addressed in this work, that is, excessive bearing wear of 15, 30, and 60 min, as well as 1, 2, 2-2 (diametrically opposite), and 4 broken rotor bars. Table 1 summarizes the experimental tests performed to obtain the data set.
The experimental workbench provides independent per-phase voltage variation for changing the motor supply voltages, and a direct current machine that makes it possible to modify the load torque coupled to the TIM shaft. It also includes a torque meter with a two-fold action range and an integrated Kistler speed sensor for measuring the torque and speed signals, as well as Hall sensors responsible for acquiring and conditioning the current signals.
These current signals are fed to the analog inputs of a National Instruments USB-6221 data acquisition (DAQ) board connected to a microcomputer. The collected data are then imported into MATLAB® and processed by an algorithm developed to extract their relevant characteristics, reducing the number of inputs to the classification system. Table 2 presents the information on the collected signals, as well as the sampling frequency used to acquire them. The next section presents the methodological details of the alternative approach proposed in this study.

Methodological Aspects for Multi-Fault Diagnosis in TIMs
Figure 3 illustrates the block diagram for collecting, processing, and classifying the data set used in the proposed methodology for multiple-fault diagnosis in grid-fed TIMs. Based on the concepts presented in Section 2.1, an algorithm was developed in MATLAB® to estimate the SMI between the motor current signals. As described in Section 2.1, the fault characteristics of the machine current signals are extracted as a function of the displacement τ, measured in samples. Specifically, a maximum displacement τ of 150 samples was used in this study, since within this range the currents were observed to come into phase at least once. The algorithm shifts current Y one sample at a time until this value is reached. At each iteration, the marginal and joint pdfs of currents X and Y required to calculate the MI are estimated, and the resulting MI value is stored in a data matrix. This process continues until the displacement of 150 samples is reached, yielding the shifted mutual information of the analyzed signals.
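The shifting loop can be sketched in NumPy as below. Several details are assumptions rather than the authors' MATLAB code: the pdfs are estimated with 32-bin histograms, τ runs from 1 to 150, and the currents are synthetic 60 Hz sinusoids sampled at a hypothetical 6 kHz (100 samples per cycle) with added noise. The names `mutual_information` and `shifted_mutual_information` are illustrative:

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram (plug-in) estimate of I(X;Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

def shifted_mutual_information(x, y, max_shift=150, bins=32):
    """SMI curve: MI between x(t) and y(t + tau) for tau = 1, ..., max_shift samples."""
    smi = np.empty(max_shift)
    for tau in range(1, max_shift + 1):
        # Shift y by tau samples and keep only the overlapping parts of both signals
        smi[tau - 1] = mutual_information(x[:-tau], y[tau:], bins)
    return smi

# Synthetic balanced currents: 60 Hz sampled at 6 kHz (100 samples per cycle), with noise
fs, f = 6000.0, 60.0
t = np.arange(6000) / fs
rng = np.random.default_rng(0)
i_a = np.sin(2 * np.pi * f * t) + 0.05 * rng.normal(size=t.size)
i_b = np.sin(2 * np.pi * f * t - 2 * np.pi / 3) + 0.05 * rng.normal(size=t.size)
smi_ab = shifted_mutual_information(i_a, i_b)
print(smi_ab.shape, int(np.argmax(smi_ab)) + 1)
```

With 100 samples per cycle, the 120° lag between the two phases corresponds to about 33 samples, so the SMI curve peaks where the shift brings the currents into phase (or into anti-phase, every half cycle thereafter).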
Figure 4 shows the characteristic SMI signature between the phase A and phase B currents (i_a and i_b) for Motor 1 operating without faults, with nominal load torque and balanced supply voltage. When the SMI is estimated for all samples in the data set, similar characteristic signatures are observed regardless of the motor operating conditions. Figures 5-8 show the analysis of the MI curves for the different operating situations of the machines. Figure 5a shows the characteristic curve for Motor 1 connected directly to the grid, operating without defects and subject to variations in the load torque: an increase in the load torque also increases the maximum value of MI. Figure 5b shows that a positive voltage imbalance in phase A shifts the signature to the left relative to the balanced-voltage signature, and that a positive imbalance in phase B combined with a negative imbalance in phase C shifts it further to the left. The maximum values of MI also decrease as the level of voltage imbalance increases. The effects of load torque and voltage imbalance can also be observed when the motor operates with defects, either bearing failures or broken rotor bars, regardless of the fault severity. Figure 6 illustrates the characteristic MI signatures for Motor 1 operating with 15 min of abrasive wear on the bearings. When this machine is subjected to changes in the load torque, an increase in the load torque level increases the maximum value of MI, as shown in Figure 6a. Figure 6b shows that a positive variation in the phase A supply voltage shifts the curve to the left and reduces the maximum value of MI.
When the voltage levels in phases B and C are varied, the SMI curve of Motor 1 is advanced relative to the balanced-voltage curve, and the peak values of the MI signatures decrease. An important feature is observed when the machines have defects. As shown in Figure 8, which presents the SMI signatures for Motor 1 with nominal load torque and balanced voltages, broken rotor bars shift the SMI signature to the left relative to the machine operating without faults. Bearing defects, in turn, increase the maximum value of the SMI and also advance the SMI curve relative to the fault-free motor.

The analyses described for Motor 1 also apply to Motor 2, including the effects of load torque variations, voltage imbalance, and fault occurrences.

Feature Extraction Stage
One aim of this study is to investigate whether providing more information to the pattern classifier yields higher classification accuracy. Thus, two different strategies are employed with the SMI tool. In the first strategy, the 150 SMI attributes between the phase currents i_a and i_b (MI_ab) are used as the classifier input matrix. In the second strategy, in addition to MI_ab, the 150 SMI attributes between i_b and i_c (MI_bc) and the 150 SMI attributes between i_c and i_a (MI_ca) are employed, resulting in a classifier input matrix with 450 SMI elements. After the SMI estimation is completed for all collected samples, the data are normalized by the maximum value of mutual information obtained over all tests. The input vectors are then generated with 150 SMI values (absolute values) for the first strategy and 450 SMI values for the second.
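Assembling the two input matrices can be sketched as follows. The data layout (one dictionary per test, keyed by phase pair) and the function name `build_inputs` are hypothetical, the curves are random placeholders rather than measured data, and normalising each matrix by its own global maximum is an assumed reading of the normalisation described above:

```python
import numpy as np

def build_inputs(smi_per_test, pairs):
    """Stack the SMI curves of each test into one row and normalise the matrix.

    smi_per_test: list with one dict per test, mapping a phase pair ('ab', 'bc',
    'ca') to its 150-point SMI curve. pairs: which curves to concatenate.
    """
    X = np.array([np.concatenate([np.abs(test[p]) for p in pairs])
                  for test in smi_per_test])
    return X / X.max()       # normalise by the largest MI value over all tests

# Hypothetical SMI curves for 4 tests (random placeholders, not measured data)
rng = np.random.default_rng(0)
tests = [{p: rng.uniform(0.5, 3.0, 150) for p in ("ab", "bc", "ca")} for _ in range(4)]
X150 = build_inputs(tests, pairs=("ab",))               # first strategy
X450 = build_inputs(tests, pairs=("ab", "bc", "ca"))    # second strategy
print(X150.shape, X450.shape)   # (4, 150) (4, 450)
```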
In addition to investigating the use of mutual information between the currents, this study analyzes the use of PCA to optimize the classifier input data. This approach seeks to evaluate the trade-off between efficiency and computational cost when this tool is used. Thus, the proposed methodology can follow two directions, as shown in Figure 3. The patterns extracted using the SMI, as well as those obtained through PCA, are processed by the MLP ANN to produce the experimental results. The next section presents the experimental results obtained in this work.

Experimental Results
The experimental results are obtained with a holdout evaluation, in which the data set is randomly split into training and validation subsets: 70% of the samples are used to train the MLP ANN, and the remaining 30% are used for validation. Because the validation samples are never seen during training, this procedure estimates how well the classifier generalizes to unknown samples.
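A random 70/30 holdout split of the kind described above can be sketched as follows. The function name `holdout_split` and the toy data are hypothetical. The split is not stratified, because the text does not say whether class proportions were preserved.

```python
import numpy as np


def holdout_split(X, y, train_fraction=0.7, seed=None):
    """Randomly shuffle the samples, then keep train_fraction for training."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, val = order[:n_train], order[n_train:]
    return X[train], X[val], y[train], y[val]


# Hypothetical data: 100 feature vectors with 4 class labels.
X = np.arange(200).reshape(100, 2)
y = np.repeat([0, 1, 2, 3], 25)
X_tr, X_va, y_tr, y_va = holdout_split(X, y, seed=42)
print(len(y_tr), len(y_va))  # 70 30
```

A stratified variant, which splits each class separately, would guarantee that every fault class appears in the validation subset. The text does not state which variant was used.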
The Waikato Environment for Knowledge Analysis (WEKA) software was used in this study. WEKA solves data mining and pattern classification problems through the many algorithms available on its platform [40], including the MLP ANNs presented in Section 2.3. It also allows the classifiers' parameters to be configured for the experiments.
As described in Section 3, this study compares the use of the SMI and of the PCA tool to extract and select the most relevant attributes of the ANN input matrix, seeking the best trade-off between efficiency and computational cost among the analyzed strategies. The experimental tests are therefore divided into three parts: (i) the first tests investigate the behavior of the methodology using the 150 SMI attributes between the currents of phases A and B, i_a and i_b; (ii) the second part analyzes the performance when the 450 SMI elements among all phase currents are employed, that is, 150 SMI attributes between i_a and i_b, 150 between i_b and i_c, and 150 between i_c and i_a; and (iii) the third part evaluates the use of PCA to reduce the classifiers' input matrix.
As described in Section 3, this work presents a two-stage approach. The first stage performs the multi-fault classification of bearing faults and broken rotor bars. The second stage analyzes the severity of these failures in TIMs fed directly by the grid under different operating conditions. For each case addressed in this work, the number of neurons in the hidden layer was chosen from the best results of several preliminary tests. The remaining classifier parameters are:

- a learning rate of 0.3;
- a momentum term of 0.2;
- a maximum of 500 training epochs;
- logistic and linear activation functions in the hidden and output layers, respectively.

These parameters correspond to the standard configuration of the MLP ANN available in WEKA.
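The classifier settings listed above can be illustrated with a small NumPy network trained by per-sample backpropagation with momentum. This is a sketch, not WEKA's `MultilayerPerceptron`, which may differ in details such as weight initialization, attribute normalization, and update order. The following are assumptions:

- one-hot squared-error targets;
- the ±0.05 initialization range;
- the toy three-cluster data, used with 5 hidden units.

```python
import numpy as np


class MLP:
    """Single-hidden-layer MLP: logistic hidden units, linear output units,
    per-sample backpropagation with momentum on the squared error."""

    def __init__(self, n_in, n_hidden, n_out, learning_rate=0.3,
                 momentum=0.2, epochs=500, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.uniform(-0.05, 0.05, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.uniform(-0.05, 0.05, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr, self.mu, self.epochs = learning_rate, momentum, epochs

    def _forward(self, X):
        h = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # logistic hidden layer
        return h, h @ self.W2 + self.b2                      # linear output layer

    def fit(self, X, y):
        T = np.eye(self.W2.shape[1])[y]                      # one-hot targets
        params = (self.W1, self.b1, self.W2, self.b2)
        velocity = [np.zeros_like(p) for p in params]
        for _ in range(self.epochs):
            for i in self.rng.permutation(len(X)):
                h, out = self._forward(X[i])
                d_out = out - T[i]                           # gradient of 0.5*||out - t||^2
                d_hid = (self.W2 @ d_out) * h * (1.0 - h)    # back through the logistic
                grads = (np.outer(X[i], d_hid), d_hid, np.outer(h, d_out), d_out)
                for p, v, g in zip(params, velocity, grads):
                    v *= self.mu                             # momentum term
                    v -= self.lr * g                         # learning-rate step
                    p += v                                   # in-place update
        return self

    def predict(self, X):
        return np.argmax(self._forward(X)[1], axis=-1)


# Toy check on three well-separated 2-D clusters (hypothetical data).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.repeat([0, 1, 2], 20)
X = centers[y] + 0.1 * rng.standard_normal((60, 2))
model = MLP(n_in=2, n_hidden=5, n_out=3).fit(X, y)
print((model.predict(X) == y).mean())
```

With linear outputs, the predicted class is simply the output neuron with the largest value.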

Experimental Results Using the 150 SMI Attributes between i_a and i_b
This section presents the experimental results obtained when the 150 attributes of the SMI signature between the phase currents i_a and i_b are used as the input matrix of the MLP ANNs. As described in Section 3, the TIMs were subjected to different operating conditions:

- load torque variations from 60% to 120% of the nominal value;
- voltage unbalances of up to 4% among phases A, B, and C;
- bearing faults (BF) and rotor faults (RF).

Since the methodology has two stages, multi-fault identification and severity analysis, their results are presented separately. For the multi-fault diagnosis, the MLP ANN has 25 neurons in the hidden layer. Table 3 shows the results of these trials for both motors. With the 150 SMI attributes, the system performs well in the multi-fault diagnosis of TIMs fed directly by the grid. With voltage unbalances of up to 0.5%, all samples of both motors are correctly classified. For the Motor 1 data set, when the voltage unbalance increases to 4%, the methodology correctly identifies:

- 94.1% of all samples;
- 84.2% of the healthy samples;
- 94.9% of the bearing fault samples;
- 100% of the rotor fault samples.

The Kappa index [41] of 0.94 corroborates these results.
For Motor 2, higher voltage unbalance levels are accompanied by lower classification accuracies, as shown in Table 3. With a voltage unbalance of up to 4%, 95.3% of all samples are correctly identified, and the Kappa coefficient of 0.93 indicates almost perfect agreement.
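The Kappa index quoted throughout is assumed here to be Cohen's kappa, κ = (p_o - p_e)/(1 - p_e). Here p_o is the observed accuracy and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. On the Landis and Koch scale, values from 0.81 to 1.00 denote almost perfect agreement. The sketch below computes the overall accuracy, kappa, and per-class accuracy from a hypothetical confusion matrix, not from the paper's tables.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy_and_kappa(cm):
    """Overall accuracy, Cohen's kappa, and per-class accuracy (recall)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                               # observed agreement
    p_e = cm.sum(axis=1) @ cm.sum(axis=0) / n ** 2       # agreement expected by chance
    kappa = (p_o - p_e) / (1.0 - p_e)
    per_class = np.diag(cm) / cm.sum(axis=1)
    return p_o, kappa, per_class


# Hypothetical predictions for classes 0 = healthy, 1 = bearing fault, 2 = rotor fault.
y_true = [0] * 20 + [1] * 20 + [2] * 20
y_pred = [0] * 18 + [1] * 2 + [0] + [1] * 19 + [2] * 20
cm = confusion_matrix(y_true, y_pred, 3)
p_o, kappa, per_class = accuracy_and_kappa(cm)
print(p_o, kappa, per_class)
```

The per-class values correspond to figures such as "84.2% of healthy samples", while kappa discounts the agreement that balanced classes would produce by chance.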
After the faults present in the machines are identified, their severities are analyzed. This phase investigates how well each fault class can be separated by severity, which helps the machine operator schedule maintenance. Two additional MLP ANNs are used for this purpose, one for bearing faults and one for rotor faults. As described in Section 3, three severity levels of bearing defects are analyzed, simulating bearing wear due to 15, 30, and 60 min of excessive use (classes 15 min, 30 min, and 60 min). The MLP ANN for this task has 10 neurons in its hidden layer. Table 4 shows the results of the bearing fault severity analysis when the classifier input matrix contains the 150 SMI attributes between the phase currents i_a and i_b.

For Motor 1, with voltage unbalances of up to 2%, the methodology identifies all severity levels of bearing failure: all samples with 15, 30, and 60 min of excessive wear are correctly classified. Since a bearing wear of 15 min is considered an incipient defect, this result is particularly promising. When the voltage unbalance of Motor 1 increases to 4%, the technique still correctly identifies 96.0% of all samples, and the fault is promptly diagnosed as it progresses. The Kappa index of 0.94 corroborates these results.
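The two-stage structure (one MLP ANN for fault identification, then one of two severity MLP ANNs) amounts to a simple dispatch. In the sketch below, the label orders and the stub classifiers are hypothetical placeholders for the trained networks. Each callable could also apply its own preprocessing, such as a classifier-specific PCA projection.

```python
from typing import Callable, Optional, Tuple

import numpy as np

# Hypothetical label orders; the paper does not specify class indices.
FAULT_LABELS = ("healthy", "bearing fault", "rotor fault")
BEARING_LABELS = ("15 min", "30 min", "60 min")
ROTOR_LABELS = ("1 BRB", "2 BRB", "4 BRB", "2-2 BRB")

Classifier = Callable[[np.ndarray], int]   # returns a class index


def diagnose(x: np.ndarray, fault_clf: Classifier, bearing_clf: Classifier,
             rotor_clf: Classifier) -> Tuple[str, Optional[str]]:
    """Stage 1 identifies the fault; stage 2 grades its severity."""
    fault = FAULT_LABELS[fault_clf(x)]
    if fault == "bearing fault":
        return fault, BEARING_LABELS[bearing_clf(x)]
    if fault == "rotor fault":
        return fault, ROTOR_LABELS[rotor_clf(x)]
    return fault, None          # healthy machines skip the severity stage


# Stub classifiers standing in for the three trained MLP ANNs.
def fault_stub(x):
    return int(x[0])


def bearing_stub(x):
    return 2


def rotor_stub(x):
    return 3


print(diagnose(np.array([1.0]), fault_stub, bearing_stub, rotor_stub))
```

One consequence of this design is that a stage-1 error propagates. For example, a bearing fault misclassified as a rotor fault is then graded on the broken-bar severity scale.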
For Motor 2, Table 4 shows that 100% of the samples are correctly identified for voltage unbalances of up to 0.5%. Even when the voltage unbalance increases to 4%, the system correctly diagnoses 95.8% of all samples. Again, the only misclassified samples are those with 15 min of bearing wear; as the defect evolves, the system promptly identifies the problem.
Besides the bearing fault severities, this paper evaluates the severity of broken rotor bars in TIMs. Four severity levels are analyzed: 1 (1 BRB), 2 (2 BRB), 4 (4 BRB), and 2-2 (2-2 BRB) broken rotor bars. These tests use an MLP ANN with 10 neurons in the hidden layer, and Table 5 presents the results. When both motors operate under normal conditions (voltage unbalance of up to 0.5%), the system readily identifies the fault severity, as shown in Table 5. When Motor 1 is subjected to a voltage unbalance of up to 2%, 94.4% of all samples are correctly identified. Under this condition, the accuracy for one broken rotor bar (1 BRB), which is considered an incipient fault, is 80.0%. As the defect progresses to 2 broken rotor bars, 100% of these samples are detected. The Kappa coefficient of 0.92 corroborates these results for Motor 1.
In the worst operating condition for Motor 2, 92.4% of the whole data set is correctly diagnosed, as shown in Table 5. The classification accuracies are 81.8% for 1 BRB, 90.9% for 2 BRB, and 100% for the remaining broken rotor bar severities. The Kappa index of 0.90 indicates almost perfect agreement. Again, as the fault evolves, the methodology identifies the actual operating condition of the motor.
The experimental results in this section demonstrate that the proposed system, based on ANNs and the SMI between the phase currents i_a and i_b, can monitor and diagnose multiple failures and their severities in grid-fed TIMs under several operating conditions. The next section evaluates the system using the SMI between all phase currents, to investigate whether more information improves the efficiency of the fault diagnosis system.

Experimental Results Using the 450 SMI Attributes among All Phase Currents
In this section, the 450 SMI attributes among all phase currents (i_a and i_b, i_b and i_c, and i_c and i_a) were used as the input matrix of the classifiers. The operating conditions of the TIMs are similar to those presented in Section 4.1. The multi-fault diagnosis is investigated first, using an MLP ANN with 25 neurons in the hidden layer. Table 6 presents the multiple failure detection results for Motor 1 and Motor 2. It shows that the additional information leads to better results than those of the previous section. For the Motor 1 data set, with a voltage unbalance of up to 2%, an unfavorable operating condition, the methodology correctly identifies all samples. When the voltage unbalance increases to 4%, the accuracy decreases slightly, but a classification rate of 97.3% is still obtained. All healthy samples are identified, a better result than in Table 3. Table 6 also shows classification accuracies of 97.0% and 96.0% for the samples with bearing faults and broken rotor bars, respectively. The Kappa index of 0.96 corroborates these multi-fault diagnosis results.
For the Motor 2 data set, 97.3% of all samples are correctly classified even in the worst operating condition. In this case, 94.1% of the healthy samples, 97.0% of the bearing wear samples, and 100% of the broken rotor bar samples are correctly identified. The Kappa coefficient of 0.96 corroborates these results.
Next, the severities of bearing wear and broken rotor bars are analyzed. The severity levels are the same as in the previous section: bearing wear due to 15, 30, and 60 min of excessive use, and 1, 2, 4, and 2-2 broken rotor bars. Both MLP ANNs that diagnose the severity of these defects have 10 neurons in the hidden layer. Tables 7 and 8 show the experimental results of the severity analysis of bearing and rotor failures, respectively. Once again, providing more information to the classifiers substantially improves the performance of the diagnostic system. The bearing fault severity classifier reaches 100% accuracy for all samples of Motors 1 and 2, regardless of the operating condition. As a result, the 15 min bearing wear, which is considered an incipient defect, is detected immediately even with voltage unbalances of up to 4%.
Table 8 presents the classification results of the proposed method for the severity diagnosis of broken rotor bars. For Motor 1, 100% of all samples are correctly identified in the tests with voltage unbalances of up to 2%. With a voltage unbalance of up to 4%, 96.7% of all samples are correctly identified. Under this condition, the method detects the fault once the defect has progressed to 2 broken rotor bars. The Kappa index of 0.96 corroborates these results.
For Motor 2 under normal operating conditions (voltage unbalance of up to 0.5%), all broken rotor bar samples are correctly identified, as shown in Table 8. As the operating condition worsens, the classification accuracies tend to decrease, reaching 96.7% of all samples with a voltage unbalance of up to 4%. Only samples with 1 BRB are misclassified, and as the fault progresses, the monitoring system promptly identifies it.
This section shows that providing more information to the pattern classifiers, through the shifted mutual information among all phase currents, improves the performance of the diagnostic system. However, building the classifier model takes longer than when only the SMI between the phase currents i_a and i_b is used. The next section evaluates the system when the PCA tool is used to reduce the input data matrix of the MLP ANN and, consequently, the computational time.

Experimental Results Using the PCA Technique
In this section, principal component analysis (PCA) is employed to optimize the ANN inputs. The principal components are computed from the correlation matrix, and those that together account for 95% of the variance are retained. The previous results showed that the SMI among the three phase currents outperforms the SMI between i_a and i_b alone. PCA is therefore applied to the 450 attributes of the SMI signature among the three phase currents, yielding new attributes that form the input matrices of the MLP classifiers.
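A minimal NumPy sketch of this PCA step follows. It assumes that "correlation technique" means PCA on the correlation matrix (equivalently, on standardized attributes). It also assumes that the retained components are the fewest whose cumulative explained variance reaches 95%. The synthetic three-factor data and the choice to fit PCA on the training portion only are illustrative assumptions.

```python
import numpy as np


def fit_pca(X, variance_ratio=0.95):
    """PCA on the correlation matrix: standardize, then keep the fewest
    components whose cumulative explained variance reaches variance_ratio."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                               # constant attributes stay at 0
    Z = (X - mean) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)  # rows of vt: principal axes
    explained = s ** 2 / np.sum(s ** 2)               # descending variance shares
    k = int(np.searchsorted(np.cumsum(explained), variance_ratio)) + 1
    k = min(k, len(explained))
    return mean, std, vt[:k]


def transform(X, mean, std, components):
    """Project standardized samples onto the retained principal axes."""
    return ((np.asarray(X, dtype=float) - mean) / std) @ components.T


# Hypothetical 450-attribute data driven by three latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((120, 3))
X = latent @ rng.standard_normal((3, 450)) + 0.01 * rng.standard_normal((120, 450))
mean, std, components = fit_pca(X[:84])               # fit on the 70% training part
X_val_reduced = transform(X[84:], mean, std, components)
print(components.shape, X_val_reduced.shape)
```

The sketch fits the mean, scale, and principal axes on the training samples alone and then projects the validation samples with them. This keeps validation data out of the feature construction.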
For the Motor 1 data set, the input matrix of the classifier that identifies multiple failures was reduced from 450 to 20 elements; for Motor 2, it was reduced from 450 to 28 elements. In both cases, an MLP ANN with 5 neurons in its hidden layer was used. Table 9 presents the results of the multi-fault classification using the PCA-ANN method. Compared with Table 6, the classification accuracies are lower, mainly for voltage unbalances of up to 4%. However, under usual operating conditions (voltage unbalance of up to 0.5%), all samples are classified correctly. For voltage unbalances of up to 2%, the performance remains satisfactory, with classification accuracies of 95.6% and 98.0% for the whole sample sets of Motors 1 and 2, respectively. The Kappa coefficients of 0.93 and 0.97 indicate almost perfect agreement in both cases.
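One way to see why the PCA-reduced networks are cheaper is to count the trainable parameters of each multi-fault MLP, because the cost of each backpropagation update grows roughly in proportion to this count. The sketch assumes fully connected layers with biases and three output neurons (healthy, bearing fault, rotor fault). It is an illustration, not the measured computing time reported in this work.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a fully connected single-hidden-layer MLP."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


# Multi-fault MLPs from the text; 3 outputs (healthy, bearing, rotor) is an assumption.
configs = {
    "150 SMI inputs, 25 hidden": (150, 25, 3),
    "450 SMI inputs, 25 hidden": (450, 25, 3),
    "PCA Motor 1, 20 inputs, 5 hidden": (20, 5, 3),
    "PCA Motor 2, 28 inputs, 5 hidden": (28, 5, 3),
}
counts = {name: mlp_parameter_count(*cfg) for name, cfg in configs.items()}
for name, count in counts.items():
    print(f"{name}: {count} trainable parameters")
# 3853, 11353, 123 and 163 parameters, respectively
```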
Subsequently, the severity diagnosis of the defects is investigated. For bearing fault severity, the input data matrix of the MLP classifier was reduced from 450 SMI attributes to 25 new elements for the Motor 1 data set and to 20 for the Motor 2 data set. The MLP ANNs used in the severity diagnosis of bearing failures have 5 neurons in the hidden layer, and Table 10 shows the experimental results. The system efficiently diagnoses the different severity levels of bearing faults, from the incipient defect to the most severe. For voltage unbalances of up to 2%, the methodology identifies 100% of all samples of Motors 1 and 2. These results indicate that using PCA to optimize the classifier input matrix offers advantages over using the SMI alone: under these conditions, the same accuracy is obtained with far fewer inputs.
For broken rotor bars, the input matrices of the MLP ANNs were reduced from 450 SMI elements among the three phase currents to 11 and 25 new attributes for the data sets of Motors 1 and 2, respectively. The MLP ANNs that diagnose the severity of rotor failures have 5 neurons in the hidden layer. Table 11 shows the experimental results of the severity analysis of rotor faults.
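The input reductions reported for the six PCA-ANN models can be checked with simple arithmetic. The sketch below uses only the retained component counts given in the text. It measures the reduction in input dimensionality, which is distinct from the reported reduction in model build time.

```python
ORIGINAL_INPUTS = 450

# Retained principal components reported in the text for each PCA-ANN model.
retained = {
    ("multi-fault", "Motor 1"): 20,
    ("multi-fault", "Motor 2"): 28,
    ("bearing severity", "Motor 1"): 25,
    ("bearing severity", "Motor 2"): 20,
    ("rotor severity", "Motor 1"): 11,
    ("rotor severity", "Motor 2"): 25,
}


def input_reduction(k, original=ORIGINAL_INPUTS):
    """Percentage of input attributes removed by keeping k components."""
    return 100.0 * (1.0 - k / original)


for (task, motor), k in retained.items():
    print(f"{task}, {motor}: {ORIGINAL_INPUTS} -> {k} inputs "
          f"({input_reduction(k):.1f}% fewer)")
```

The retained dimensions correspond to input reductions between 93.8% (28 components) and 97.6% (11 components). This is consistent with a reduction of more than 90% in the amount of data.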
The system performs satisfactorily, as shown in Table 11. For Motor 1, with supply voltage unbalances of up to 2%, 95.0% of all samples are correctly classified. As the fault progresses, the system diagnoses the problem more reliably, identifying 100% of the samples with 4 broken rotor bars. When the voltage unbalance increases to 4%, the methodology still performs well, and the Kappa coefficient of 0.91 indicates almost perfect agreement.
For Motor 2, Table 11 shows that 90.0% of all samples are correctly identified with voltage unbalances of up to 4%. As with Motor 1, the approach diagnoses the actual condition of the machine as the failure evolves. Kappa indices above 0.87 corroborate these results.

Discussion of the Experimental Results
The results presented in Sections 4.1-4.3 reveal the promising performance of the proposed system in monitoring and diagnosing the severities of bearing faults and broken rotor bars in TIMs fed by the grid and subject to several operating conditions.
The experimental results also show that using more information, namely the SMI among all phase currents of the TIMs, improves the system performance. In the multiple failure diagnosis, accuracy increased by 2%, resulting in classification rates of at least 97.3%. In the fault severity analysis, the increase was 4%, with at least 96.7% of all samples correctly identified.
Using the PCA technique to reduce the classifiers' input matrix yielded a satisfactory trade-off between efficiency and computational cost. The input matrices were reduced from 450 to at most 28 elements, depending on the motor, which reduced the computational time needed to build the classifier model by 95.0%. Classification rates above 90.7% demonstrate the suitable performance of this methodology. Reducing the dimensionality of the MLP ANN input matrices with PCA therefore gives promising results: it shortens computational time and facilitates the efficient use of this method in embedded systems. The next section compares the proposed approach with other studies.

Comparison with Recent Studies
Table 12 compares the present work with other approaches to multiple fault diagnosis in TIMs. The power supply differs across these studies:

- In Stief et al. [15], Jiang et al. [19], Abid et al. [21], Piedad et al. [27], and the present work, the machines were connected directly to the grid.
- In Juez-Gil et al. [16] and Wang et al. [23], the motors were powered by frequency inverters.
- Ali et al. [3] and Ali et al. [9] employed both sinusoidal and non-sinusoidal power supplies.
Regarding feature extraction from the faulty signals, Ali et al. [3], Ali et al. [9], Wang et al. [23], and Piedad et al. [27] analyzed the collected samples in the frequency domain using OMP, discrete WT, MRSFN, and FOPs. The signals were analyzed in the time domain in the present work, which uses the MI among the current signals of the TIMs, and by Abid et al. [21] and Jiang et al. [19], who use statistical measures. Stief et al. [15], on the other hand, extracted features in both the time and frequency domains through statistical measures and spectral analyses of the signals.
Regarding the motor operating conditions, all studies adopted variations in the load torque. Jiang et al. [19] and the present work also varied the voltage unbalance of the machines' power supply. Among the studies in which the motors were fed by frequency inverters, Ali et al. [9], Juez-Gil et al. [16], and Wang et al. [23] analyzed the performance of their approaches for several frequency levels. Finally, Abid et al. [21], Stief et al. [15], and Piedad et al. [27] did not mention variations in supply voltages.
The experimental results show the promising performance of these methodologies in the multiple failure diagnosis of TIMs under several operating conditions, such as variations in the mechanical load torque and in the supply voltage levels. Notably, the present work uses three different strategies to identify multiple defects. The multi-fault classification and the severity analyses reach accuracies above 96% when the most information is used (the SMI among the currents of phases A, B, and C). The PCA technique was adopted to simplify the classifier input matrix and consequently reduce the computational time.
In these tests, the approach still performed satisfactorily, correctly identifying more than 90% of the whole data set, as shown in Table 12.

Conclusions
This study presents a two-stage methodology for fault classification in TIMs. In the first stage, the system identifies bearing faults and broken rotor bars; in the second stage, it diagnoses the severity of these defects. Bearing wear due to 15, 30, and 60 min of excessive use and rotor faults with 1, 2, 2-2, and 4 broken bars are analyzed. The tests consider voltage unbalances of up to 4% and a wide load range (60-120% of the nominal load torque).
This work also evaluates three strategies for extracting fault features from the TIM signals. First, only the SMI signature between the currents of phases A and B is used. Next, the SMI among all motor currents, i_a, i_b, and i_c, is used to investigate whether giving the classifiers more information assists the fault diagnosis. Finally, the study analyzes whether using PCA to reduce the input data matrix of the MLP ANN yields a satisfactory trade-off between efficiency and computational cost. Satisfactory experimental results were achieved regardless of the feature extraction strategy employed. Future work will investigate the use of the methodology for multiple fault diagnosis in TIMs fed by frequency inverters.

Conflicts of Interest:
The authors declare no conflict of interest.