A Novel Fault Detection and Identification Framework for Rotating Machinery Using Residual Current Spectrum

A novel framework of model-based fault detection and identification (MFDI) for induction motor (IM)-driven rotating machinery (RM) is proposed in this study. A data-driven subspace identification (SID) algorithm is employed to obtain the IM state-space model from the voltage and current signals in a quasi-steady-state condition. This study aims to improve the frequency–domain fault detection and identification (FDI) by replacing the current signal with a residual signal where a thresholding method is applied to the residual signal. Through the residual spectrum and threshold comparison, a binary decision is made to find fault signatures in the spectrum. The statistical Q-function is used to generate the fault frequency band to distinguish between the fault signature and the noise signature. The experiment in this study is performed on a wastewater pump in an existing industrial facility to verify the proposed FDI. Two faulty conditions with mathematically known and mathematically unknown faulty signatures are experimented with and diagnosed. The study results present that the residual spectrum demonstrated to be more sensitive to fault signatures compare to the current spectrum. The proposed FDI has successfully shown to identify the fault signatures even for the mathematically unknown faulty signatures.


Introduction
Primarily, the industrial RM utilizes an IM as the actuator, such as a pump, fan, conveyor, or compressor. In the factory, IM-driven RM often run not in a proper condition in which their efficiency may reduce, and the energy cost may increase. If this condition stays for a more extended period, it will lead to machine failure and may stop the entire production process with the risks of safety, expensive production loss, a time-consuming, and costly repairing process. As the industrial systems become more complex and expensive, the demand to improve the reliability and safety of the systems subjected and exposed to faults and failures such as the IM-driven RM is increasing. The development of IM-driven RM condition monitoring and FDI has a promising contribution in reducing the maintenance cost and extending the operation time.
Popular and promising FDI strategies in recent years include motor current signature analysis (MCSA) and vibration analysis [1][2][3]. Both methods examine the abnormal harmonic modulation at the specific fault characteristic frequencies in the frequency spectrum by utilizing the Fast Fourier Transform (FFT) algorithm. Nevertheless, the current signal provides better information than the vibration signal under different operation conditions [4]. The availability of a low-cost step-down current transformer sensor and a high sampling-frequency resolution data-acquisition (DAQ) card also provides cheap and reliable measurement of the IM current signal in industrial plants. The current signal can be 1.
Requires prior knowledge regarding the specific operation conditions [13]; 3.
Can be time-consuming, nontrivial, and inaccurate due to the dynamic assumptions, and some of the model uncertainties are not quantifiable [15].
Thus, to overcome these, a data-driven model identification approach is considered to obtain the IM model that describes the actual condition of the IM. The data-driven model identification will be accomplished by applying the black-box identification based on SID.
Since it is data-driven, the IM model will be identified based on the measured inputoutput voltage and current data at a specific particular time. Therefore, the IM model is considered as a linear quasi-steady-state discrete state-space model. SID algorithm has been applied in the model identification for MFDI method in different cases and provides robust residual generator [6,8,[16][17][18]. SID algorithm estimates the state sequences directly from the measured input-output data by using the orthogonal and the oblique projections of some particular data block Hankel matrix row spaces into another particular data block Hankel matrix row spaces. The projections are executed after the QR decomposition is applied to the block Hankel matrices. Then, the Singular Value Decomposition (SVD) is used to determine the system order, the observability matrix, and the state sequences [19]. Unlike the derivation of the first principle, SID-based model identification provides a general, and straightforward parameterization of a multivariable system [20]. However, the model parameters do not have physical meanings. Thus, changes and variations in these parameters cannot be used to understand the cause-effect in FDI. Although the parameter's physical meaning may be lost, this identified model provides an adequately accurate dynamic representation of the IM behavior at a specific time as it is not derived under any dynamic assumptions. The significant advantage of the SID algorithm is non-iterative with no nonlinear optimization involved so that there is a guarantee of convergence of the objective criterion [21]. This advantage is well-suited for industrial practices as it does not require a long computational time.
FDI is a challenging issue, particularly in a low signal-to-noise ratio (SNR) condition such as in an actual industrial plant. A frequency-domain diagnosis framework must be able to provide reliable performance in evaluating the faulty condition. The proposed MFDI provides novelties to overcome those particular challenges. It employs a datadriven IM model identification for fast computational time and accurate input voltage and outputs current relationship description. The IM model filters the dominant input voltage harmonics and noises from the output current spectrum and presents it as the residual spectrum. A threshold as a benchmark for evaluating the residual spectrum is generated to identify the fault signatures.
The general framework of the proposed MFDI is shown in Figure 1. The experiment in this study has been conducted in an industrial wastewater centrifugal pump driven by an IM with a flexible coupling transmission. Five sections are presented in this paper. Section 1 explains the background of study and motivation. Section 2 provides the motor current signal and its frequency spectrum pattern. Section 3 describes the data-driven-based model identification, while Section 4 provides details of the FDI algorithm. Section 5 presents the experimental setups. Section 6 presents the experimental results and their discussions, and Section 7 the conclusions and the future works of this study are presented.

Motor Current Signal Description
The motor current signal reflects the electrical relationship between the stator and the rotor, where helpful information regarding the IM condition can be observed in its frequency spectrum. For instance, the presence of electrical faults such as stator short circuits and broken rotor bar; and mechanical faults such as bearing faults, misalignment, air-gap eccentricity, and mechanical looseness induces additional frequency components in the motor current stator signal [22][23][24][25][26]. Besides, the inherent defect in the IM design or construction and the driven-systems variation may induce multiple harmonics into the motor current signal. While in actual industrial operations, the motor current frequency spectrum is vulnerable to background noise distortion and voltage harmonics interference. Therefore, the theoretical motor current signal in faulty condition can be decomposed as follows, where n(t) ∼ N (0, σ 2 ) denotes the Gaussian white noise coming from the driving voltage signal and measurement noises, f i is the frequency component, A i is the amplitude, and φ i is the initial phase. It is important to note that in Equation (1), the motor current harmonic and fault characteristic frequency are not described separately as in some cases, the fault characteristic frequency overlaps the harmonics [27,28].
The main idea of MFDI is to generate an output estimate of a system and obtain the residual output. The motor current signal is estimated using the motor model obtained from the input-output system identification in this study. Theoretically, while keeping the voltage harmonic and driving voltage noise, the estimated motor current signal is free from fault and measurement noise components. Therefore, subtracting the estimated current signal from the measured current signal can exclusively isolate the residual harmonic in which its magnitude is much smaller than in the current harmonic, fault frequency components, and measurement noise components.
whereÎ(t) is the estimated current signal, r(t) is the residual current signal, r i is the residual signal amplitudes, f r i is the residual frequency, and i is an integer value correspond to the residual frequency index with K indexes length. Now, the so-called residual frequency f r i corresponds to residual harmonic, fault, and measurement noise frequency components where their presence may overlap each other. The frequency spectrum of Equations (1) and (3) can be obtained by applying the FFT algorithm where it is based on this following Discrete Fourier Transform (DFT) [29], where k = 0, 1, 2, 3, . . . , N − 1, and N is the length of data.

Fault Characteristic Frequency
The development of MCSA recognizes the additional frequency components induced by the faulty conditions as the fault characteristic frequencies, as they are described in Equation (1). The proposed FDI method isolates fault characteristic frequencies as a frequency pattern or band in which it is treated as a subset frequency for detection and identification evaluation. Generally, a fault characteristic frequency presence in the frequency spectrum is modulated, and it generates several fault harmonic frequencies. Thus, grouping these frequencies as a subset frequency band and examining its pattern will focus on this proposed method. The fault characteristic frequencies pattern is presented as, where f f k is the k-th fault characteristic frequencies that induced into the current by the faulty conditions. Misalignment is one of the most common faults present in a factory. It occurs in the transmission shaft between the motor and the driven system. The present misalignment in the industrial equipment may disrupt the transmission of power from the motor to the driven systems. It generates excessive vibration, and eventually, it leads to the driven system, coupling, and motor bearing failures. Thus, correcting the misalignment can prevent the industrial equipment from further failure.
Misalignment induces dynamic eccentricity in the motor. Dynamic eccentricity happens because the rotor center is mislocated from the center of rotation, and when the rotor rotates, the minor air-gap position rotates. Hence, the stator angle is affected by the air-gap in dynamic eccentricity as both the rotor and the minor air-gap position rotating [30]. The rotor position variation and oscillation in the length of the radial air-gap will induce the air-gap flux density variation into the stator current signal. Therefore, it generates two symmetrical side-bands around the IM operational frequency f e and its harmonic that appeared as the misalignment/unbalance fault signature, where m = 1, 3, 5, . . . and f r is the shaft rotational frequency. In this study, an angular misalignment experiment will be performed. The angular misalignment occurs when the centerline of the shaft connection is crooked and set an angle between the driven system and the motor. In particular, it damages both the motor and the driven system. Besides misalignment as the common faulty condition, the other considerably faulty condition in which its fault signatures have never been isolated and mathematically expressed before will also be studied. A turbulent flow is considered a condition that frequently occurred in a pumping system. The turbulent flow occurs due to the pumping system running below the pump flow rate capacity. In some cases, the motor running at a speed below the minimum speed of the pump may cause the turbulent flow as well.
Similar to misalignment, the flow turbulence causes the centrifugal pump to vibrate. The vibrations travel through the shaft and couplings to the IM. These vibrations may interrupt the rotor rotation and induce air-gap fluctuations to the rotor-stator relationship. Thus, the flow turbulence may induce additional frequency disturbances to the stator current signal. There are no research reports found for MCSA in pump flow turbulence detection. Nonetheless, there is research done in centrifugal pump cavitation fault [31]. Since turbulence flow in along period leads to cavitation fault, the fault diagnosis approach presented in [31] will be referred. In contrast with angular misalignment, the turbulent flow has a no-fault signature modeled with MCSA. However, as the stator current signal is disturbed by the pump vibration, the presence of turbulent flow can be observed from the stator current signal, the same concept as misalignment. The presence of flow turbulence in a pump is not a fault. Nevertheless, it is a fault in the making.
Both misalignment and pump turbulent flow in a long period will damage both the driven equipment and the IM. The proposed FDI framework is expected to monitor the IM condition and the driven equipment condition. Thus, the proposed FDI framework employs the IM as a sensor to monitor the whole system of RM conditions and detect the fault presence.

Subspace Identification
Consider the black-box identification, an IM in a quasi-steady-state condition can be represented by using a discrete-linear time-invariant state-space model. The discrete state-space describes the relationship between input, output, noise, and disturbance using a first-order difference equation system and auxiliary state vector x k as follows, where y k ∈ R l×1 and u k ∈ R m×1 are the measured output and input vectors, x k ∈ R n×1 is the state vectors, w k ∈ R n×1 and v k ∈ R l×1 are the unobservable disturbance and measurement noise vectors. A ∈ R n×n , B ∈ R n×m , C ∈ R l×n , and D ∈ R l×m are the state, input-to-state, state-to-output, and feedthrough matrices. While Q ∈ R n×n , S ∈ R n×l , and R ∈ R l×l the covariance matrices of Gaussian distributed zero-mean white noise w k and v k . For the system to be able to be observed in the output y k and identified, the {A, C} is assumed to be observable. At the same time, the {A, B Q 1/2 } is assumed to be controllable so that all the system modes can be excited by deterministic input u k and/or by stochastic input w k . The input u k of the system is assumed to be persistently excited [21]. The SID algorithm interpretation and derivation lie on the state sequences and the extended observability matrix. The formulation of the state sequences and the observability matrix can be done by extending the state-space model in Equation (7) with the input and output data [32], where Γ i ∈ R li×n is the observability matrix defined as, with the subscript "d" and "s" denote deterministic and stochastic meaning, the lower block triangular Toeplitz matrices H d i ∈ R li×mi and H s i ∈ R li×mi are defined as, and the state sequences X i ∈ R n×j is defined as, In the SID algorithm, the input and output data have to be represented in block Hankel matrices. The length of block rows or horizon i is defined by the user and must be large enough compared to the maximum order n. The column of Hankel matrices j is set to be j = s − 2i + 1 with s is the length of available data, and it is assumed that j, s → ∞. The input block Hankel matrix U 0|2i−1 ∈ R 2mi×j is defined as, where Similarly, the output block Hankel matrix Y 0|2i−1 ∈ R 2li×j can be defines as Equation (10).
In order to understand more regarding the system state sequences, several theories are explained and proven in [33]. It is denoted according to Theorem 11 in [33] that, The projected matrix Z i is considered the optimal estimation of future output Y f based on the future input U f , past input U p , and past output Y p data without knowing the states of the system. By shifting the past and future border by one block rows, the projected matrix Z i+1 can be found similarly as, Hence, from Equations (11) and (12), it can be determined that, where † denotes the Moore-Penrose pseudoinverse of the matrix . First, to calculate the matrix Γ i , the future output Y f row space is projected along with the future input U f row space into the joint of the row space between past input U p and past output Y p . This projection is an oblique projection that yields to, In which an SVD method can be applied to decompose the matrix O i into, where W 1 and W 2 are the user-defined weights that can be chosen as presented in [32]. The matrix Γ i then can be calculated by, where the matrix Γ i−1 can be obtained by not including the last l rows of the matrix Γ i . According to [33], it is explained and proven that the corresponding columns ofX i andX i+1 are the non-steady-state Kalman-filter state estimation at two consecutive time instants, therefore Equation (7) can be rewritten as, where the ρ w and ρ v are the disturbance and noise matrices that their row spaces are orthogonal to the row spaces of (13) and (14) into The solution of Equation (19) can be found using the least-square method such that the matrices A, C, and K can be identified. According to (20), B and D appear to be linear with K, so it can be said that K(B, D) is a linear matrix function of B and D. Hence, a solution of K(B, D) can be found by using the least-square method to minimize, Suppose the matrices A and C are known by solving Equation (19) and the covariance matrix of the disturbances and the noises are known from Equation (8), the state covariance matrix can be found by solving a discrete Lyapunov equation, By defining the output covariance matrix, it can be found that, Then, it can be defined that, Finally, the Kalman filter gain can be calculated by, where P can be found by solving the algebraic Riccati equation,

Residual Current Generation
Suppose an IM is described in the stationary frame, the stator current i sα and i sβ , and the supply voltage v sα and v sβ are converted from the measurable three-phase reference ABC frame voltage v a , v b , v c and current i a , i b , i c by using Clarke's Transform as follows, where θ = 0, and it applies to voltage and current as follows, where v s0 and i s0 are equal to zero. First, using the black-box identification presented as Algorithm 1, the IM discrete-state space model is obtained. Then by injecting the supply voltage v sα and v sβ into the identified state-space model, the estimated stator currentî sα andî sβ are generated.
Algorithm 1 IM state-space identification 1: input: horizon i; voltage v sα and v sβ ; current i sα and i sβ ; and weight W 1 and W 2 2: output: discrete state-space A, B, C, and D 3: procedure SUBSPACEID(i, v sα , v sβ , i sα , i sβ , W 1 , W 2 ) 4: initialize U 0|2i−1 and Y 0|2i−1 Equation (10) 5: calculate O i Equation (15) 6: calculate Z i and Z i+1 Equations (11) and (12) 7: calculate Γ i and Γ i−1 Equation (17) 10: do Least-Square problem to obtain A, C, and K Equation (19) 11: do Least-Square problem to obtain B and D Equation (21) 12: end procedure In order to execute the IM FDI, the residual current signal must be generated. The residual current signal is defined by subtracting the estimated stator currentî sα andî sβ from the stator current i sα and i sβ then convert it back into the three-phase ABC frame by using inverse Clarke's Transform as the following expression, where r a , r b , and r c are the three-phase ABC frame residual current signal. The standard error estimate (SEE) is used to evaluate the fitness of the estimated current signal as follows, where r is the residual signal and i is the measured current signal. Both are expressed in the three-phase ABC frame.
One of the requirements to implement Algorithm 1, a horizon i must be selected accordingly. Ref. [33] states that horizon i must be selected as i > n, with n as the statespace order. As the IM is assumed to be in a quasi-steady-state with a constant speed, the IM is described to be a fourth-order system (n = 4) in this study. Several iterations to select an appropriate horizon i were performed. The selection categories are set to be computational time to perform the identification and the level of SEE. As it is presented in Table 1, the computational time varies a lot with tiny variations in the SEE level. Selecting i = 32 will have almost 0.5% better SEE than i = 16. Nevertheless, it has a trade-off of 0.1 seconds computational time. Selecting i = 8 will increase the SEE level. Since the amount of data for the analysis is quite large, and considering computational time and acceptable SEE level, the horizon was selected to be i = 16. Another reason is that the residual spectrum of these various horizons does not show much difference, as shown in Figure 2.

Residual Current Spectrum Threshold
Consider measurement of voltage and current signals from an IM, where the current signal is described in Equation (1). By employing Algorithm 1, a model of this IM is identified from the measured voltage and current signals. The measured current is then subtracted by an estimated current generated from the identified model with the measured voltage as the input. The SEE is calculated via Equation (31) where the result is equal to 1.74%. Figure 3 shows the measured, estimated, and residual current of the IM in threephase reference. The subtraction result is the residual current signal in which Equation (4) can be applied to obtain the residual spectrum. Figure 4 shows the measured, estimated, and residual current spectrums of a single-phase. It can be seen that the estimated spectrum is cleaner than the measured spectrum. At the same time, the residual spectrum contains the noise and harmonic residues as the result of subtraction. It is worth noting that sidebands can be noticed in the residual spectrum pointed by red arrows where it does not appear in the measured spectrum. The sidebands themselves are associated with misalignment signatures in Equation (6). There is likely an incipient misalignment fault presence in the IM. It appears that the potential advantage of the residual spectrum is more sensitive to fault signatures compare to the current spectrum.  An MCSA evaluation to determine an IM acceptable condition under a faulty condition has never been done because the fault signature magnitude can differ according to its operating condition. As the residual spectrum behaves more sensitively to the fault signatures, an evaluation must be performed to determine an acceptable condition of IM with fault signatures on its residual spectrum. In an actual industrial application, obtaining a dataset of an IM with perfect healthy conditions is impossible. Hence, this residual threshold can provide a solution in the residual spectrum evaluation, especially in an actual industrial application. The threshold is determined by a dataset of the relatively healthy residual spectrum. The so-called relatively healthy residual spectrum has a description as in Equation (3) but without faulty component or with faulty component but in acceptable low amplitude.
A dataset of n voltage and current pair from an IM is considered in healthy conditions and at the same loading is collected. By using Algorithm 1 and from this dataset, n IM models are identified. The n residual current signals are generated from the dataset by using these models. Through Equation (4), n single-phase residual signals are converted into n residual spectrums. Therefore, each residual frequency f r i has a set of n magnitude values that is stored in matrix R i .
where R i n is the residual spectrum magnitude values of residual frequency f r i at integer index i-th from the n-th measurement, and R i ∈ R 1×n is the matrix where the dataset of n residual spectrum magnitude values is stored. The maximum magnitude value of frequency f r i stored in R i can be defined as, while the mean and the standard deviation of these magnitude values can be calculated as, whereR i is the f r i magnitude value mean and R i σ is the f r i magnitude value standard deviation. Based on these statistical evaluations, a threshold value for each frequency f r i is interpreted as, where R i TH is the residual spectrum threshold at frequency f r i , and w 1 + w 2 = 1 with w 1 ≥ w 2 .
In an actual application, a frequency offset may occur. For example an IM is operated at f e = 60 Hz with rotor frequency at f r = 28.5 Hz. When the IM under faulty condition, according to Equation (6), the fault signature f e − 1 × f r appears in the residual spectrum at f r i = 31.5 Hz. However, this signature will not always occur at 31.5 Hz, especially when n measurements are performed. In most data, the signature will occur at 31.5 Hz.
In some data, it may occur at 31.2 Hz, and the other data, it may occur at 31.7 Hz. If one residual data has 130,000 data points and is obtained with a 10 kHz sampling rate, the maximum frequency for the residual spectrum is 5000 Hz with 65,000 index integers i. After a calculation, the integer of index i for the signature of f r i = 31.5 Hz is obtained as i = 410. At the same time, for the other measurements, the signature could be obtained at integer i = 406 and i = 412. This phenomenon is called the frequency offset. To obtain a clearer understanding of this phenomenon, Figure 5 is presented. This figure presents a comparison of two different residual spectrum data under the same angular misalignment condition in the same period of measurements. Figure 5a shows the offset of fault signature f e + 1 × f r , and Figure 5b shows the the offset fault characteristic frequency of fault signature f e + 5 × f r . Even, it can be seen that in Figure 5a for residual spectrum 2, the signature peak produces 2 integer index i with relatively similar magnitude. The offset distance of two different fault signatures is also different. The fault signature f e + 5 × f r has higher distance compare to fault signature f e + 1 × f r . Figure 5 shows that the frequency offset can cause a bias in the significant peaks detection. Hence, the term R i max in Equation (36) is used to compensate for the frequency magnitude that occurred in an offset frequency index. Meanwhile the termR i + 3 × R i σ is used to determine the threshold level of confidence, a threshold value can be chosen by tuning the weight parameters w 1 and w 2 . The procedure to determine the healthy spectrum threshold can be seen in Figure 6.
The n identified models would be saved as the healthy model references to generate the estimated current signals. The n models are categorized by their normalized gain (current/voltage) and power vector. For example, observation data will select a model with similar gain and power factors to generate the residual signals, as shown in Figure 1. The model selection for observation data is accomplished by using the Euclidian distance calculation.

Binary Pattern of Fault Signature
According to Figure 1, the FDI process is started when the residual current spectrum of observation data is compared to the residual spectrum threshold that has been obtained from a set of trained data. As explained in Section 2, the main interest in the residual spectrum is the subset of frequencies whose magnitude is higher than the trained threshold. The residual spectrum and the threshold comparison is made index by index where the binary decision 1 and 0 is made. Binary_mat ∈ R K×1 is a matrix where the binary decision results are stored. K is the maximum integer of index i.
The Binary_mat generation is provided for two case studies of an IM in healthy and faulty conditions. This binary example is adapted from [34]. A residual spectrum is obtained from an IM in healthy conditions, and it is compared index by index with a threshold. Based on the 1 and 0 decision, the Binary_mat is generated as follows, While for a residual spectrum that is obtained from an IM in faulty condition, the Binary_mat is generated as follows, Based on those examples, the Binary_mat energy in Equation (38) is higher than in Equation (37) as the IM in faulty condition has more residual spectrum magnitudes higher than the threshold. Therefore, a detection metric can be determined through evaluation of the Binary_mat by normalizing its energy,

Fault Identification
Consider a dataset of m measurements. Take a single-phase residual spectrum from each measurement, and there are m residual spectrums available. Calculate the magnitude mean and standard deviation of each residual frequency f r i as follows, Ideally, in the case of healthy IM, the Binary_mat in Equation (37) has only 0 s as its elements due to the residual spectrum must be less than the threshold. However, in the actual application, there are 1 s as its elements due to noise signatures/insignificant peaks occurred higher than the threshold. This condition occurs in faulty IM as well in Equation (38). Some 1 s that may not be associated with any fault signature. Thus, to determine whether 1 s belong to any specific fault signature or just belonged to random noise bands/insignificant peaks, an evaluation based on the probability distribution of fault and noise signatures is shown in Figure 7 must be done. The shaded area in Figure 7 represents the probability of fault signature/significant peaks, and the white areas represent the probability of noise signatures/insignificant peaks. The probability of fault or noise signatures can be calculated statistically through the Q-function as follows, where Probability_mat ∈ R K×1 contains the probability signatures of residual frequency f r i for m measurements. Thus, the value of each Probability_mat element is ranged as 0 ≤ Probability_mat i ≤ 1. Q-function is chosen as it is proven to be suitable for fault detection [34,35]. In this study, it is employed further for fault identification. Theoretically, the 1s associated with any specific fault signature will always occur on the specific frequency f r i for m times. Hence, the Probability_mat value at that specific frequency f r i is equal to 1. However, if there is a frequency offset, the Probability_mat value at that specific frequency f r i can be less than 1, but the summation probability of nearby frequencies must be close to 1. At the same time, the 1s associated with the noise bands/insignificant peaks will occur on the specific frequency f r i randomly with probability distribution shows in Figure 7.
Suppose H 0 is the null hypothesis states that a peak higher than the threshold is a fault signature/significant peak, and H 1 is the alternative hypothesis states that a peak higher than the threshold is a noise signature/insignificant peak. Because of the frequency offset possibility, the determination of 1s that associated with any specific fault can be done through integer index i sliding window summation as follows, where M is the integer index i sliding window that determines the number of previous and next integer indexes, and p TH is a user-defined threshold that may compensate the rate of fault identification. Through Equation (43), the frequency f r i in any residual spectrum that associated with any certain fault signature can be identified as f f k in Equation (5) by, where f f k is the k-th fault signature. As it is known that between one residual spectrum and another residual spectrum, the fault characteristic frequency may differ because of the frequency offset. Therefore, the f Finally, for each residual spectrum, the value of the magnitude of fault frequencies can be found according to each residual spectrum Binary_mat i wherein index i contains 1.
These magnitude values are stored as R k ( f f ). Then, the magnitude value of p identified fault frequencies in the residual spectrum can be used to determine the severity of the fault as follows, The procedure is presented as Algorithm 2. for i = 1, 2, 3, . . . , K do 21: if Binary_mat i,j = 1 and any(index( f f k ) = j) = 1 then 22: Binary_mat i,j = 1 23

Experimental Setup
This study is conducted on a wastewater centrifugal pump driven by an IM with a flexible coupling transmission, as shown in Figure 8a. The IM has the following rated parameters shown in Table 2. The experimental setup schematic representation is shown in Figure 8b. Three 25:5A class 1 current transformers (CT) with ±1% error tolerance are installed to measure the three-phase current signal. The class 1 CTs are used to obtain high SNR in the measured signal. The 5A output of the current transformers is acquired by using the NI-9246 DAQ card. The three-phase voltage signal is acquired by using the NI-9244 DAQ card. A 10 kHz sampling frequency samples both signals. The DAQ user interface is developed in the NI LabVIEW software environment. In addition, vibration measurements are performed as well in the turbulent flow experiments. An accelerometer transducer is installed in the radial position on the pump side. The measurement employs a 10 kHz NI DAQ card. For each aforementioned input opening, five vibration data are measured. Measuring the vibration data in an actual industrial facility is quite a challenge because the transducer cannot be installed in the optimum location, and the vibration noise is high.  Two types of experiments are done such as misalignment and turbulent flow. The misalignment is a common fault in the industry, and its fault signatures can be obtained using Equation (6). Moreover, the turbulent flow in a pump is a phenomenon that must be avoided, especially in a pumping system, as it may lead to the cavitation fault if it is allowed to happen for quite an extended period. In contrast to misalignment, there is no fault signature of turbulent flow presented in a mathematical expression. Thus, turbulent flow detection and identification rely on the presence of a frequency pattern in which its magnitude is above the residual threshold.
Three types of angular misalignment faults are executed by adding a washer for each IM back-side foot. Type 1: 0.2 mm thickness, type 2: 0.5 mm thickness, and type 3: 1.5 mm thickness. Meanwhile, the turbulent flow is induced in the pump by abruptly changing the pump input valve opening with 100%, 80%, 60%, and 50% openings. During the turbulent flow experiments, the pump output valve opening is set to be 50%. Closing the pump input valve is expected to reduce the volume of water entering the pump. Hence, reduce the pressure difference and induce both air and water in the pump and thus generate a turbulent flow in the pump. Table 3 shows the parameters used for the FDI algorithm presented in Section 4. Table 3. FDI algorithm parameters.

Parameters
Value Unit

Angular Misalignment
A set of 30 measurements is collected for each condition in the misalignment fault experiment, including healthy and three conditions of angular misalignment fault. The residual threshold is obtained from a set combination of 60 measurements on six different dates when the IM is running in a considerably healthy condition. It is common that fault detection is performed on the time-domain residual data [8,13,17]. However, time-domain residual data evaluation for fault detection will determine the success of the fault detection if the fault alters the output of the system, as an example, the current signal in our study. Hence, the difference can be seen from the time-domain residual data. Figure 9a shows the SEE time-domain data of the healthy and three angular misalignment conditions. At the same time, Figure 9b shows the f detect frequency-domain data. The difference between the healthy and misalignment conditions is hardly distinguishable in the SEE results because the misalignment fault signature magnitudes are too small to alter the output current signal. Thus, there is not much difference between healthy and misalignment residual SEE. However, it is distinguishable in the f detect data. The f detect shows distinguishable difference because it is calculated from the updated Binary_mat with respect to Equation (44). Hence, all of the 1 s in Binary_mat that is related to noise signatures are set to be 0 s-leaving the only 1 s that are related to fault signature. It can be seen from the healthy condition that the f detect is equal to 0 for all the samples because there is no detected fault signature leaving the Binary_mat with 0 s only. At the same time, for misalignment conditions f detect shows non-zero values. f detect surpasses SEE because its evaluation depends only on the presence of fault signatures in the frequency-domain. The residual spectrum can further be analyzed to identify the fault signatures. Figure Figure 10a. The less severe misalignment condition such as type 1, the less fault band f f k number identified. From the zoomed-in frequency band, it can be seen that the fault signature can be identified as a peak higher than the residual threshold and lies inside the fault band f f k . The peak is the fault signature described in Equation (6). Several peaks higher than the residual threshold are considered noise because they lay outside the fault band f f k . For type 2 and type 3 misalignment, the fault band f f k number is higher. Hence, the identified fault signature number is also higher. The difference between type 2 and type 3 misalignment lies in the energy of those fault signatures.
Sample of identified fault signatures from 5 residual spectrums for type 3 misalignment is presented in Table 4. Fault signature column corresponds to Equation (6). The frequency column corresponds to the theoretical characteristic frequency where the fault signature from Equation (6) may occur. f r i columns correspond to which residual frequency with integer i is identified as the fault characteristic frequency according to the related fault signature.  Based on Table 4, it can be seen that at fault signature f e − 1 × f r , there are three frequencies related to this signature because of the frequency offset phenomenon. They are identified as 30.  Figure 10d. Any fault characteristic frequency that occurs inside this band and is higher than the residual threshold will be considered the fault signature f e − 1 × f r . A similar explanation applies to the other fault signatures presented in Table 4.
Finding the fault frequency band f f k is proven to be effective to determine the fault characteristic frequency subjected to the frequency offset phenomenon and filters-out any insignificant peaks/noise signature for robust Binary_mat and f detect . The frequency offset phenomenon occurs due to the sampling frequency, data duration, and the number of data points acquired. This method can be used to achieve automated fault detection and identification without prior knowing the fault signatures.
A comparison between MFDI is compared to MCSA to evaluate the effectiveness of the proposed method. Figure 11 shows the comparison between MFDI represented by residual spectrum and MCSA represented by current spectrum. Only type 3 misalignment condition is chosen because the other types have a similar comparison. Figure 11b-d are used to explain the difference between current spectrum and residual spectrum. The first comparison is made based on Figure 11b. P1 points to f e + 1 × f r signature in the current spectrum, P2 points to the same signature in the residual spectrum. It can be seen that the energy of the signature is enhanced twice higher in the residual spectrum. It happens because of the removal of dominant harmonics energy in the spectrum by subtracting the current signal with the estimated current signal. Therefore, it makes the fault signature more evident in the residual spectrum.
Second comparison is presented in Figure 11c. P3 points to f e + 3 × f r signature in the current spectrum, P4 points to the same signature in the residual spectrum. The magnitude of signature pointed by P3 is relatively small to be considered as the fault signature. However, the signature can be considered the fault signature because of the residual threshold and the fault band f f k . It makes the residual spectrum more sensitive in fault signature identification.
Another advantage of residual spectrum is presented in Figure 11d. P5 points to identical signatures in the current spectrum, P6 points to the same signature in the residual spectrum. It is difficult to determine whether the first peak will also be considered a fault signature in the current spectrum. It is also difficult to evaluate whether it contributes to the fault severity. Nevertheless, using the proposed method, the first peak is not considered a fault signature since it simply lies outside the fault band and is lower than the threshold. Then, there is no contribution for that peak in fault severity. It proves that the proposed method can avoid false signature identification. Figure 12 shows the results of normalized fault severity evaluation. The fault severity is evaluated by using Equation (45), and technically, it is a summation of the energy of the identified fault frequencies. A mean value is calculated from 30 data in each condition. Since there is no fault signature identified in the healthy condition, no energy can be summed. Therefore, it indicates zero levels of fault severity. The normalized fault severity is highly dependent on the user-defined level. It depends on how the user will allow the IM to run in such a faulty condition. It is possible to allow the IM to run in more severe misalignment conditions than type 3. Thus, the maximum normalized fault severity level will be higher. However, the idea in Figure 12 is to present the severity evaluation based on the identified fault frequency energy. It provides a quantified severity evaluation.

Turbulent Flow
The flow turbulence experiment is executed by undertaking the abrupt changes in the pump input valve opening. Three types of abrupt change of pump input valve opening and one standard 100% opening are performed, and 30 measurements for each condition are done. The experiment is performed by letting the pump operated with 100% input valve opening, and then the valve is closed to 80% opening, to 60% opening, to 50% opening, and return to standard 100% opening as the beginning. According to vibration reports in [36], the flow turbulence indicates by the high-frequency bands.
Similar to the misalignment condition, the turbulent flow condition can be observed from the SEE level and the f detect . Figure 13a,b shows the turbulent flow experiment SEE level and f detect respectively. The first three different input openings cannot be distinguished clearly by using the SEE level. However, for 50% opening, it shows higher and distinguishable SEE. In contrast to the SEE level, the two last openings can be distinguished clearly. According to this result, it can be confirmed that the flow turbulence appeared when there is an abrupt change in the input valve openings at 60% and 50%. However, it can be observed more in detail using the residual spectrum of these different input valve openings. The example of residual spectrum for different input valve opening can be seen in Figure 14. In the turbulent flow analysis, the same residual threshold used in misalignment analysis is employed. P1, P2, and P3 point to frequency bands around 1×, 3×, and 5× harmonics respectively as shown in Figure 14a. Figure 14a shows the residual spectrum of 100% input valve opening. Hence, as is expected, no signature is detected. Once, the input valve is abruptly closed about 20% sudden rising of residual spectrum magnitude is noticeable in around 1× and 5× harmonics shown in Figure 14b. There are small fault bands f f k identified around 1× harmonic. However, the fault band is getting larger when the input valve is abruptly closed about 40% and 50% as shown in Figure 14c,d. Even in the 60% input valve opening, an additional fault band identified close to 5× harmonic where it is getting more prominent in the 50% opening. The sudden rising of residual spectrum magnitude for 60% and 50% openings can be noticed around 1×, 3×, and 5× harmonics. However, it is only around 1× and 5× harmonics where they rise higher than the threshold. Based on the residual spectrum analysis, the turbulence flow signatures occur as a frequency band rather than a single peak or a side-band signature as in the misalignment condition. Thus, it can be said that no flow turbulence occurred in 100% and 80% input valve openings. The flow turbulence occurs in 60% and 50% input valve openings. It is because the small input valve opening blocks and interrupts the fluid flow and reduces the pressure, which resulted in a turbulent flow. The turbulent flow can indicate input valve operation issues, whether manually or automatically operated. An issue in the input valve operation also leads the pump to run below its capacity.
The proposed MFDI and MCSA are also compared for turbulent flow detection through current and residual spectrum comparison. The comparison is presented only for 50% input valve opening. Figure 15 shows the comparison between the current and residual spectrum. The first comparison is presented to examine the pattern identified around 1× harmonic and it is presented in Figure 15b. F1 points to the mirror wrinkle patterns that lay inside the fault band f f k . At the same time, the patterns are not visible in 100% opening as pointed by H1 in Figure 15d. According to research presented in [31], a cavitation fault can be detected through the current spectrum by observing a mirror frequency band around 1× harmonic. It similar pattern as P1 points it, but with smaller magnitudes. It is essential to notice that if turbulent flow lasts for a more extended period, it eventually leads to cavitation fault. It can be the reason that the flow turbulence shows a similar signature pattern as it is reported in [31]. As in the residual spectrum, all the dominant harmonic energy is removed, the pattern can be seen more obvious. In the second comparison, a similar pattern can be observed close to the 5× harmonic in Figure 15c as pointed by F2. However, it is not as visible as the residual spectrum indicates.
In order to complete the comparison analysis, a radial vibration spectrum waterfall of the experimental pump is presented in Figure 16. According to [36], the flow turbulence signature can be observed in the vibration spectrum as a frequency band pattern below vibration 1× harmonic and as a higher frequency band pattern. A similar pattern also visible in the vibration spectrum of a pump suffers from cavitation fault as reported in [31]. The pattern of the turbulence flow can be seen in the vibration spectrum for 60% and 50% openings as pointed by V1.However, the turbulence pattern is not visible for 100% and 80% openings, as pointed by V2.The magnitude of the vibration 1× harmonic pointed by V3 and V4, respectively, is a distinguishable difference of turbulence and no turbulence condition. At the higher frequency band, the magnitude difference is pointed by V5 and V6 for turbulence and no turbulence condition, respectively. Due to the high vibration noise, the difference cannot be seen clearly.
The turbulent flow is not a cavitation fault. Nevertheless, if the condition is kept for a more extended period, it develops into a cavitation fault. Detecting the turbulent flow phenomenon helps avoid further issues and keeps the pump running in its optimal condition and capacity. Based on the presented comparison, the residual current spectrum provides better sensitivity in turbulent flow signatures identification.
A quantified severity evaluation is presented in Figure 17. It shows that the turbulence severity for 60% opening is much lower than the 50% openings. At the same time, there is zero level fault severity for 100% openings since there is no energy to be summed. This fault severity quantification can help to determine further corrective action for the faulty condition.

Conclusions
A novel MFDI framework to improve the IM-driven RM FDI is proposed in this study. The framework requires a discrete linear IM state-space model obtained using a data-driven SID algorithm assuming that the IM-driven system runs in quasi-steady-state condition. The MFDI utilizes the residual spectrum, residual threshold, and fault frequency bands. The combination of residual threshold and fault frequency band provides robust identification of fault signatures. In an undertaken comparison with MCSA, the proposed MFDI provides better sensitivity and identification results. The identified fault signature energy can also be used to determine fault severity level for further maintenance decisions. The experiment in an actual industrial wastewater facility also proves that the proposed MFDI framework provides a promising solution in developing an industrial-level FDI framework. An insight into an actual industrial application is that the difficulty in obtaining a perfect IM in healthy conditions can be overcome by adjusting the residual spectrum threshold according to the available set of IM data considered in healthy conditions.

Future Works
This presented study is performed in a centrifugal pumping system, assuming that the system runs in a quasi-steady-state condition. Further study will be performed in a system that runs in different operating frequencies and loading. In this scenario, a set of statespace models and residual thresholds must be identified and generated for each different operating condition. A cluster of state-space models and residual thresholds can be created based on the operating frequency and loading. A similar method to gain-scheduling will be developed for the appropriate selection of the state-space model and the residual threshold in the generated clusters.