1. Introduction
The early and accurate detection of abnormal events is crucial for today’s safety-critical systems, whose complexity is continuously increasing. Model-based fault diagnosis has received considerable attention during the last three decades, and a rich literature of practical fault diagnosis techniques is available [1,2,3,4,5,6,7]. The high-fidelity modeling requirements of model-based fault diagnosis make it difficult, and sometimes infeasible, to implement for complex systems. Noteworthy examples of such complicated systems are networked control systems, the internet of things [8], and various chemical industry processes [9]. A sufficient amount of data is available for the majority of such systems. These data contain information about the system, disturbances and noise, faults, and so on. To deal with such systems, various data-driven methods are being used effectively.
These methods are broadly classified into two classes [10]: (1) data-driven methodologies [11,12]; (2) model-data-based hybrid approaches [13,14,15]. Among the purely data-based methods, machine learning [16], deep learning [17,18], and multivariate techniques including principal component analysis [19,20,21] and partial least squares methods [22,23,24] are widely used with promising results. A model-data-based hybrid approach usually develops fault detection and isolation (FDI) schemes that lie at the intersection of model-based and data-driven techniques [13]. The subspace identification method (SIM) has emerged as one of the major techniques in this domain over the last two decades [25,26,27,28]. The key concept behind SIM is to identify the parity space from input–output data for the construction of an observer-based FDI scheme [26] or parity-based FDI residual generators [27]. The effect of external factors such as noise and disturbances in the recorded data is eliminated by introducing instrumental variables. Finding the orthogonal projection of the extended observability matrix enables the construction of residuals based on a parity vector, which is made possible by applying the singular value decomposition (SVD) to covariance matrices of the recorded data [29]. Since SIM is developed to estimate the dynamic model of the underlying system, it handles stationary as well as dynamical systems, in contrast to multivariate analysis (MVA) techniques, which operate on stationary processes. Although several MVA techniques have been developed to deal with dynamic issues, these are not suitable for stationary processes [30].
Subspace identification has been applied to many practical systems including the Tennessee Eastman process [26,31], a vehicle lateral dynamic system [27], a coupled liquid tank system [32], a wind turbine benchmark [33,34,35], and a three-phase induction motor [36,37]. Since induction motors play a vital role in the process industry for electromechanical energy conversion and are the most widely used devices [38], this study focuses on implementing data-driven techniques on an induction motor. Because induction motors perform critical operations in industries such as power generation, petroleum [39], aerospace [40], and the medical industry, their reliability and running condition must be monitored to avoid disastrous failures. Operators have started to use rapid fault detection alongside routinely planned maintenance, which can reduce the failure rate, increase plant uptime, and reduce operational and maintenance costs.
The induction motor and its drive system can be subjected to different types of faults, including [41,42]:
Stator faults, which include short and open circuits of the stator winding;
Rotor faults, which include end-ring cracking in the case of a squirrel cage motor and short or open circuits of the rotor winding;
Power supply failure due to damage to the power electronics of the drive system;
Mechanical faults, including bearing damage, eccentricity, misalignment, and a bent shaft.
Since induction motors are considered symmetrical machines, any fault could modify their operational behavior by changing symmetrical properties.
Machine learning and model-based methodologies have also been applied to induction motors for fault detection and isolation purposes. Convolutional neural network (CNN) based fault diagnosis and classification was carried out by Maciej Skowron in [43]. During the fault detection process, the neural network also acts as a classification system that identifies the degree of damage to the induction motor. Choosing the right CNN structure for the fault detection and isolation (FDI) system plays a vital role in this methodology, as a minor change could lead to a large increase in the false alarm rate (FAR). Data-based and model-based techniques have their own advantages and disadvantages in different system states. Unknown faults can be detected and identified efficiently by model-based methods, while data-based methodologies mostly analyse the input–output signals of the induction motor, giving low weight to the system dynamics [44]. Hybrid approaches, which combine system-model-based FDI schemes with purely data-based methodologies, have shown great ability to detect and analyse faults under unbalanced conditions [45].
Parity-equation-based methods have low online computational complexity, which leads to early detection of faults [46]. In our FDI scheme, the parity space is identified using a subspace identification method and is then used for residual generation. For the subspace method, it is challenging to find an optimal parity vector that makes the generated residuals more sensitive to faults and, simultaneously, less sensitive to external factors (e.g., noise and disturbances). To address this challenge, various performance indices have been proposed [
26,
27,
32,
37] for the computation of optimal parity vector. In [
27], a proposed index considers the sensitivity for actuator and sensor faults but does not consider the effect of unknown inputs. Similarly, Ref. [
32] proposed a performance index that considers the effect of actuator disturbance and actuator fault sensitivity in generated residual. Till then, enhancement of sensitivity towards sensor faults and sensor noise was discussed. However, Ref. [
37] proposed a methodology to optimize the parity vector for sensor fault and noise at the expense of actuator fault and disturbance.
In this paper, we develop a subspace-based data-driven fault diagnosis scheme for dynamical systems based on a proposed performance index, which makes the residual sensitive to actuator and sensor faults simultaneously while suppressing the effect of unknown inputs such as actuator disturbances and sensor noise. Furthermore, to determine the location of the faulty component, an isolation algorithm based on perfect unknown input decoupling is proposed with an improved performance index for an optimized parity vector.
The rest of the paper is organized as follows: Section 2 presents the dynamic model of the induction motor used for online implementation. Section 3 states the fundamentals of the subspace identification method together with the mathematical derivations. The proposed fault detection and isolation algorithms are presented in Section 3 and Section 4, respectively. Section 5 covers the post-processing of residuals, which includes the wavelet transform and the generalized likelihood ratio (GLR) test. The developed schemes are then implemented on an induction motor, as described in Section 6. Section 7 concludes with a summary of the work.
2. Induction Motor Dynamics
The induction motor is operated by applying an AC voltage to the stator, which induces an AC current in the rotor circuit. Since there is no electrical connection between the rotor and the stator, the motor is almost maintenance-free and more efficient, as there is no power loss in commutators. Squirrel cage induction motors are the most commonly used type, with ratings from a few to hundreds of horsepower. The rotor rotates when it is subjected to a varying magnetic field, and the speed of the rotor depends upon the frequency
$\left({f}_{e}\right)$ of applied voltage and stator poles
$\left(p\right)$ of the motor as described in Equation (
1):
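Equation (1) is not reproduced in this text. As a hedged illustration, the sketch below uses the standard textbook relation for the synchronous speed, $n_s = 120 f_e / p$ in rev/min, where $p$ is the number of stator poles; the rotor of an induction motor turns slightly slower by the slip. The function names and the example values (50 Hz, 4 poles, 5% slip) are assumptions chosen for illustration and are not taken from Table 1.

```python
def synchronous_speed_rpm(f_e: float, p: int) -> float:
    """Synchronous speed in rev/min for supply frequency f_e (Hz) and p stator poles."""
    if p <= 0 or p % 2:
        raise ValueError("p must be a positive, even number of poles")
    return 120.0 * f_e / p


def rotor_speed_rpm(f_e: float, p: int, slip: float) -> float:
    """Rotor speed for a given per-unit slip (0 = synchronous, 1 = standstill)."""
    return (1.0 - slip) * synchronous_speed_rpm(f_e, p)


# Illustrative values only (assumed, not the motor's nameplate data).
print(synchronous_speed_rpm(50.0, 4))   # 1500.0
print(rotor_speed_rpm(50.0, 4, 0.05))
```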
A mathematical model of the motor can be constructed by considering the equivalent model of a single-phase induction motor. The nominal parameters of the induction motor under consideration are given in Table 1. A fifth-order state-space model of the induction motor is constructed in the stator-fixed reference frame:
Here, the state vector represents the dq-axis fluxes and currents along with the rotor speed, such that $\eta ={\left[{\eta}_{1}\phantom{\rule{0.222222em}{0ex}}{\eta}_{2}\phantom{\rule{0.222222em}{0ex}}{\eta}_{3}\phantom{\rule{0.222222em}{0ex}}{\eta}_{4}\phantom{\rule{0.222222em}{0ex}}{\eta}_{5}\right]}^{T}={\left[{\varphi}_{d}\phantom{\rule{0.222222em}{0ex}}{\varphi}_{q}\phantom{\rule{0.222222em}{0ex}}{i}_{d}\phantom{\rule{0.222222em}{0ex}}{i}_{q}\phantom{\rule{0.222222em}{0ex}}\omega \right]}^{T}$. Since the motor is driven by the three-phase voltage, the actuator inputs are these supply voltages.
Constants used in the model (
2) are defined as
further,
Thus, the dynamic model for an induction motor (SE26723G) as shown in
Figure 1 could be found using values given in
Table 1. Besides nominal parameters, the nominal operating conditions of motor are stated in
Table 2.
3. Fault Diagnosis Scheme
Parity-vector-based residual generation is one of the well-known fault diagnosis schemes used in both model-based and data-driven FDI systems [47]. The parity vector scheme allows techniques developed in the model-based FDI literature to be used for data-driven systems. To take advantage of the available input–output data samples of the process and to avoid the complexity of obtaining a system model, this work focuses on the subspace identification mechanism, as illustrated in
Figure 2.
Consider a discrete LTI model that is described as follows:
where the input, state and output vectors are
$u\left(k\right)\in {\Re}^{l}$,
$x\left(k\right)\in {\Re}^{n}$ and
$y\left(k\right)\in {\Re}^{m}$, respectively. Furthermore,
${f}_{a}\left(k\right)\in {\Re}^{l}$,
${f}_{s}\left(k\right)\in {\Re}^{m}$,
$w\left(k\right)\in {\Re}^{n}$ and
$v\left(k\right)\in {\Re}^{m}$ represent the actuator faults, sensor faults, actuator noise and sensor noise, respectively. The matrices A, B, C and D are constant matrices of appropriate dimensions.
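The state-space equations themselves are not reproduced in this text. The sketch below therefore assumes the common additive form $x(k+1)=Ax(k)+B\left(u(k)+f_a(k)\right)+w(k)$, $y(k)=Cx(k)+D\left(u(k)+f_a(k)\right)+f_s(k)+v(k)$, which is consistent with the stated dimensions but may differ from the paper's Equation (3). The function name `simulate_lti` and the scalar example system are hypothetical.

```python
import numpy as np


def simulate_lti(A, B, C, D, u, f_a=None, f_s=None, w=None, v=None):
    """Simulate the assumed additive-fault model from a zero initial state.

    u, f_a are (l, N); f_s, v are (m, N); w is (n, N). Missing terms default to zero.
    """
    n, l, m, N = A.shape[0], B.shape[1], C.shape[0], u.shape[1]
    f_a = np.zeros((l, N)) if f_a is None else f_a
    f_s = np.zeros((m, N)) if f_s is None else f_s
    w = np.zeros((n, N)) if w is None else w
    v = np.zeros((m, N)) if v is None else v

    x = np.zeros(n)
    y = np.zeros((m, N))
    for k in range(N):
        u_eff = u[:, k] + f_a[:, k]              # actuator fault enters with the input
        y[:, k] = C @ x + D @ u_eff + f_s[:, k] + v[:, k]
        x = A @ x + B @ u_eff + w[:, k]
    return y


# Scalar toy system (assumed values) with a sensor offset at the last sample.
A = np.array([[0.5]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = np.array([[0.0]])
u = np.ones((1, 3))
f_s = np.array([[0.0, 0.0, 2.0]])
y_healthy = simulate_lti(A, B, C, D, u)
y_faulty = simulate_lti(A, B, C, D, u, f_s=f_s)
```

Comparing `y_healthy` and `y_faulty` shows that a sensor fault shifts only the affected output samples, whereas an actuator fault would also propagate through the state.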
Consider the system described in Equation (3), written in recursive form over $s$ samples such that $s>n$, as follows:
In a similar fashion,
${f}_{s}\left(k\right)$,
${f}_{a}\left(k\right)$,
${w}_{s}\left(k\right)$ and
${v}_{s}\left(k\right)$ could be constructed. The output of the system could be written as
where
Let the residual be defined as
${\Psi}^{\perp}$ is defined as the parity space of the system, such that
${\Psi}^{\perp}\Psi =0$. Substituting Equation (
5) into Equation (
9),
Equation (10) states that, if there is no fault in the system, the generated residual contains only the effect of the disturbance and noise acting on the actuators and sensors. According to Equations (8) and (9), since the state-space model is unavailable, the main objective of the data-driven diagnostic system is to identify the left coprime factorization $({\Psi}^{\perp},{\Psi}^{\perp}{M}_{us})$ of the underlying system from healthy input–output data.
A total of $N$ input–output data samples are collected during healthy operation of the process. Assume that the order of the system is $n$, and let the indices $p$ and $f$ refer to the past and future, with $s={s}_{p}={s}_{f}$. From the healthy input–output data, past and future block Hankel matrices are constructed by subdividing the data samples into past and future samples, as
Here, ${u}_{{s}_{p}}$, ${u}_{{s}_{f}}$, ${y}_{{s}_{p}}$ and ${y}_{{s}_{f}}$ are constructed as in Equation (4). For the fault-free case, the future data matrix can be written as
for
$N\gg s$, where
$s\approx {s}_{p}\approx {s}_{f}>n$, and the covariance matrix of the collected data could be constructed as follows:
and the matrix ${Z}_{p}$, also known as an instrumental variable, is used to remove the effect of noise. It is assumed that the noise in the system is uncorrelated with the collected I/O data. Furthermore, to normalize the data, Equation (13) is divided by $N$. A singular value decomposition of the covariance matrix leads to the identification of the left coprime factors, or data-driven residual, as follows:
Here,
and the parameter
${\Psi}^{\perp}$ and
${\Psi}^{\perp}{M}_{us}$ would be identified as [
27]
By substituting the identified parameters in Equation (
9), the residual can be constructed as given below:
The above equation indicates that, even in the fault-free case, the residual is affected by noise and disturbances. It is therefore of vital importance to obtain an optimal parity space that also mitigates the effect of noise and disturbances on the residual. The procedure for identifying a subspace-based residual using I/O data is summarized in Algorithm 1.
Algorithm 1 SIM-based fault detection algorithm
 1: Collect the I/O data from the plant during healthy operation
 2: Estimate the order $n$ of the system and set ${s}_{f}={s}_{p}\approx s>n$
 3: Construct the block Hankel matrices ${U}_{p}$, ${U}_{f}$, ${Y}_{p}$ and ${Y}_{f}$
 4: Construct the covariance matrix using Equation (13)
 5: Obtain ${\Psi}^{\perp}$ and ${\Psi}^{\perp}{M}_{us}$ using Equation (15)
 6: Construct the residual as defined in Equation (9)
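The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data matrices are stacked as $Z=[Y;U]$, the left singular vectors belonging to the smallest $ms-n$ singular values are taken as the parity space, and the second-order, noise-free toy system and all function names are hypothetical. With noisy data the separation between singular values is less sharp than in this example.

```python
import numpy as np


def block_hankel(data, start, s, cols):
    """Block Hankel matrix: column j stacks samples start+j, ..., start+j+s-1."""
    return np.vstack([data[:, start + i:start + i + cols] for i in range(s)])


def identify_parity_space(u, y, s, n):
    """Estimate (Psi_perp, Psi_perp @ M_us) from healthy I/O data (steps 3 to 5)."""
    l, m = u.shape[0], y.shape[0]
    cols = u.shape[1] - 2 * s + 1
    Z_p = np.vstack([block_hankel(y, 0, s, cols), block_hankel(u, 0, s, cols)])
    Z_f = np.vstack([block_hankel(y, s, s, cols), block_hankel(u, s, s, cols)])
    cov = Z_f @ Z_p.T / cols                    # normalized covariance, Z_p as instrument
    U, _, _ = np.linalg.svd(cov)
    null = U[:, n + l * s:]                     # directions of the smallest singular values
    psi = null[:m * s, :].T                     # (m*s - n) x (m*s)
    psi_M = -null[m * s:, :].T                  # (m*s - n) x (l*s)
    return psi, psi_M


def residual(psi, psi_M, u_win, y_win):
    """Residual Psi_perp y_s - Psi_perp M_us u_s for one window of s samples."""
    return psi @ y_win.T.reshape(-1) - psi_M @ u_win.T.reshape(-1)


# Toy healthy data set (assumed system, zero noise, white-noise input).
rng = np.random.default_rng(0)
A = np.array([[0.7, 0.2], [-0.1, 0.5]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]); D = np.array([[0.0]])
N, s, n = 500, 4, 2
u = rng.standard_normal((1, N))
x = np.zeros(2)
y = np.zeros((1, N))
for k in range(N):
    y[:, k] = C @ x + D @ u[:, k]
    x = A @ x + B @ u[:, k]

psi, psi_M = identify_parity_space(u, y, s, n)
k0 = 100
r_healthy = residual(psi, psi_M, u[:, k0:k0 + s], y[:, k0:k0 + s])
r_faulty = residual(psi, psi_M, u[:, k0:k0 + s], y[:, k0:k0 + s] + 1.0)  # sensor offset
print(np.linalg.norm(r_healthy), np.linalg.norm(r_faulty))
```

In this noise-free setting the healthy residual is zero up to rounding, while a constant sensor offset produces a clearly non-zero residual.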

Proposed Robustness Method
The robustness problem can be stated as selecting a parity vector from the parity space that enhances the sensitivity towards faults while reducing the sensitivity to noise and disturbances. The residual defined in Equation (9) contains the matrices ${M}_{us}$ and ${N}_{ds}$, also known as the fault and disturbance coupling matrices, respectively. Choosing the parity vector ${\Psi}^{\perp}$ that solves the objective function defined in Equation (16) maximizes ${M}_{us}$ and minimizes ${N}_{ds}$, which could lead to insensitivity towards sensor faults and noise:
To increase sensitivity towards a sensor fault, index
${J}_{2}$ was proposed in [
37] defined in Equation (
17)
Index (17) solves the problem of sensitivity towards sensor faults but performs poorly in the case of an actuator fault because the actuator fault coupling matrix is maximized. To solve this issue, a performance index is proposed as follows:
Now, the combined effect of fault and disturbance could be defined in the following way:
where $S$ is the design parameter, in this case the parity vector. Now, the performance index can be defined as
To solve index
${J}_{3}$, a generalized eigenvalue problem could be used as follows:
Here, ${l}_{s}$ is the eigenvector that maximizes ${M}_{usi}$ and minimizes ${N}_{dsi}$, and ${\lambda}_{s}$ is the corresponding eigenvalue. The procedure for robust residual construction is summarized in Algorithm 2.
Algorithm 2 SIM-based robust fault detection algorithm
 1: Using Algorithm 1, obtain the parameters ${\Psi}_{s}^{\perp}$ and ${\Psi}_{s}^{\perp}{M}_{us}$
 2: Solve (20) to obtain the eigenvector ${l}_{s}$
 3: Obtain the robust residual generator as follows
 4: Construct the residual as
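Step 2 of Algorithm 2 relies on a generalized eigenvalue problem. Since the exact index ${J}_{3}$ and Equation (20) are not reproduced in this text, the sketch below uses a generic ratio-type index, $J(l)=\frac{l^{T}FF^{T}l}{l^{T}GG^{T}l}$, as a stand-in: `F` plays the role of a fault coupling matrix and `G` of a disturbance coupling matrix, both assumed to be already projected onto the parity space. The names `F`, `G`, `optimal_parity_vector` and the small regularization term are assumptions for illustration.

```python
import numpy as np


def index_value(l, F, G):
    """Ratio of fault energy to disturbance energy for a candidate vector l."""
    return float(l @ F @ F.T @ l) / float(l @ G @ G.T @ l)


def optimal_parity_vector(F, G, reg=1e-9):
    """Maximize the ratio index by solving F F^T l = lambda (G G^T) l."""
    P = F @ F.T
    Q = G @ G.T + reg * np.eye(G.shape[0])      # reg keeps Q positive definite
    L = np.linalg.cholesky(Q)                   # Q = L L^T
    L_inv = np.linalg.inv(L)
    W = L_inv @ P @ L_inv.T                     # symmetric standard eigenproblem
    vals, vecs = np.linalg.eigh(W)              # eigenvalues in ascending order
    l_s = L_inv.T @ vecs[:, -1]                 # map back to the original coordinates
    return l_s / np.linalg.norm(l_s), float(vals[-1])


# Random coupling matrices, used only to exercise the solver.
rng = np.random.default_rng(1)
F = rng.standard_normal((4, 6))
G = rng.standard_normal((4, 6))
l_s, lam = optimal_parity_vector(F, G)
print(lam, index_value(l_s, F, G))
```

The Cholesky whitening turns the generalized problem into a symmetric one, so `numpy.linalg.eigh` can be used and the eigenvector of the largest eigenvalue gives the maximizing direction.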

6. Online Implementation and Results
The induction motor (SE26723G) shown in Figure 1 is started in the $Y$-configuration and loaded with its nominal torque, and the motor runs at its nominal speed.
Now, 1200 samples are collected with a sampling time of ${T}_{s}=0.1$ s. With ${s}_{f}=8$ and $n=4$, the identified parity space matrices are ${\Psi}^{\perp}\in {\Re}^{12\times 16}$ and ${\Psi}^{\perp}{M}_{us}\in {\Re}^{12\times 16}$. Based on Equation (21), three different residuals are constructed using the indices (16), (17) and (19). The generated residuals are shown in Figure 3.
Various types of faults can occur while the induction motor is operating: the insulation of the stator field winding can break down, the current can rise excessively due to overloading, measurement devices such as ammeters and voltmeters can become defective, and the supply voltage (modeled as an actuator) can become faulty when the power source is unreliable. The fault profile can be a step, ramp, sinusoid, etc., depending on the conditions and nature of the faulty components in the motor.
The computed variance for generated residuals is var$\left({{\rm Y}}_{{J}_{1}}\right)$ = 0.0133, var$\left({{\rm Y}}_{{J}_{2}}\right)$ = 0.0103 and var$\left({{\rm Y}}_{{J}_{3}}\right)$ = 0.0075, which shows the reduction of noise power in residual computed through the proposed index in the absence of a fault.
Consider the representation of faults as
A pulse fault is introduced in actuator 1 (17–25 s) and sensor 2 (83–92 s), and the resulting residuals obtained using ${J}_{1}$, ${J}_{2}$ and ${J}_{3}$ are shown in Figure 4.
The performance of the residual generator is also assessed by measuring the standard deviation and variance of the generated residuals. Statistical data for the residuals obtained using the performance indices ${J}_{1}$, ${J}_{2}$ and ${J}_{3}$ are shown in Table 4.
From Table 4, it is clear that ${J}_{2}$ performs better than ${J}_{1}$ owing to its larger variance and detectability. Similarly, the statistical data for ${J}_{3}$ in Table 4 show that the standard deviation and variance of the generated residual are greater than those for ${J}_{1}$ and
${J}_{2}$, which is also the reason for the better detectability of faults. The false alarm rate (FAR) and fault detection rate (FDR) for the performance indices ${J}_{1}$, ${J}_{2}$ and ${J}_{3}$ are shown in
Table 5.
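The two rates can be computed from an evaluated residual once a threshold and the true fault intervals are known. The sketch below assumes the usual sample-based definitions: FAR is the share of fault-free samples that exceed the threshold, and FDR is read here as the share of faulty samples that exceed it. These definitions, the function name `alarm_rates`, and the toy numbers are assumptions and may differ from how Table 5 was produced.

```python
def alarm_rates(evaluated, threshold, fault_active):
    """Return (FAR, FDR) for an evaluated residual sequence.

    evaluated    : residual evaluation values, one per sample
    threshold    : alarm threshold
    fault_active : booleans, True where a fault is actually present
    """
    alarms = [value > threshold for value in evaluated]
    healthy = [a for a, faulty in zip(alarms, fault_active) if not faulty]
    faulty = [a for a, faulty in zip(alarms, fault_active) if faulty]
    far = sum(healthy) / len(healthy) if healthy else 0.0
    fdr = sum(faulty) / len(faulty) if faulty else 0.0
    return far, fdr


# Toy sequence: three fault-free samples followed by three faulty ones.
evaluated = [0.1, 0.6, 0.2, 0.9, 0.8, 0.3]
fault_active = [False, False, False, True, True, True]
far, fdr = alarm_rates(evaluated, 0.5, fault_active)
print(far, fdr)
```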
Fault Isolation
In the three-phase induction motor (SG2672), two sensor faults and two actuator faults are considered. The currents in the $dq$ axis are modeled as sensors 1 and 2, respectively, while the applied AC power supply for the $dq$ axis is modeled as actuators 1 and 2, respectively. Applying Algorithms 1 and 2 indicates the presence of a fault in the motor but does not specify the location of the faulty component. For that purpose, a modified isolation procedure based on perfect unknown input decoupling is used in Algorithms 3 and 4.
The parity spaces ${\Psi}^{\perp}$ and ${\Psi}^{\perp}{M}_{us}$, obtained using Algorithms 1 and 2, are used for actuator and sensor fault isolation with the proposed robust parity vectors ${l}_{s,act,i}$ and ${l}_{s,sen,i}$, respectively.
Since m = 2 and ${k}_{f}=4$, a total of four residuals ${{\rm Y}}_{1}$, ${{\rm Y}}_{2}$, ${{\rm Y}}_{3}$, ${{\rm Y}}_{4}$ are generated based on ${}^{4}{C}_{2}$.
Furthermore, representing the residual as a function of faults as in Equation (
28),
Faults are introduced in the $dq$-axis stator voltages (actuators 1 and 2) and the d-axis stator current over different time intervals. The faults are then detected online using Algorithm 2, and the residual is evaluated against the GLR threshold, as shown in Figure 5. Furthermore, using Algorithms 3 and 4, residuals are constructed to indicate the faulty components, as shown in Figure 6. At any instant, a decision about faults in specific components can be made using Table 3. It is evident from the figure that faults have occurred in the stator voltages $({v}_{q},{v}_{d})$ and the current $\left({i}_{q}\right)$ during the specified time intervals.
Generally, the faults are of a low-frequency nature, so applying the wavelet transform and keeping only the approximation coefficients (which contain the lower frequencies of the residual) can reduce the false alarm rate, as shown in
Figure 7.
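Keeping only the approximation coefficients amounts to low-pass filtering the residual. The sketch below illustrates this with a hand-written Haar transform; the wavelet family, the decomposition depth `levels`, the end padding, and the function name `haar_approximation` are all assumptions, since the wavelet settings are not stated in this text.

```python
import numpy as np


def haar_approximation(residual, levels=3):
    """Reconstruct a signal from its Haar approximation coefficients only.

    All detail coefficients are set to zero, so each block of 2**levels
    samples is replaced by its mean.
    """
    x = np.asarray(residual, dtype=float)
    n = len(x)
    pad = (-n) % (2 ** levels)
    a = np.concatenate([x, np.full(pad, x[-1])])    # pad by repeating the last sample

    for _ in range(levels):                         # analysis: keep approximations
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    for _ in range(levels):                         # synthesis with zero details
        a = np.repeat(a, 2) / np.sqrt(2.0)
    return a[:n]


# Noisy residual with a low-frequency pulse fault (synthetic data).
rng = np.random.default_rng(2)
noise = 0.3 * rng.standard_normal(1024)
fault = np.zeros(1024)
fault[400:600] = 1.0
smoothed = haar_approximation(noise + fault, levels=3)
print(np.var(noise), np.var(haar_approximation(noise, levels=3)))
```

Because the pulse varies slowly compared with the noise, it survives the block averaging almost unchanged while the noise variance drops, which is the mechanism by which the false alarm rate is expected to decrease.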