Next Article in Journal
Exploring Artificial Neural Networks Efficiency in Tiny Wearable Devices for Human Activity Recognition
Next Article in Special Issue
Resilient Consensus Control Design for DC Microgrids against False Data Injection Attacks Using a Distributed Bank of Sliding Mode Observers
Previous Article in Journal
Impact of Li-Ion Battery on System’s Overall Impedance and Received Signal Strength for Power Line Communication (PLC)
Previous Article in Special Issue
Non-Destructive Techniques for the Condition and Structural Health Monitoring of Wind Turbines: A Literature Review of the Last 20 Years
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Data-Driven Fault Diagnosis Techniques: Non-Linear Directional Residual vs. Machine-Learning-Based Methods

by
Nicholas Cartocci
1,
Marcello R. Napolitano
2,
Francesco Crocetti
1,
Gabriele Costante
1,
Paolo Valigi
1 and
Mario L. Fravolini
1,*
1
Department of Engineering, University of Perugia, Via G. Duranti 67, 06125 Perugia, Italy
2
Department of Mechanical and Aerospace Engineering, West Virginia University, Morgantown, WV 26506, USA
*
Author to whom correspondence should be addressed.
Sensors 2022, 22(7), 2635; https://doi.org/10.3390/s22072635
Submission received: 21 February 2022 / Revised: 26 March 2022 / Accepted: 28 March 2022 / Published: 29 March 2022
(This article belongs to the Special Issue Artificial Intelligence for Fault Diagnostics and Prognostics)

Abstract

:
Linear dependence of variables is a commonly used assumption in most diagnostic systems for which many robust methodologies have been developed over the years. In case the system nonlinearities are relevant, fault diagnosis methods, relying on the assumption of linearity, might potentially provide unsatisfactory results in terms of false alarms and missed detections. In recent years, many authors have proposed machine learning (ML) techniques to improve fault diagnosis performance to mitigate this problem. Although very powerful, these techniques require faulty data samples that are representative of any fault scenario. Additionally, ML techniques suffer from issues related to overfitting and unpredictable performance in regions which are not fully explored in the training phase. This paper proposes a non-linear additive model to characterize the non-linear redundancy relationships among the system signals. Using the multivariate adaptive regression splines (MARS) algorithm, these relationships are identified directly from the data. Next, the non-linear redundancy relationships are linearized to derive a local time-dependent fault signature matrix. The faulty sensor can then be isolated by measuring the angular distance between the column vectors of the fault signature matrix and the primary residual vector. A quantitative analysis of fault isolation and fault estimation performance is performed by exploiting real data from multiple flights of a semi-autonomous aircraft, thus allowing a detailed quantitative comparison with state-of-the-art machine-learning-based fault diagnosis algorithms.

1. Introduction

Fault diagnosis (FDi) systems are essential components of many engineering systems. These systems play a very important role for accident prevention, service continuity, and cost minimization, ultimately leading to increased human safety in transportation systems. Due to the increasing availability of multi-source sensorial data along with the significant computational power and storage capacity of modern computers, today, there is an increasing research interest in developing data-driven (DD) algorithms to tackle complex monitoring and control problems [1]. DD approaches are widely used in applications where a detailed physical knowledge of the system is unavailable or not readily available or in cases where system input-output relations are too complex or too uncertain. Typically, DD approaches applied to FD problems derive fault-sensitive signals (also known as diagnostic signals) directly from experimental models identified from sample datasets acquired from the monitored system during normal and faulty operations. The pioneers in this field are Isermann [2,3,4], Basseville [5,6,7], and Gertler [8,9,10].
Today, multivariate statistical process monitoring (SPM) methods, such as principal component analysis (PCA) [11] (and its variants) and parity space approaches [12,13,14,15], are widespread techniques used for system monitoring and fault diagnosis purposes. The widespread use of PCA-based monitoring techniques is due to its simplicity and its capability to efficiently manage large quantities of multivariate data. In this context, widespread approaches used for fault isolation (FI) are the so-called contribution plots methods, such as the reconstruction-based contributions method [11,13]. Although powerful and effective, this method might produce incorrect fault isolations due to the so-called smearing effect, i.e., the influence of a faulty sensor measurement on non-faulty sensors contributions. In a system characterized by a limited number of monitored variables, directional residuals methods have shown to be a valid alternative to SPM methods. Directional residual methods work by relating possible faulty sensors with the characteristic direction of the faults, isolating the faulty sensor as the one having the smallest distance between the residuals signal and the monitored sensor fault directions [12,13,16]. An important variant of the directional residual method is the structured residual approach [9].
A fundamental assumption underlying most SPM techniques is the linear dependence between modeling variables. While this assumption may be reasonable in some applications, in the case of strongly non-linear systems or significant variations in the operating range of the signals, these techniques could lead to inaccurate results. The direct consequence on fault diagnosis is that the linearity in covariate FD models may produce many false positives and false negatives in practical applications when applied to non-linear systems.
A typical approach to allow non-linear dependencies between the covariates is to augment their number by introducing several new terms derived by non-linear transformations of the original signals while retaining the linear dependence between the extended set of covariates. Although this approach is straightforward, it suffers from the problem that the set of extended non-linear functions cannot be defined ‘a priori’. Instead, a complex and time-consuming tuning phase is needed to select a suitable set of non-linear transformations. These problems have motivated the scientific community to go beyond the linear dependence assumptions of covariates, promoting the development of non-parametric models based on generic non-linear smooth functions, such as splines, neural networks, and kernel models, and implement specific data-driven tuning algorithms to reduce the mean and variance of the mapping error to identify best-fitting models [17].
Another direction to deal with the problem of fault diagnosis in non-linear systems is related to the use of machine learning and deep learning techniques. The issue of Fault Isolation, in fact, can be easily set as a direct application of a classification or clustering problem. At the same time, ML regression techniques can also be employed for estimating the shape and amplitude of the fault (Fault Estimation (FE)) [18,19,20,21].
An interesting comparison of FI and FE performance of three popular classification algorithms, namely the support vector machine (SVM), K-nearest neighbor (KNN), and decision tree (DT), can be found in [17]. The classification algorithms are trained using data from optimized and non-optimized sensor subsets and then validated with new data characterized by varying degrees of fault severity. In [22], Erfani et al. present a hybrid model in which an unsupervised deep belief network (DBN) is trained to extract latent features. Then, a one-class support vector machine (SVM) based on the DBN features is trained to learn decision surfaces. In [23], Revathi and Kumar proposed a deep-learning-based anomaly detection and classification system in video sequences, where the final module classifies the detected events as usual or suspect. In [24], Pashazadeh et al. propose a data-driven fault detection and isolation (FDI) scheme based on the fusion of different classifiers for a wind turbine challenge system. Multi-layer perceptron (MLP), radial basis function (RBF), decision tree (DT), and K-nearest neighbor (KNN) classifiers are implemented in parallel, and the faulty sensor is identified using a majority voting method. Next, discrete-time up–down counters (UDCs) are used for each fault to reduce false alarms (FAs) and missed detections (MDs). In [25], an efficient strategy for fault detection and isolation (FDI) for an industrial gas turbine based on ensemble learning methods is introduced; specifically, a fault isolation scheme based on ensemble bagged trees is developed to isolate faults in a steady-state runtime.
In this study, we considered a natural non-linear extension of linear regression models of the form 1 p w i x i , i.e., the class of the so-called generalized additive models (GAM) [26] 1 p w i g i ( x i ) , where g i ( x i ) are generic smooth functions to be identified from data. Due to the simple additive structure, GAMs are sufficiently versatile for capturing linear or non-linear relationships between response functions and covariates. The additive form is of particular interest for FDi problems because it allows the easy calculation of the fault sensitivity of the individual monitored variables. Although GAMs have been used extensively in many application fields, their employment in a data-driven FDi system has not been fully explored. In fact, to date, only a few studies have been presented, such as [16,27]. Motivated by the mentioned issues, in this paper, we propose GAMs to identify non-linear parity relationships in the monitored variables using cubic spline basis functions to characterize the non-linear functions g i ( x i ) by exploiting the MARS modeling and estimation algorithm proposed in [28].
Next, to exploit the consolidated tools available for the fault diagnosis of linear parity space models, a local linearization of the identified GAM parity relations is performed to achieve a time-dependent fault sensitivity matrix.
Indeed, unlike standard linear parity methods, the so-called fault signature matrix is not constant but, rather, it depends on the operating point, implying that the resulting fault directions are not constant but time-varying. A state-dependent fault sensitivity model can better capture the effects of faults in a non-linear system compared to a standard linear and fixed fault sensitivity matrix. The proposed GAM plus linearization approach can immediately fit the directional residual FI method developed in the linear contest and applied in [14,15]. In addition to this first key innovative aspect, this study aims to show the effectiveness of the proposed technique compared to machine learning techniques applied to the problems of fault isolation and fault estimation. In particular, the second main contribution of this research is related to the comparison of the proposed directional residual-based technique with ML fault diagnosis methods. This is performed by investigating the advantages and disadvantages of each approach, highlighting the benefits in terms of performance, memory occupancy and robustness to unexpected fault amplitudes.
The proposed novel method is applied to design a complete fault isolation and estimation (FIE) system based on real sensor data taken from a semi-autonomous aircraft where single additive faults are artificially injected on eight primary sensors.

2. Non-Linear Additive Models for Fault Diagnosis

The set of the monitored (potentially faulty) sensors measurements is concatenated in the vector x ( k ) n x , while the set of control signals and other no monitored sensors (assumed not faulty) are included in the vector u ( k ) n u . The integer k is the discrete-time index at sample time t = k Δ t (where Δ t is the sampling interval). Occasionally in the article, the dependence on k is omitted to simplify the notation. The proposed FD technique is based on analytical redundancy (AR) concepts [12,13,16]. It is assumed that a sensor measurement x i ( k ) is approximated by a non-linear additive model consisting of the linear combination of non-linear functions ( g i , j and h i , j ) defined as follows
x ^ i ( k ) = j = 1 j i n x w x i , j g i , j ( x j ) + j = 1 n u + 1 w u i , j h i , j ( u j ) i = 1 n x
where w x i , j and w u i , j are constant coefficients (to be estimated from data) and g i , j and h i , j are non-linear functions of the variables x j ( k ) and u j ( k ) , respectively, typically representing a rectified linear unit (ReLU) or Gaussians or polynomial splines [26]. The functions h i , n u + 1 are assumed to be constant and equal to one in order to take into account possible constant offsets in the models. The actual signal x i ( k ) is
x i ( k ) = x ^ i ( k ) + Δ i ( k ) i = 1 n x
where Δ i ( k ) characterizes the modeling error and sensor noise associated with the i -th sensor. The primary residual associated to each sensor is defined as
r i ( k ) = x ^ i ( k ) x i ( k ) i = 1 n x
At fault-free conditions, it results in the following:
r i ( k ) = Δ i ( k ) i = 1 n x
i.e., the residual is equal to the modeling uncertainty. Typically, | Δ i ( k ) | is a small amplitude signal, and in the ideal perfect modeling noise-free case, this is equal to 0. In the present study, we considered the occurrence of single additive sensor faults f j ( k ) R on the generic j -th sensor. In the presence of a sensor fault, the fault-free measurement x j ( k ) should be substituted by the faulty signal x j ( k ) + f j ( k ) x j ( k ) , that is
x j ( k ) + f j ( k ) x j ( k )
where f j ( k ) is a generic fault modelling function that is zero in fault-free conditions and different from zero in the presence of the sensor fault. When a fault is present on the j -th sensor, the impact on the residuals can be evaluated by substituting (5) in (1) and (3). It is immediate to verify that
r i ( k ) = w x i , j [ g i , j ( x j ( k ) + f j ( k ) ) g i , j ( x j ( k ) ) ] + Δ i ( k ) i j r i ( k ) = f i ( k ) + Δ i ( k ) i = j
In the above Equation (6), the term g i , j ( x j ( k ) ) is not directly computable because the sensor reading is equal to x j ( k ) + f j ( k ) and not to x j ( k ) in the presence of a fault. For this reason, the Taylor approximation of g i , j ( x j + f j ) around x j + f j is computed, that is
g i , j ( x j + f j + δ f j ) = g i , j ( x j + f j ) + g i , j ( x j + f j ) f j x j + f j δ f j + Δ g i j
where δ f j is a fault increment and Δ g i j contains higher-order terms of the Taylor expansion. In this study, we exploited model (7) to compute an approximation of g i , j ( x j ) . This can be easily achieved by taking δ f j = f j in (7) (in fact, results g i , j ( x j + f j + δ f j ) = g i , j ( x j + f j f j ) = g i , j ( x j ) ) resulting in:
g i , j ( x j ) = g i , j ( x j + f j ) + g i , j ( x j + f j ) f j x j + f j ( f j ) + Δ g i j
Substituting Expression (8) for g i , j ( x j ) in the residuals in (6) results in the following:
r i ( k ) = w x i , j [ g i , j ( x j + f j ) g i , j ( x j + f j ) g i , j ( x j + f j ) f j x j + f j ( f j ) Δ g i j ] + Δ i i j
that is
r i ( k ) = w x i , j g i , j ( x j + f j ) f j x j + f j f j + [ Δ i w x i , j Δ g i j ] i j
Define:
w ¯ i , j ( x j + f j ) = w x i , j g i , j ( x j + f j ) f j x j + f j i j
then (10) becomes
r i ( k ) = w ¯ i , j ( x j + f j ) f j + [ Δ i w i , j Δ g i j ] i j
the above expressions can be arranged in matrix form resulting in:
r 1 ( k ) r 2 ( k ) r 3 ( k ) r n x ( k ) = 1 w ¯ 1 , 2 ( x 2 + f 2 ) w ¯ 1 , 3 ( x 3 + f 3 ) w ¯ 1 , n x ( x n x + f n x ) w ¯ 2 , 1 ( x 1 + f 1 ) 1 w ¯ 2 , 3 ( x 3 + f 3 ) w ¯ 2 , n x ( x n x + f n x ) w ¯ 3 , 1 ( x 1 + f 1 ) w ¯ 3 , 2 ( x 2 + f 2 ) 1 w ¯ 3 , n x ( x n x + f n x ) w ¯ n x , 1 ( x 1 + f 1 ) w ¯ n x , 2 ( x 2 + f 2 ) w ¯ n x , 3 ( x 3 + f 3 ) 1 f 1 f 2 f 3 f n x + Δ ¯
The matrix in (13) ( W ¯ ( k ) n x × n x ) is known as the fault sensitivity matrix and Δ ¯ n x × n x contains all the uncertain terms in (13). It is observed that the matrix W ¯ ( k ) is time-dependent; in other words, it depends on the current measurements at time k.
The occurrence of a single fault at a time on a generic j -th sensor is assumed here; therefore, in (13), only the component f j is different from zero (that is f 1 f 2 f 3 f n x T = 0 f j 0 T ). This implies that (13) simplifies to
r 1 ( k ) r 2 ( k ) r n x ( k ) = w ¯ 1 , j ( x j + f j ) 1 w ¯ n x , j ( x j + f j ) f j + Δ ¯ j
In vector form
r ( k ) = w ¯ j ( k ) f j ( k ) + Δ ¯ j j = 1 n x
where r ( k ) n x is the primary residual vector and w ¯ j ( k ) n x (the j -th column vectors of the matrix W ¯ ( k ) ) defines the so-called fault direction. Assuming a sufficiently large fault f j compared to Δ ¯ j , the residual vector direction tends to be alienated to the known direction of the vector w ¯ j ( k ) . This directional information will be later exploited for sensor FI purposes. Unlike our previous papers in [12,13]—where the faults signature matrix is constant by construction—the fault signature matrix is time-varying in the present study.

2.1. Linear Model Case

In case the functions g i , j and h i , j are approximated by simple linear in the variables models, then Equation (1) simplifies to:
x ^ i ( k ) = j = 1 j i n x w x i , j x j + j = 1 n u + 1 w u i , j u j i = 1 n x
where w x i , j and w u i , j are constant weights, implying that the time-dependent matrix W ¯ ( k ) becomes a constant matrix W [12,13] and the associated fault directions w ¯ j j = 1 n x are fixed and constant vectors.

3. Fault Diagnosis (FDi)

3.1. Fault Isolation (FI)

Starting from the considerations in Section 2 and considering Equation (15), the faulty sensor is inferred (isolated) by exploiting the fault directional properties of matrix W ¯ ( k ) . Specifically, the faulty sensor is isolated by evaluating the angular distances between the direction of the residual vector r ( k ) and the n x fault directions, i.e., the columns of the matrix W ¯ ( k ) . The sensor fault direction with the lowest angular distance from the residual direction flags the faulty sensor, that is
I F ( k ) a r g m i n j 1 , , n x ( r ( k ) , W ¯ ( k ) )
where I F ( k ) is the fault index function that takes values from 1 to n x and indicates, at time k , the index of the isolated faulty sensor. This FI technique is previously applied in [12,13,14,15]. The same FI method is also unchanged when the simple linear models in (16) are used, with the only difference that the fault directions are now constant in time because, in this case, the matrix W is constant.

3.2. Fault Estimation (FE)

An advantage of operating with primary residuals, as defined in (3), is that the fault amplitude can be estimated directly. In fact, starting from Equation (15), assuming a negligible effect of Δ ( k ) , the fault amplitude can be directly calculated as
f j ( k ) = [ w ¯ j ( k ) ] 1 r ( k )
The same FE method also applies in the case of the linear models (16).
NOTE-1: Before the FI and FE phases there is usually a fault detection (FD) phase dedicated to the detection of the occurrence of a generic anomaly condition. Since the primary purpose of this study is focused on the evaluation of FI and FE algorithms, we assumed here an “ideal” fault FD, i.e., the additive fault is detected as soon as it is injected into the sensors. Clearly, in practice, an FD delay and missed detection might be possible. The issue of FD is addressed in many studies; a detailed overview of FD techniques can be found in [29,30]. In addition, the authors have also discussed the issue of data-driven FD in previous works, e.g., in [12,14,15].

4. Multivariate Adaptive Regression Splines (MARS)

In this study, the data-driven identification of the non-linear functions ( g i , j and h i , j ) defined in (1) is performed using Friedman’s multivariate adaptive regression Splines algorithm [28] which is a well-known procedure used to identify non-parametric additive models from data. The MARS algorithm can be easily set to fit the structure of the non-linear additive models in (1). In practice, the identification of the primary residuals is performed by exploiting the adaptive regression splines toolbox [31] (ARESLab). MARS is a non-parametric regression technique and can be viewed as a non-linear extension of linear regression models that can be used to model non-linear dependencies in high-dimensional data. Technically, MARS models consist of the linear combination of spline basis functions; in ARESLab, the number of basis functions and the parameters characterizing their shape is inferred directly from data through a forward–backward iterative approach [28,31]. Starting from (1), the considered MARS functions have the following form:
g i , j ( x j ) = m x = 1 M x , i , j α i , j , m x B i , j , m x ( x j ) h i , j ( u j ) = m u = 1 M u , i , j β i , j , m u B i , j , m u ( u j )
where M x , i , j is the number of basis functions that are selected by the MARS forward–backward iterative approach to identify the g i , j function. The α i , j , m x are constant coefficients and B i , j , m x is the m x -th basis function that depends only on the x j variable. Similar definitions can be attributed to M u , i , j , β i , j , m u , and B i , j , m u .
The ARESLab allows the selection of different classes of basis functions, such as piecewise ReLU and piecewise cubic splines. We used piecewise continuous cubic splines with continuous first derivatives to estimate the non-linear functions. An in-depth discussion and comparison between piecewise cubic models and piecewise linear models can be found in [28].
The considered piecewise cubic spline basis functions consist of one or two “complementary” basis functions B ( x | s , κ - , κ , κ + ) . These are defined as follows:
B ( x | s = 1 , κ - , κ , κ + ) = 0 x < κ - p + ( x κ - ) 2 + r + ( x κ - ) 3 κ - < x < κ + x κ x κ +
B ( x | s = 1 , κ - , κ , κ + ) = ( x κ ) x < κ - p - ( x κ + ) 2 + r - ( x κ + ) 3 κ - < x < κ + 0 x κ +
for x is a scalar, κ - < κ < κ + and
p + = ( 2 κ + + κ - 3 κ ) / ( κ + κ - ) 2 r + = ( 2 κ + κ + κ - ) / ( κ + κ - ) 3 p - = ( 3 κ 2 κ - κ + ) / ( κ - κ + ) 2 r = ( κ - + κ + 2 κ ) / ( κ - κ + ) 3
The shape parameters κ - , κ + , and κ represent the lower side knot, the upper side knot, and the central knot, respectively. The first two knots define the change point between the functions, while the last influences the cubic and the linear functions. The ARESLab procedure automatically estimates the slope parameter and the number of Basis Functions to build the model. In particular, the design does not necessarily use both “complementary” basis functions; only the positive one or the negative one could be used in the estimation model.
In this study, the MARS algorithm for each sensor model in (1) is applied separately using the same data segment. Once the MARS spline basis functions are identified for all the monitored sensors, the local fault sensitivity matrix W ¯ ( k ) is analytically computed following the linearization procedure described in Section 2.
NOTE-2: Since there are no iterations (multiplications) between the sensor measurements, the fault diagnosis method based on the MARS algorithm could be implemented exclusively through the matrix product between suitably defined matrices and the values measured by the sensors. In particular, combining Equations (1), (3), and (19)–(22), it is possible to calculate the residual r ( k ) = Γ ( ) χ 1 3 ( k ) + H ( ) μ ( k ) where Γ ( ) n x × 3 n x + 1 , H ( ) n x × 3 n u + 1 , χ 1 3 ( k ) = x + F x + F 2 x + F 3 1 3 n x + 1 , and μ ( k ) = u u 2 u 3 1 3 n u + 1 . Similarly, the matrix W ¯ ( k ) can also be calculated through the matrix product between some matrices appropriately defined and the values measured by the sensors. In particular, the column vector W ¯ j ( k ) = Ω ( ) χ 1 2 ( k ) where Ω ( ) n x × 2 n x + 1 and χ 1 2 ( k ) = x + F x + F 2 1 2 n x + 1 . The matrices Γ ( ) , H ( ) and Ω ( ) depend on the shape parameters κ , κ + of each basis function of the model and the values of the sensor measurements. Therefore, the implementation complexity of the proposed approach is related to the matrix multiplication algorithm that results, as known, in the worst case O ( n 3 ) .

5. Machine Learning-Based Fault Isolation and Estimation

The proposed directional residual-based FI and FE scheme can be compared with machine learning (ML) solutions. ML techniques are extensively applied to FD problems; an extensive literature exists on this issue [32,33]. In this study, ML structures with different complexity are built and compared using the same data used to identify and test the directional residual-based technique. Specifically, FI and FE are addressed separately. First, a ML classifier is used to estimate the faulty sensor index. Then, a second ML approximator is used to estimate the fault amplitude. The first is a typical classification problem, while the second is a typical regression problem. Classification and regression structures are built by exploiting the dedicated MATLAB toolboxes in more detail. For example, support vector machines (SVMs) [17,22], neural networks (NNs) [23,24], decision trees [17,24], and ensemble of decision trees [25] structures have been considered.

5.1. Dataset Preparation for ML Algorithms

In contrast to the primary residual-based FI techniques that are based only on fault-free data and on fault directions, the ML technique requires faulty data samples which are representative of each possible sensor fault for a wide range of fault amplitudes.
Since real sensor flight data with sensor faults are not easily available, additive faults are artificially injected on the fault-free data to simulate the occurrence of a sensor fault (this approach of generating artificial faulty data is widely used in the FDi community, see for instance [11,12,13,14,15,16]).
For this reason, a new ‘ad-hoc’ data set is produced based on the fault-free dataset used to identify the MARS models. This is performed by adding a random amplitude fault at each sampling time on a randomly selected sensor. Random amplitude fault on a random sensor is clearly not a realistic fault scenario; this approach is used only to generate a rich set of training data to promote generalization capacity and robustness in the ML schemes.
It is observed that this simple fault generation method is possible thanks to the fact that, in this study, our approach is “memoryless”. Indeed, the estimation at the time k depends only on other signals at the same time k ; therefore, serial temporal correlation of data has not been considered in the model. If the estimations at time k depend on signals at previous time instants ( k j ), the above faulty data generation procedure can be easily extended by considering data segments of appropriate length. Next, the fault-free mean and standard deviation of the data are normalized, and two new labels (signals) are added to the data. The first is the index identifying the sensor where the fault is injected, while the second is the normalized amplitude of the fault.

5.2. ML Classifier for FI

The FI classifiers input is the vector z F I ( k ) = [ x ( k ) , u ( k ) ] n x + n u of the current sensors and inputs measurements, and the corresponding output is the label of the faulty sensors S ( k ) = l [ 1 , 2 n x ] . The data set described in Section 6 is used for the training. The set of classifiers that are evaluated and the main design parameters are reported in Table 1.

5.3. ML Estimator for FE

The FE estimator input is the vector z F E ( k ) = [ x ( k ) , u ( k ) , S ( k ) ] n x + n u + 1 of the current sensors and inputs measurements plus the index of the faulty sensor. The corresponding output is the amplitude of the normalized fault f ( k ) injected on the fault-free measurements. The data set described in Section 6 is used for the training. The set of estimators and the main design parameters are shown in Table 1.

5.4. Online Operation of ML Algorithms

The previously trained ML structures are then used for FI and FE purposes in the online operation phase according to the scheme shown in Figure 1. Following the failure detection, the FI block processes the current input z F I ( k ) = [ x ( k ) , u ( k ) ] and provides the estimation of the sensor fault index S ^ ( k ) in the output. This information, as well as the measured current signals z F E ( k ) = [ x ( k ) , u ( k ) , S ^ ( k ) ] , is then processed by the FE block that provides the estimation f ^ ( k ) of fault amplitude at time k.
NOTE-3: The ML classification and regression schemes introduced in Section 5.2 and Section 5.3 have the same hyperparameters but differ in the inner architecture. For example, in the case of neural networks, the structure of the inner layers is the same, except the output layer, specifically in the case of a neural network classification, and the SoftMax activation function is applied. In contrast, linear activation functions are used in the regression neural network.

6. Semi-Autonomous Aircraft Flight Data

The FI and FE algorithms are designed and tested using flight data of a Tecnam P92 aircraft [34]. The data are acquired in a semi-autonomous mode; specifically, a pilot manually flew the plane during takeoff and approach/landing and flew autonomously at cruise conditions. A batch of six-flight datasets is considered in this study. The set of the 12 signals listed in Table 2 is considered [35], of which the first eight are the monitored sensors x ( k ) , while the last four are the actuation (input) signals u ( k ) . All the signals are normalized to zero mean and unitary variance.
Data from five flights (1 h and 21 min) are used to train models for a total of N = 48,661 data samples (with a data sampling period is 0.1 s). In contrast, the remaining batch of flight data (for a total of 11 min) is used for validation purposes for a total of N = 6659 samples.

7. Design of the Non-Linear Directional Residuals

7.1. Redundancy Relation Identification

As described in Section 4, the MARS algorithm is used to identify the non-linear additive models introduced in Section 2. The MARS algorithm automatically selects the number of piecewise cubic splines and weights for each model in (1), exploiting a forward–backward iterative approach. Table 3 reports the model identification results for each of the additive models. The rows represent the identified model structure for each of the n x monitored sensors; the columns indicate the number of cubic splines selected by the algorithm to characterize the functions g i , j ( x j ) and h i , j ( u j ) in (1). It is observed that the algorithm does not choose all the available signals for modeling, thus also providing a feature selection.
Next, the same data are used to identify the linear redundancy relations parameters in (16) for each of the eight sensors. In this case, the model parameters are determined using standard least squares.

7.2. Accuracy of the Identified Additive Models

The first step to evaluate the effectiveness of the non-linear models is to compare their modeling accuracy with those of the linear models to estimate the sensor signals.
For each sensor, Table 4 reports the mean and standard deviation of the approximation error e ( k ) = x ( k ) x ^ ( k ) for the training and the test datasets in the form [mean value ± standard deviation]. The analysis of the table reveals that the non-linear models perform better than the linear ones. In fact, although the mean value of the estimation error for non-linear models is larger in some cases, this is still very close to zero. In contrast, the standard deviation of the estimation error is lower for non-linear models than linear models for all monitored sensors for both “Training” and “Test” data.
The higher accuracy provided by the non-linear additive models is considered important from an FI perspective because more accurate models can detect smaller amplitude faults by limiting the number of false alarms caused by modeling errors that could be equivocated with the occurrence of faults.

7.3. Fault Sensitivity Matrix

This section reviews in detail the time-dependent fault sensitivity matrix W ¯ ( k ) to highlight the relevant difference between linear and non-linear FI schemes. For this purpose, the non-linear models generated by the MARS algorithm are linearized following the approach proposed in Section 2 to build the matrix of fault signatures (i.e., the fault sensitivity matrix).
In Figure 2, the time evolution of the local fault sensitivities produced by the training data under no-fault conditions for a time segment of 1 h and 21 min is shown. A blank cell in the figure means that the MARS algorithm has not selected that signal in the corresponding additive model. The first row shows the monitored sensor readings [ α , , θ ], while the other rows in the figure represent the time-dependent sensor fault direction at time k (cyan). In other words, the j -th column represents the evolution of the n x components of the vector w ¯ j ( k ) (see Section 2). The figure also reports in red the fault sensitivities for the linear model case (fault directions) that are constant over time. In the last case, the j -th column represents the constant n x components of the vector w j , as shown in Section 2.1. A detailed analysis of the data reveals that the time intervals where the local faults sensitivities change significantly are those in which the aircraft is executing a maneuver. The significant variation of the fault directions with time suggests a probable relevant effect on the FI performance compared to the linear case.

8. Design of Machine-Learning-Based FI Schemes

This section describes the design of ML-based FI schemes. First, the FI classifiers are trained. The input of the classifiers is the z F I ( k ) = [ x ( k ) , u ( k ) ] 12 vector of the eight monitored sensors and four inputs signals reported in Table 1. For each of the eight monitored sensors, single random amplitude faults are added to the fault-free signals at random time instants sampled from the training data. Positive and negative fault amplitudes are generated in the range A max A max , whose values are reported in Table 5. The corresponding out at time k is the label of the faulty sensors S ( k ) = l [ 1 , 2 8 ] . A total of N = 400 , 000 random samples are generated for the training. The training of the 16 FI classifiers (the first column in Table 6) is then performed using MATLAB. Table 6 shows the accuracy [36] of the models obtained from cross-validation of the training data with the n -fold validation taking n = 5 . The neural network FI classifiers achieved the highest accuracies compared to the other family of models; in particular, the wide neural network scores provided a 78.1%.
Next, the 16 FI estimators are trained. The input vector z F E ( k ) = [ x ( k ) , u ( k ) , S ( k ) ] 13 coincides with z F I ( k ) augmented with the index S ( k ) indicating the faulty sensor, and the output is the amplitude of the normalized fault amplitude f ( k ) . A total of N = 400,000 training samples are generated using the same procedure used for the training of the FI classifiers. The second column of Table 6 reports the root mean square error (RMSE), R M S E = 1 N i = 1 N f i f ^ i 2 , of the models achieved, cross-validating the training data with the n -fold method ( n = 5 ). Once again, the wide neural network provides the lowest RMSE compared to the other family of models.

9. Metrics for Validating Fault Diagnosis Schemes

For validation and comparison purposes, additive constant bias faults of amplitude A ( f ( k ) = A ) are considered. The constant fault is applied at time k = 1 and is maintained for the entire duration of the validation flight. The following FI and FE metrics are used.

9.1. Fault Isolation Percentage (FIP)

The FI performance is measured in terms of the fault isolation percentage (FIP), defined as:
F I P X ( A ) : Given a fault of amplitude A on the sensor X , the fault isolation percentage is the percent ratio between the number of samples the FI block which correctly isolates the faulty sensor and the number of samples in the validation flight.
F I P = #   n u m b e r   o f   c o r r e c t l y   i s o l a t e d   s a m p l e s #   n u m b e r   o f   f a u l t y   s a m p l e s
The index F I P X ( A ) is calculated for each considered technique, for different fault amplitude A and for each monitored sensor X . The rest of the paper F I P X will refer to the average of F I P X ( A ) for the considered fault amplitudes injected on the monitored sensor X , i.e., F I P X = m e a n [ F I P X ( A ) ] . Similarly, F I P ¯ will refer to the average of the F I P X values evaluated over the eight monitored sensors, that is: F I P ¯ = m e a n [ F I P X ] .

9.2. Fault Estimation Percentage (FEP)

The primary residuals in models (1) and (16) for the non-linear and linear models, respectively, allow direct estimation (see Equation (15)) of the fault amplitude A ^ that is computed as the difference between the measured and predicted signal. In contrast, the ML models estimate the fault amplitude A ^ as a regression problem (see Section 5.3). The fault estimation percentage (FEP) is defined as:
F E P X ( A ) : Given a fault amplitude A , the fault estimation percentage ratio is the absolute value of the percent ratio between the fault amplitude reconstruction. The F E P X ( A ) is calculated as the difference between the actual fault amplitude A and the mean of the reconstructed fault amplitude throughout the validation flight.
F E P X ( A ) = A 1 N i = 1 N A ^ i A
Furthermore, the index F E P X ( A ) is calculated for each technique, a different fault amplitude A , and a monitored sensor X . In addition, F E P X refers to the average of F E P X ( A ) for the different fault amplitudes injected on the monitored sensor X , while F E P ¯ refers to the average of F E P X evaluate over the eight monitored sensors.

9.3. Complementary Fault Estimation Percentage (cFEP)

In order to achieve a performance metric that is 100% when the FE performance is perfect and 0% when it is completely unsatisfactory, the complementary FEP (cFEP) is defined as:
c F E P = m a x ( 0 , 100 F E P )
From the c F E P index, the c F E P X ( A ) , c F E P X , and c F E P ¯ indices are derived. In addition, to produce an overall performance ranking that takes into account both the FI and FE performance, the overall performance index J t o t is defined:
J t o t = F I P ¯ + c F E P ¯ 2
Perfect performance is archived when J t o t = 100 , i.e., in the case of perfect fault isolation and perfect fault reconstruction.

10. Comparison between Directional Residual and Machine Learning Techniques

This section compares the fault diagnosis performance provided by the directional residuals and machine-learning-based methods. This study is performed using the data of the validation flight. Positive and negative constant faults are added to the fault-free data, considering, for each sensor, fault amplitudes A equal to ±17%, ±33%, ±50%, ±67%, ±83%, and ±100% of the maximum fault amplitude taken from Table 5. The faults are added at time k = 1 to the faulty sensor and maintained constant for the entire flight duration. This procedure is repeated for all eight sensors and all the considered fault amplitudes, resulting in a total of about N = 640,000 validation samples. The mean performance for all the sensors and fault amplitudes is evaluated using F I P ¯ , F E P ¯ , and c F E P ¯ indices already in Section 9. The results are reported in the first two columns of Table 7.
It is observed that the non-Linear technique (NL-DR) provides 71% in terms of F I P ¯ . Although satisfactory, it is also observed that medium neural network (M-NN) and the wide neural network (W-NN) methods perform slightly better. On the other side, considering the F E P ¯ performance, the NL-DR achieves an excellent 18% while M-NN and the W-NN provides a significant performance degradation equal to 35%.
These facts are relevant because the NL-DR provides a high-level FI performance while maintaining an excellent capability for fault reconstruction. This fact does not apply to any one of the 16 ML techniques that provide a c F E P ¯ performance lower than 72%.
In summary, the best resulting method in terms of FI performance is the W-NN, and the worst is the F-SVM. Considering the FE performance, the best c F E P ¯ is provided by our proposed NL-DR method, and the worst is the L-SVM.

10.1. Overall Performance Comparison

The fourth column of Table 7 reports the index J t o t for all the techniques. It is now evident that the resulting method with the best overall performance index is given by the proposed NL-DR (77%), followed by 3-NN (71%) and M-NN (69%) and by 2-NN (69%).
The last column of Table 7 also shows the combined memory occupancy of the isolation and estimation models. The SVM models have the highest memory occupation, up to about 50 MB, while the most parsimonious architectures are those based on directional residuals and neural networks.

10.2. In-Depth Performance Comparison of the Best Techniques

This section shows the F I P X and c F E P X indices for the best performing techniques explicitly evaluated for the eight monitored sensors. Figure 3 shows that the F-DT approach is biased toward the α sensor at the expense of the others. It perfectly isolates the faults on the α sensor, but it cannot correctly isolate any fault occurring on β , T a S , P , and θ sensors. It is also observed that the NL-DR technique performs better than the other techniques for four of the eight monitored sensors.
In Figure 4, it can be deduced that ML techniques have much lower FE performance than those provided by the NL-DR method for all the sensors. In fact, the proposed NL-DR scheme performs significantly better than the others for six of the eight monitored sensors.

10.3. Performance Comparison Evaluated over a Wider Fault Range

The performance of any machine learning technique is strongly influenced by the data used for training. This implies that the available set training data strongly influences ML-based FIE methods in our specific case. In contrast, the proposed residual-based approach is virtually independent from the fault amplitude. In fact, in the primary residual modelling phase, no assumption is made about the magnitude of the faults, which leads to significant benefits in case the range of potential faults is under- or over-estimated. A clear example of this potential problem can be observed in Figure 5, where the response of F I P X ( A ) and F E P X ( A ) indices for T a S and ϕ sensors are compared for NL-DR and neural-network-based methods.
In the upper part of Figure 5, it can be observed that by injecting faults with amplitudes twice than the nominal ranges used for training (nominal range: [ 2 2 ] m/s for the T a S sensor and [ 6 6 ] ° for the ϕ sensor), the FI performance of the 3-NN deteriorates quickly with the increase in the fault amplitude.
Considering a failure on T a S equal to 2 m/s, the F I P T a S ( 2 ) index for the 3-NN is 90% (Point A), while a failure of 4 m/s the F I P T a S ( 4 ) descends to 2% (Point B). On the other hand, using the NL-DR technique, the F I P T a S ( A ) performance constantly increases with the fault amplitude (Points C and D), avoiding the paradox response provided by the ML methods. Similar considerations apply to the F E P ϕ ( A ) index trend for faults on sensor ϕ .
In terms of FE performance, it is observed that both the T a S and ϕ the NL-DR approaches provide a suitable monotone decreasing trend for the F E P X ( A ) index with an increase in the failure amplitude. At the same time, there is a rapid degradation of the F E P X ( A ) performance outside the nominal training range in the NN case. Specifically, a constant 5 ° failure on ϕ produces a F E P ϕ ( 5 ) of 3% (Point E), while a failure of 10° produces a F E P ϕ ( 10 ) of 35% (Point F). Using the NL-DR approach, the F E P ϕ ( A ) index increases from 6% for a failure of 5 ° (Point G) to 2% in the case of a failure of 10 ° (Point H). The above issue is a typical effect known as “Unseen Data Problem” or “generalization problem”. It refers to the inability to make reliable predictions in regions outside those explored in the training data.
In our study, a simple method to limit this important problem is to provide the ML algorithms with a broader range of fault amplitudes to cover the unexplored regions in the training phase. However, on the other side, the excessive widening of the fault ranges used in the training data can lead to the inability to discriminate accurately small faults.
A direct example of this can be observed in Figure 6, where the tri-layer neural network is retrained (3-NN2), considering a range of fault amplitudes which are twice the size of the nominal range, as shown in Table 5. Using the retrained network, a relevant increase in the FI performance is achieved for large fault amplitudes, but at the expense of performance degradation for medium and small amplitude faults.
In more detail, consider the case of a failure of 4 m/s on T a S . In this case, the F I P T a S ( 4 ) provided by the retrained neural network (3-NN2) is now 95% (Point I), which is 93% more than the previous network (3-NN). Vice versa, in the case of a fault of −1 m/s, the 3-NN isolates the fault with an accuracy of 77% (Point J) while the 3-NN2 of 25% (Point K) confirms what is previously conjectured. Moreover, by analyzing the F E P ϕ ( A ) index for the sensor ϕ , a similar conclusion can be drawn: that the F E P ϕ ( A ) for 10 ° goes from 35% for 3-NN (Point F) to 9% for 3-NN2 (Point L), while a failure of 2 ° goes from 8% for 3-NN (Point M) to 60% for 3-NN2 (Point N), as shown in Figure 5 and Figure 6.

10.4. Overall Performance Evaluation for All the Monitored Sensors

To better compare the overall performance, Table 8, Table 9, Table 10, Table 11, Table 12, Table 13, Table 14 and Table 15 report, for each monitored sensor, the percentage ratio between the actual area below the F I P X ( A ) function (as those in Figure 5 and Figure 6) and the perfect performance area ( 4 A max 100 % ). The ideal area ratio is obviously 100%. The same process is also applied to the c F E P X ( A ) functions. The last column of the tables reports the mean between the F I P X ( A ) area ratio and the c F E P X ( A ) area ratio.
For almost all sensors, the area under the curve generated by 3-NN2 is larger than that under the curve generated by 3-NN, indicating a generalized performance improvement. However, in most cases, the performance improvement is only achieved for large amplitude faults, at the expense of performance degradation for small amplitude faults. This fact indicates the need to find a compromise on the magnitude of the faults used to train the neural network. This issue substantially limits the applicability of ML techniques, especially in a real-world context where an ‘a priori’ knowledge of the fault amplitude range cannot be established. Moreover, analyzing the results obtained for all the monitored sensors, the performance of the proposed NL-DR technique is almost always better than both the 3-NN and 3-NN2 techniques.

10.5. Time-Domain Performance Comparison with Same FIP/cFEP

This section evaluates and compares time domain FI and FE responses for the NL-DR and 3-NN2 techniques. In order to achieve a meaningful comparison, the tests are performed by selecting faults whose amplitudes are such that the two methods provide the same value for the F I P X ( A ) or the c F E P X ( A ) indices in Figure 5 and Figure 6, resulting in the selection of a fault of the amplitude of 2.5 m/s on T a S , and of 4 ° on ϕ , respectively (as expected, the faults are injected at k = 1 and the constants for the whole flight are maintained).
Figure 7 shows, for a failure on TaS, the evolution of this signal. The green portions indicate the instants in which the FI is correct, while the red portions indicate when the failure is incorrectly attributed to a ‘wrong’ sensor. The upper plot refers to the NL-DR technique, while the lower plot refers to the 3-NN2 technique. For hypothesis, both methods isolate the fault with the same percentage (82%, Point O in Figure 6); the remarkable aspect is that the zones of wrong isolation are practically the same for the two techniques.
On the other side, there is a marked difference in F E P X ( A ) performance between the two techniques, see Figure 8. The fault amplitude estimated by the NL-DR method is much closer to the true value than the estimate provided by the 3-NN2 technique, as deduced from Figure 6 in Points P and Q, respectively.
A similar analysis is then performed for the fault on the sensor ϕ . In this case, however, the fault amplitude is selected to be 4°, i.e., the point R of Figure 6 where the F E P X ( A ) index is equal to 6% for the two techniques. Figure 9 shows the evolution of the estimation of the fault amplitude for the two techniques, where it is confirmed that the performances are equivalent for all practical purposes. On the other side, from Figure 10, it can be observed that the F I P X ( A ) performance of the NL-DR technique is significantly better than that of the 3-NN2 technique (Points S and T in Figure 6).

11. Conclusions

The main purpose of the research effort described in this paper was to compare data-driven non-linear directional residual and machine-learning-based fault diagnosis techniques. The experimental study showed that the method based on primary residuals is virtually independent of the fault amplitude. Additionally, it was demonstrated that the FI and FE performance increases monotonically with increasing fault amplitude. In contrast, the performance of ML-based techniques depends heavily on the fault amplitudes used during training, producing potentially unpredictable results in regions not covered in the training phase. A partial solution to this problem was obtained by retraining the ML models using larger ranges for the faults injected in the training phase. The overall effect is that the FI and FE performance increases for large faults but, unfortunately, at the expense of a decrease in the estimation accuracy of small amplitude faults. In summary, it can be concluded that, while from the perspective of FI, the performance of residual-based and ML techniques is essentially equivalent, the residual-based approach results are more accurate and reliable than the ML-based approaches from the perspective of FE.

Author Contributions

Conceptualization, N.C., M.R.N. and M.L.F.; methodology, M.R.N. and M.L.F.; software, N.C.; validation, N.C.; formal analysis, N.C. and P.V.; investigation, N.C. and G.C.; resources, M.L.F.; data curation, F.C., M.R.N. and M.L.F.; writing—original draft preparation, N.C.; writing—review and editing, F.C., M.R.N., P.V. and M.L.F.; visualization, N.C. and G.C.; supervision, M.R.N. and M.L.F.; project administration, M.L.F. and P.V.; funding acquisition, G.C., M.L.F. and P.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the University of Perugia through the 2019 and 2020 Basic Research Funds (Projects: RICBA19MLF and RICBA20MF).

Acknowledgments

The authors would like to thank Giuseppe Del Core for providing the P92 Tecnam flight data.

Conflicts of Interest

The authors declare no conflict of interest in performing this research.

Abbreviations

The following abbreviations are used in this manuscript:
ARAnalytical redundancy
cFEPComplementary fault estimation percentage
DDData-driven
FDFault detection
FDiFault diagnosis
FEFault estimation
FEPFault estimation percentage
FIFault isolation
FIPFault isolation percentage
GAMGeneralized additive models
MARSMultivariate adaptive regression splines
MLMachine learning
NLNon-linear
NNNeural network
ReLURectified linear unit
RMSERoot mean square error
SVMSupport vector machine

References

  1. Cartocci, N.; Monarca, A.; Costante, G.; Fravolini, M.L.; Dogan, K.M.; Yucelen, T. Linear Control of a Nonlinear Aerospace System via Extended Dynamic Mode Decomposition. In Proceedings of the AIAA Scitech 2022 Forum, San Diego, CA, USA, 3–7 January 2022. [Google Scholar]
  2. Isermann, R. Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance; Springer: Berlin/Heidelberg, Germany, 2006; ISBN 3540241124. [Google Scholar]
  3. Isermann, R. Supervision, Fault-Detection and Fault-Diagnosis Methods—An Introduction. Control. Eng. Pract. 1997, 5, 639–652. [Google Scholar] [CrossRef]
  4. Isermann, R. Process Fault Detection Based on Modeling and Estimation Methods-A Survey. Automatica 1984, 20, 387–404. [Google Scholar] [CrossRef]
  5. Basseville, M. Information Criteria for Residual Generation and Fault Detection and Isolation. Automatica 1997, 33, 783–803. [Google Scholar] [CrossRef] [Green Version]
  6. Basseville, M.; Nikiforov, I.V. Detection of Abrupt Changes: Theory and Application; Prentice Hall Englewood Cliffs: Hoboken, NJ, USA, 1993; Volume 104. [Google Scholar]
  7. Basseville, M. Detecting Changes in Signals and Systems-A Survey. Automatica 1988, 24, 309–326. [Google Scholar] [CrossRef]
  8. Gertler, J.J. Fault Detection and Diagnosis in Engineering Systems; CRC Press: Boca Raton, FL, USA, 2017; ISBN 9781351448796. [Google Scholar]
  9. Gertler, J.J.; Kunwer, M.M. Optimal Residual Decoupling for Robust Fault Diagnosis. Int. J. Control 1995, 61, 395–421. [Google Scholar] [CrossRef]
  10. Gertler, J. Analytical Redundancy Methods in Fault Detection and Isolation—Survey and Synthesis. IFAC Proc. Vol. 1991, 24, 9–21. [Google Scholar] [CrossRef]
  11. Cartocci, N.; Costante, G.; Napolitano, M.R.; Valigi, P.; Crocetti, F.; Fravolini, M.L. PCA Methods and Evidence Based Filtering for Robust Aircraft Sensor Fault Diagnosis. In Proceedings of the 2020 28th Mediterranean Conference on Control and Automation, MED 2020, Saint-Raphaël, France, 15–18 September 2020; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2020; pp. 550–555. [Google Scholar]
  12. Cartocci, N.; Napolitano, M.R.; Costante, G.; Crocetti, F.; Valigi, P.; Fravolini, M.L. A Robust Data-Driven Fault Diagnosis Scheme Based on Recursive Dempster-Shafer Combination Rule. In Proceedings of the 2021 29th Mediterranean Conference on Control and Automation, MED 2021, Puglia, Italy, 22–25 June 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; pp. 1070–1075. [Google Scholar]
  13. Cartocci, N.; Napolitano, M.R.; Costante, G.; Fravolini, M.L. A Comprehensive Case Study of Data-Driven Methods for Robust Aircraft Sensor Fault Isolation. Sensors 2021, 21, 1645. [Google Scholar] [CrossRef]
  14. Cartocci, N.; Crocetti, F.; Costante, G.; Valigi, P.; Fravolini, M.L. Robust Multiple Fault Isolation Based on Partial-Orthogonality Criteria. Int. J. Control Autom. Syst. 2021. [Google Scholar]
  15. Cartocci, N.; Napolitano, M.R.; Costante, G.; Valigi, P.; Fravolini, M.L. Aircraft Robust Data-Driven Multiple Sensor Fault Diagnosis Based on Optimality Criteria", Mechanical Systems and Signal Processing. Mech. Syst. Signal Processing 2021, 170, 108668. [Google Scholar] [CrossRef]
  16. Cartocci, N.; Crocetti, F.; Costante, G.; Valigi, P.; Napolitano, M.R.; Fravolini, M.L. Data-Driven Sensor Fault Diagnosis Based on Nonlinear Additive Models and Local Fault Sensitivity. In Proceedings of the 2021 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia, 6–10 December 2021; pp. 750–756. [Google Scholar] [CrossRef]
  17. Jung, M.; Niculita, O.; Skaf, Z. Comparison of Different Classification Algorithms for Fault Detection and Fault Isolation in Complex Systems. Procedia Manuf. 2018, 19, 111–118. [Google Scholar] [CrossRef]
  18. Jung, D.; Ng, K.Y.; Frisk, E.; Krysander, M. Combining Model-Based Diagnosis and Data-Driven Anomaly Classifiers for Fault Isolation. Control Eng. Pract. 2018, 80, 146–156. [Google Scholar] [CrossRef]
  19. Yan, Z.; Yao, Y.; Huang, T.B.; Wong, Y.S. Reconstruction-Based Multivariate Process Fault Isolation Using Bayesian Lasso. Ind. Eng. Chem. Res. 2018, 57, 9779–9787. [Google Scholar] [CrossRef]
  20. Kim, S.B.; Sukchotrat, T.; Park, S.K. A Nonparametric Fault Isolation Approach through One-Class Classification Algorithms. IIE Trans. 2011, 43, 505–517. [Google Scholar] [CrossRef]
  21. Yan, J.; Lee, J. Degradation Assessment and Fault Modes Classification Using Logistic Regression. J. Manuf. Sci. Eng. 2005, 127, 912–914. [Google Scholar] [CrossRef]
  22. Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-Dimensional and Large-Scale Anomaly Detection Using a Linear One-Class SVM with Deep Learning. Pattern Recognit. 2016, 58, 121–134. [Google Scholar] [CrossRef]
  23. Revathi, A.R.; Kumar, D. An Efficient System for Anomaly Detection Using Deep Learning Classifier. Signal Image Video Processing 2017, 11, 291–299. [Google Scholar] [CrossRef]
  24. Pashazadeh, V.; Salmasi, F.R.; Araabi, B.N. Data Driven Sensor and Actuator Fault Detection and Isolation in Wind Turbine Using Classifier Fusion. Renew. Energy 2018, 116, 99–106. [Google Scholar] [CrossRef]
  25. Mousavi, M.; Moradi, M.; Chaibakhsh, A.; Kordestani, M.; Saif, M. Ensemble-Based Fault Detection and Isolation of an Industrial Gas Turbine. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 2351–2358. [Google Scholar] [CrossRef]
  26. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models; Routledge: New York, NY, USA, 2017; Volume 4, pp. 1–335. ISBN 9781351445979. [Google Scholar]
  27. Łabęda, Z.M. Additive Model Applications for the Fault Detection of Actuators. Pomiary Autom. Kontrola 2009, 55, 152–155. [Google Scholar]
  28. Friedman, J.H. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1–67. [Google Scholar] [CrossRef]
  29. Abid, A.; Khan, M.T.; Iqbal, J. A Review on Fault Detection and Diagnosis Techniques: Basics and Beyond. Artif. Intell. Rev. 2020, 54, 3639–3664. [Google Scholar] [CrossRef]
  30. Datta, S.; Sarkar, S. A Review on Different Pipeline Fault Detection Methods. J. Loss Prev. Process Ind. 2016, 41, 97–106. [Google Scholar] [CrossRef]
  31. Gints Jēkabsons ARESLab: Adaptive Regression Splines Toolbox for Matlab/Octave. Available online: http://www.cs.rtu.lv/jekabsons/regression.html (accessed on 20 February 2022).
  32. Jan, S.U.; Lee, Y.D.; Koo, I.S. A Distributed Sensor-Fault Detection and Diagnosis Framework Using Machine Learning. Inf. Sci. 2021, 547, 777–796. [Google Scholar] [CrossRef]
  33. Lo, N.G.; Flaus, J.M.; Adrot, O. Review of Machine Learning Approaches in Fault Diagnosis Applied to IoT Systems. In Proceedings of the 2019 International Conference on Control, Automation and Diagnosis, ICCAD 2019, Grenoble, France, 2–4 July 2019. [Google Scholar] [CrossRef]
  34. Tecnam P92. Available online: https://www.tecnam.com/aircraft/p92-echo-mkii/ (accessed on 20 February 2022).
  35. Campa, G.; Gu, Y.; Seanor, B.; Napolitano, M.R.; Pollini, L.; Fravolini, M.L. Design and Flight-Testing of Non-Linear Formation Control Laws. Control Eng. Pract. 2007, 15, 1077–1092. [Google Scholar] [CrossRef]
  36. Grandini, M.; Bagli, E.; Visani, G. Metrics for Multi-Class Classification: An Overview. arXiv 2020, arXiv:2008.05756. [Google Scholar] [CrossRef]
Figure 1. Online operation of the machine-learning-based FI and FE.
Figure 1. Online operation of the machine-learning-based FI and FE.
Sensors 22 02635 g001
Figure 2. The first row reports the measured readings for the 8 monitored sensors. The cyan signals show the evolution of the (8 × 8) fault sensitivity matrix elements, displayed in Section 2. The red lines show the constant value of the elements of the fault sensitivity matrix W , shown in Section 2.1.
Figure 2. The first row reports the measured readings for the 8 monitored sensors. The cyan signals show the evolution of the (8 × 8) fault sensitivity matrix elements, displayed in Section 2. The red lines show the constant value of the elements of the fault sensitivity matrix W , shown in Section 2.1.
Sensors 22 02635 g002
Figure 3. Fault isolation percentage for the best techniques for the 8 monitored sensors.
Figure 3. Fault isolation percentage for the best techniques for the 8 monitored sensors.
Sensors 22 02635 g003
Figure 4. Fault estimation percentage for the best techniques for the 8 monitored sensors.
Figure 4. Fault estimation percentage for the best techniques for the 8 monitored sensors.
Sensors 22 02635 g004
Figure 5. Fault diagnosis performance for TaS and φ sensors. The blue line is the NL-DR and the yellow line is the 3-NN trained with fault amplitude in the nominal ranges of Table 6.
Figure 5. Fault diagnosis performance for TaS and φ sensors. The blue line is the NL-DR and the yellow line is the 3-NN trained with fault amplitude in the nominal ranges of Table 6.
Sensors 22 02635 g005
Figure 6. Fault diagnosis performance for TaS and φ sensors. The blue line is the NL-DR technique, and the amaranth line is the 3-NN2 trained in twice the fault range summarized in Table 6.
Figure 6. Fault diagnosis performance for TaS and φ sensors. The blue line is the NL-DR technique, and the amaranth line is the 3-NN2 trained in twice the fault range summarized in Table 6.
Sensors 22 02635 g006
Figure 7. TaS sensor FI performance for a fault amplitude of 2.5 m/s in the test flight. The green segments indicate when the fault on the TaS sensor is correctly isolated by the FI scheme, while the red segments indicate a wrong FI.
Figure 7. TaS sensor FI performance for a fault amplitude of 2.5 m/s in the test flight. The green segments indicate when the fault on the TaS sensor is correctly isolated by the FI scheme, while the red segments indicate a wrong FI.
Sensors 22 02635 g007
Figure 8. TaS sensor FE performance for a fault amplitude of 2.5 m/s (green line) during the test flight. The blue signal indicates the fault amplitude estimated by the NL-DR technique, while the amaranth signal indicates the fault amplitude calculated by the 3-NN2.
Figure 8. TaS sensor FE performance for a fault amplitude of 2.5 m/s (green line) during the test flight. The blue signal indicates the fault amplitude estimated by the NL-DR technique, while the amaranth signal indicates the fault amplitude calculated by the 3-NN2.
Sensors 22 02635 g008
Figure 9. φ sensor FE performance for a fault amplitude of 4° (green line) during the test flight. The blue signal indicates the fault amplitude estimated by the NL-DR technique, while the amaranth signal indicates the fault amplitude provided by the 3-NN2.
Figure 9. φ sensor FE performance for a fault amplitude of 4° (green line) during the test flight. The blue signal indicates the fault amplitude estimated by the NL-DR technique, while the amaranth signal indicates the fault amplitude provided by the 3-NN2.
Sensors 22 02635 g009
Figure 10. φ sensor FI performance for a fault amplitude of 4° during the test flight. The green segments indicate when the fault on the φ sensor is correctly isolated by the FI scheme, while the red segments indicate a wrong FI.
Figure 10. φ sensor FI performance for a fault amplitude of 4° during the test flight. The green segments indicate when the fault on the φ sensor is correctly isolated by the FI scheme, while the red segments indicate a wrong FI.
Sensors 22 02635 g010
Table 1. ML techniques for sensor fault isolation/estimation.
Table 1. ML techniques for sensor fault isolation/estimation.
NameHyperparameters
SVM
Linear SVM kernel function: linear; kernel scale: automatic
Quadratic SVM kernel function: quadratic; kernel scale: automatic
Cubic SVM kernel function: cubic; kernel scale: automatic
Fine Gaussian SVM kernel function: gaussian; kernel scale: 0.9
Medium Gaussian SVM kernel function: gaussian; kernel scale: 3.6
Coarse Gaussian SVM kernel function: gaussian; kernel scale: 14
Neural Network
Narrow Neural Network number of fully connected layers: 1, first layer size: 10
Medium Neural Network number of fully connected layers: 1, first layer size: 25
Wide Neural Network number of fully connected layers: 1, first layer size: 100
Bi-layer Neural Network number of fully connected layers: 2, first layer size: 10; second layer size: 10.
Tri-layer Neural Network number of fully connected layers: 3, first layer 10: second layer 10: third layer: 10
Decision Tree
Fine Tree maximum number of splits: 100
Medium Tree maximum number of splits: 20
Coarse Tree maximum number of splits: 4
Ensemble Decision Tree
Bagged Trees ensemble method: bag, maximum number of splits: 38935, number of learners: 30
Boosted Trees ensemble method: AdaBoost, maximum number of splits: 20, number of learners: 30, learning rate: 0.1
Table 2. Aircraft signals.
Table 2. Aircraft signals.
x(k) (Monitored Sensors)u(k) (Input Commands)
αAngle of attackPRoll speedAltAltitude
βDrifting angleQPitch speedAieAileron
TaSTrue AirSpeedRYaw speedRudRudder
φRoll angleθPitch angleThrThrust lever
Table 3. Additive model identification (the number of cubic splines selected by MARS algorithm).
Table 3. Additive model identification (the number of cubic splines selected by MARS algorithm).
αβTaSPQRφθAltAieRudThrTot.
α -213222-11216
β2 -2-122332118
TaS6- 12-22-22-17
P222 223-231-19
Q3-22 25-1--217
R11-2- 21323217
φ-22-15 123--16
θ2-2-133 --2417
Table 4. Sensors’ estimation accuracy [mean ± standard deviation].
Table 4. Sensors’ estimation accuracy [mean ± standard deviation].
α [°]β [°]TaS [m/s]P [°/s]Q [°/s]R [°/s]φ [°]θ [°]
L-Tr−10−14 ± 1.110−16 ± 1.6−10−14 ± 1.110−14 ± 1.510−14 ± 1.7−10−14 ± 1.210−15 ± 1.2−10−14 ± 1.4
N-Tr10−5 ± 0.210−4 ± 0.5−10−4 ± 0.2−10−3 ± 0.610−3 ± 0.6−10−4 ± 0.310−4 ± 0.310−4 ± 0.5
L-Te−0.3 ± 0.2−0.8 ± 0.7−0.4 ± 0.410−2 ± 0.5−10−4 ± 0.60.2 ± 0.40.1 ± 0.3−0.1 ± 0.9
N-Te−0.1 ± 0.1−10−2 ± 0.5−0.1 ± 0.110−2 ± 0.20.1 ± 0.2−0.1 ± 0.2−10−2 ± 0.1−0.2 ± 0.3
(L: linear, N: non-linear; Tr: train data; Te: test data).
Table 5. Maximum fault amplitude.
Table 5. Maximum fault amplitude.
α [°]β [°]TaS [m/s]P [°/s]Q [°/s]R [°/s]φ [°]θ [°]
A max 23232664
Table 6. Fault isolation and fault estimation performance evaluated on training data.
Table 6. Fault isolation and fault estimation performance evaluated on training data.
NameAccuracy [%] {Fault Isolation}RMSE {Faut Estimation}
SVM
Linear SVM34.10.77484
Quadratic SVM67.50.7015
Cubic SVM68.10.59263
Fine Gaussian SVM590.58138
Medium Gaussian SVM580.51841
Coarse Gaussian SVM45.20.6992
Neural Network
Narrow Neural Network67.90.4875
Medium Neural Network75.40.40438
Wide Neural Network78.10.38077
Bi-layer Neural Network71.20.41589
Tri-layer Neural Network71.50.39589
Decision Tree
Fine Tree48.70.58246
Medium Tree450.56079
Coarse Tree40.90.57909
Ensemble Decision Tree
Bagged Trees46.50.4737
Boosted Trees42.50.66543
Table 7. Performance comparison between directional residual-based and machine-learning-based techniques.
Table 7. Performance comparison between directional residual-based and machine-learning-based techniques.
F I P ¯   [ % ] F E P ¯   [ % ] c F E P ¯   [ % ] J t o t   [ % ] Memory Occup. (KB)
DRDirectional Residuals
Lin-DRLinear643367661
NL-DRNon-Linear7118827710
SVMSVM
L-SVM Linear SVM 258812198000
Q-SVM Quadratic SVM 6276244345,000
Cu-SVM Cubic SVM 6262385045,000
F-SVM Fine Gaussian SVM 6148525760,000
M-SVMMedium Gaussian SVM 5553475155 000
Co-SVM Coarse Gaussian SVM 1677232070 000
NNNeural Network
1-NN Narrow Neural Network 6147535725
M-NN Medium Neural Network 7335656940
W-NN Wide Neural Network 76425867100
2-NN Bi-layer Neural Network 6830706930
3-NN Tri-layer Neural Network 6928727140
DTDecision Tree
F-DT Fine Tree 363862491500
M-DT Medium Tree 31346649600
C-DT Coarse Tree 24406042200
ENEnsemble Decision Tree
Ba-EN Bagged Trees 70356568100,000
Bo-EN Boosted Trees 32752529500
Table 8. α area ratio.
Table 8. α area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR808181
3-NN 856776
3-NN2855771
Table 9. β area ratio.
Table 9. β area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR889994
3-NN 775566
3-NN2886376
Table 10. TaS area ratio.
Table 10. TaS area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR727574
3-NN 455550
3-NN2561938
Table 11. P area ratio.
Table 11. P area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR759183
3-NN 557063
3-NN2707572
Table 12. Q area ratio.
Table 12. Q area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR838182
3-NN 836172
3-NN2808482
Table 13. R area ratio.
Table 13. R area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR777878
3-NN 897682
3-NN2948790
Table 14. φ area ratio.
Table 14. φ area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR778983
3-NN 706467
3-NN2738479
Table 15. θ area ratio.
Table 15. θ area ratio.
NameFIP [%]cFEP [%]Mean
NL-DR867882
3-NN 806170
3-NN2837780
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Cartocci, N.; Napolitano, M.R.; Crocetti, F.; Costante, G.; Valigi, P.; Fravolini, M.L. Data-Driven Fault Diagnosis Techniques: Non-Linear Directional Residual vs. Machine-Learning-Based Methods. Sensors 2022, 22, 2635. https://doi.org/10.3390/s22072635

AMA Style

Cartocci N, Napolitano MR, Crocetti F, Costante G, Valigi P, Fravolini ML. Data-Driven Fault Diagnosis Techniques: Non-Linear Directional Residual vs. Machine-Learning-Based Methods. Sensors. 2022; 22(7):2635. https://doi.org/10.3390/s22072635

Chicago/Turabian Style

Cartocci, Nicholas, Marcello R. Napolitano, Francesco Crocetti, Gabriele Costante, Paolo Valigi, and Mario L. Fravolini. 2022. "Data-Driven Fault Diagnosis Techniques: Non-Linear Directional Residual vs. Machine-Learning-Based Methods" Sensors 22, no. 7: 2635. https://doi.org/10.3390/s22072635

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop