A Sparse Autoencoder-Based Unsupervised Scheme for Pump Fault Detection and Isolation

Pumps are one of the most critical machines in the petrochemical process. Condition monitoring of such parts and detecting faults at an early stage are crucial for reducing downtime in the production line and improving plant safety, efficiency and reliability. This paper develops a fault detection and isolation scheme based on an unsupervised machine learning method, sparse autoencoder (SAE), and evaluates the model on industrial multivariate data. The Mahalanobis distance (MD) is employed to calculate the statistical difference of the residual outputs between monitoring and normal states and is used as a system-wide health indicator. Furthermore, fault isolation is achieved by a reconstruction-based two-dimensional contribution map, in which the variables with larger contributions are responsible for the detected fault. To demonstrate the effectiveness of the proposed scheme, two case studies are carried out based on a multivariate data set from a pump system in an oil and petrochemical factory. The classical principal component analysis (PCA) method is compared with the proposed method and results show that SAE performs better in terms of fault detection than PCA, and can effectively isolate the abnormal variables, which can hence help effectively trace the root cause of the detected fault.


Introduction
As an important component in the oil and petrochemical industry, pumps are widely used in different sectors, including the production line, transportation process and refinery factory. According to investigations in [1], pumps are responsible for the most equipment failures in a petrochemical plant. A minor fault in a pump can gradually develop to a severe one, which threatens the safety and productivity of plants. Therefore, detecting faults in pumps at an early stage is of great importance for engineers and managers.
The fault detection of pumps belongs to a "process monitoring and fault detection" (PMFD) field [2,3]. The genesis of the PMFD strategies can be traced back to the advent of the basic Shewhart control charts in the 1930s [4]. The control chart has been successfully used to date for monitoring a single variable. They are most effective for detecting large shifts in the measured variables which can indicate the health condition of the process or equipment being monitored; however, they are insensitive to smaller shifts, which may indicate an incipient fault in the concerned process or the equipment [5]. This led to the development of the improved control charts (i.e., cumulative sum control chart and exponentially weighted moving average control chart) in the 1950s, as well as model-based and data-based approaches.

1.
Based on the literature review, apart from conventional fault detection models (Shewhart control charts and its variants, principal component analysis (PCA), etc.), state-of-the-art fault detection methods (i.e., multi-layer perceptron neural network and fuzzy techniques system [11], binary adaptive resonance network [12], convolutional neural networks [13], SAE, etc.) worked well on simulated and experimental data, lacking tests in real industrial applications.

2.
Most of the aforementioned references detect faults on specific components or fault types, lack a global health indicator from a system-wide perspective. Currently, there is a general trend to construct a system-wide health monitoring system using multivariate industrial field data. 3.
During the operation of modern industrial machines, a huge volume of data is collected and archived by the monitoring systems. However, most data belong to normal operating conditions, while faulty data are usually rare and sometimes cannot be obtained. Instead of classification model, it requires an anomaly detection models that are able to take full advantage of the large volume of healthy data and effectively detect the fault.

4.
Again, based on the literature reviews [5, 18,19] and input from our sponsor (Shell), conventional fault detection systems in process industries have the problem of insensitive to incipient faults or generating a large numbers of false alarms, which may incur high maintenance costs and reduce the availability of the equipment. Therefore, there is a requirement for finding an effective fault detection model with lower false alarm rates and higher fault detection rate. Meanwhile, it is necessary to identify the most relevant variables that are strongly related to the detected fault, because this fault isolation can help determine the root cause of the fault and provide decision support for operators to timely adjust pump operation and take maintenance actions if necessary.
Among the Artificial Intelligence (AI)-based fault detection approaches, unsupervised machine learning methods are of great interest. In the strategy, an anomaly/fault detection model is trained using data collected when the system is in a healthy state. When new data come in, it is compared to the model's prediction and an alarm is raised to indicate an anomaly condition if an intolerable deviation is found. Principal component analysis (PCA) is one of the widely used unsupervised machine learning methods. It projects high dimensional data to a lower dimensional subspace to distinguish "normal" data from "abnormal" data. PCA has been widely applied in process control [7], fault detection and system risk assessment [20], network traffic anomaly detection [21], network intrusion detection [22], etc. More recently, autoencoder (AE) and its variants [23], which can automatically learn features from unlabeled data, are gaining popularity for anomaly detection. An autoencoder is a special type of neural networks that includes an encoder and decoder. The encoder layer(s) has similar principles as PCA, but with a slight difference that PCA is linear while the autoencoder can be nonlinear. The decoder layer(s) of AE reconstructs the input. A notable feature of the AE approach is that it can learn arbitrary relationships among different sensor variables in both linear and nonlinear cases and therefore be more flexible than existing linear monitoring methods [24]. However, like many neural networks, AE suffers from the overfitting problem [25]. Thus, a couple of improved AEs have been derived, e.g., multi-level-denoising AE [24], sparse autoencoder (SAE) [26]. Among them, the SAE is an effective one, which solves the overfitting problem by putting a sparsity constraint on the hidden units [23,26]. A successful application of SAE for anomaly detection in turbomachinery can be found in [27].
To address the aforementioned challenges, we developed a fault detection and isolation scheme based on the unsupervised machine learning method, SAE. Mahalanobis distance (MD) is employed to measure the distance between the condition monitoring data and the data under healthy condition. The MD merged multiple variables into one system-wide health indicator. Fault isolation can be achieved by using a two-dimensional contribution map. The variables with larger contributions are responsible for the fault. The performance of the fault detection model was evaluated by the receiver operating characteristics (ROC) curve and the area under the ROC curve (AUC value), considering both false alarm rate and fault detection rate.
The following of the paper is organized as follows. Section 2 explains the fault detection and isolation scheme and the algorithms employed in this paper. Then, the experimental data from an industrial factory are explained in Section 3. The performance of fault detection models on the multivariate industrial data are evaluated and discussed through two case studies in Section 4. Finally, conclusions are drawn in Section 5.

Fault Detection and Isolation Scheme
The proposed fault detection scheme is summarized in Figure 1, which consists of two phases: (1) offline training; and (2) online fault detection. The two phases and techniques applied in these two phases are explained in detail in the following sections.
In general, the fault detection scheme is based on the comparison and analysis of the reconstruction error between the actual monitoring data and its reconstructed output obtained from the trained normal reference model. Changes in the reconstruction error could indicate the occurrence of faults. The reconstruction errors of monitored variables are shown on a two-dimensional contribution map, which stacks multiple observations (time point) into one image to clearly illustrate the contribution of the variables over the entire faulty data times series. On the contribution map, a high reconstruction error will be identified as anomalies or faults.

Fault Detection and Isolation Scheme
The proposed fault detection scheme is summarized in Figure 1, which consists of two phases: (1) offline training; and (2) online fault detection. The two phases and techniques applied in these two phases are explained in detail in the following sections.  In general, the fault detection scheme is based on the comparison and analysis of the reconstruction error between the actual monitoring data and its reconstructed output obtained from the trained normal reference model. Changes in the reconstruction error could indicate the occurrence of faults. The reconstruction errors of monitored variables are shown on a twodimensional contribution map, which stacks multiple observations (time point) into one image to clearly illustrate the contribution of the variables over the entire faulty data times series. On the contribution map, a high reconstruction error will be identified as anomalies or faults.

Data Pre-Processing
As shown Figure 1, the monitoring data are firstly pre-processed and applies to both the offline training and online fault detection phases. This process filters out the erroneous monitoring data, which are common in practical measurement data caused by communication or instrumentation failures [28]. Note that the pre-processing rules can be different for different data and the rules for our experimental data are explained in Section 3.
Moreover, a data standardization process is applied to the monitoring data to scale the data to equal range. Assume that X = {̂; = 1, … , } (̂∈ ℝ for each measurement ) with different measurements, is the monitoring dataset, X = { ; = 1, … , } ( ∈ ℝ for each measurement ) is the dataset after standardization process. The transformation from X to X is given in Equation (1).
where is the mean, and is the standard deviation of the input ̂. Note that and were calculated form training data, and then applied to both training and monitoring data.

Sparse Autoencoder
The sparse autoencoders (SAE) is a special unsupervised feedforward neural network. A representative structure of SAE is presented in Figure 2. It can be observed that the SAE network includes two parts, i.e., encoder and decoder. The encoder connects the input layer to the hidden layer, with the weight matrix and the bias of this part being represented by (1) and (1) , respectively. The decoder connects the hidden layer to the output layer, with the corresponding weight matrix (2) and bias (2) .  As shown Figure 1, the monitoring data are firstly pre-processed and applies to both the offline training and online fault detection phases. This process filters out the erroneous monitoring data, which are common in practical measurement data caused by communication or instrumentation failures [28]. Note that the pre-processing rules can be different for different data and the rules for our experimental data are explained in Section 3.
Moreover, a data standardization process is applied to the monitoring data to scale the data to equal range. Assume thatX = {x i ; i = 1, . . . , D} (x i ∈ R n for each measurement i) with D different measurements, is the monitoring dataset, X = {x i ; i = 1, . . . , D} (x i ∈ R n for each measurement i) is the dataset after standardization process. The transformation fromX to X is given in Equation (1).
where µ i is the mean, and σ i is the standard deviation of the inputx i . Note that µ i and σ i were calculated form training data, and then applied to both training and monitoring data.

Sparse Autoencoder
The sparse autoencoders (SAE) is a special unsupervised feedforward neural network. A representative structure of SAE is presented in Figure 2. It can be observed that the SAE network includes two parts, i.e., encoder and decoder. The encoder connects the input layer to the hidden layer, with the weight matrix and the bias of this part being represented by W (1) and b (1) , respectively. The decoder connects the hidden layer to the output layer, with the corresponding weight matrix W (2) and bias b (2) .
The network tries to reconstruct the input vector in the output layer, as shown in Equation (2).
where X denotes the input vector X ∈ R D×1 , and x is the output vector. H W,b (X) is the nonlinear function of SAE, which predicts output X based on the input X, using parameters W and b. The SAE is trained by minimizing the cost function [26] in Equation (3). In this paper, the L-BFGS algorithm and back propagation [26,29] were employed to train the network for minimizing the cost function.
Appl. Sci. 2020, 10, 6789 5 of 19 The first part of Equation (3) is an average sum-of-squares error term, which is used for minimizing the error between the input data x and the output data x. x j represents the jth training input variable, D is the total amount of the input variables. The decoder function of the jth training variable is given by H W,b x j = f W (2) a + b (2) , where a is the activation of hidden layer a = f (W (1) x j + b (1) ), where W (1) , W (2) are weights, b (1) , b (2) are bias units, and f = (1 + exp(−x)) −1 , is the logistic sigmoid function.
The second part of the cost function is the regularization term, also called the weight decay term to avoid over-fitting, where λ 1 is the weight decay factor, L is the number of layers in the network, and, in Figure 2, L = 3. s l represents the number of units in layer l, excluding the bias unit. Obviously, W (l) ji is the weight, which is associated with the connection between the ith unit in layer l and the jth unit in layer l + 1. The network tries to reconstruct the input vector in the output layer, as shown in Equation (2).
where X denotes the input vector X ∈ × , and x is the output vector. , (X) is the nonlinear function of SAE, which predicts output X based on the input X, using parameters and .
The SAE is trained by minimizing the cost function [26] in Equation (3). In this paper, the L-BFGS algorithm and back propagation [26,29] were employed to train the network for minimizing the cost function.
The first part of Equation (3) is an average sum-of-squares error term, which is used for minimizing the error between the input data and the output data .
represents the training input variable, is the total amount of the input variables. The decoder function of the training variable is given by The second part of the cost function is the regularization term, also called the weight decay term to avoid over-fitting, where is the weight decay factor, is the number of layers in the network, and, in Figure 2 represents the number of units in layer , excluding the bias unit. Obviously, ( ) is the weight, which is associated with the connection between the unit in layer and the unit in layer + 1. The last part of the function is the sparse penalty term which imposes a sparsity constraint on the hidden units, where is the weight of this term.
is the number of neurons in the hidden layer. The last part of the function is the sparse penalty term which imposes a sparsity constraint on the hidden units, where β is the weight of this term. s 2 is the number of neurons in the hidden layer. KL(ρ ρ H ) measures how different two distributions are, and this term is given by Kullback-Leibler divergence function in Equation (4): where ρ is the sparsity parameter which is usually small and close to zero, H is the number of hidden unit in the SAE, ρ H is the average activation of hidden unit j given by represents the activation of hidden unit H in SAE.

Residual Evaluation and Threshold Calculation
With the data under a healthy condition, a health reference model can be trained. Then, we can calculate the multivariate residuals E between the input variables X and the reconstructed outputs X as follows: To detect the existence of a fault, an integrated monitoring indicator needs to be calculated based on the multivariate residuals. The Euclidean distance or the Mahalanobis distance (MD) are typically used. The MD is a unitless distance measurement, and takes into account the correlations among Appl. Sci. 2020, 10, 6789 6 of 19 variables. The was proposed by Indian statistician Mahalanobis to represent the covariance distance of the data. The MD provides a univariate distance value for a multivariate data and is therefore applied in the anomaly detection models. In the past, MD was successfully applied for the anomaly detection in wind turbine data [6,30,31].
In this paper, a robust MD [6,32] is employed, as shown in Equation (6).
Next, the fault detection threshold d is determined using the statistical properties of h. In other words, the fault detection threshold d can be obtained from the probability density function (PDF) of h for a given confidence level α by solving Equation (7): where p(h) is the PDF function of h. In this paper, the kernel density estimation (KDE) method is adopted for distribution fitting. The KDE method is a well-established approach in statistical distribution fitting and has been successfully applied to the field of processing monitoring and fault detection [6]. Therefore, according to the KDE method, p(h) can be written as: where N is the total number of h. K(·) is the kernel function and σ is the bandwidth. The selection of optimal value for σ is described in [33]. Here, the Gaussian kernel is used.

Online Fault Detection Phase
In the online fault detection phase, the monitoring data are pre-processed by the same method as that in the offline training method, then fed to the trained SAE model in offline mode.

Fault Detection
The health indicator is calculated as whereμ and MCD −1 are adopted directly from the training phase as calculated in Equation (5). E M is the residual between the actual measurement values Y and the reconstructed output Y obtained using the trained SAE fault detection model. A fault is detected when the real-time MD value h M exceeds the predefined threshold d.

Fault Isolation
After a fault being detected, it is necessary to identify the most relevant variables related to the fault. This can be achieved by a reconstruction-based contribution map and the variables with larger contributions are responsible for the fault.
The Q statistic (also called squared prediction error, SPE) is widely used in process control for condition monitoring data [34][35][36]. The traditional Q statistic contribution plot can be calculated by Equation (11) [37]: Appl. Sci. 2020, 10, 6789 7 of 19 wherex i is the reconstructed value of x i by the SAE model. The contribution plot is a one-dimensional plot, which only examines the contributions at one time point (one observation), and multiple contribution plots are needed to illustrate multiple observations in time series data. In contrast, a two-dimensional contribution map [38] stacks multiple observations into one image to clearly illustrate the contribution of the variables over the entire faulty data times series, which enables the fast identification of faulty variables within large heterogeneous data sets. Therefore, a two-dimensional Q statistic contribution map is applied.

Performance Metrics
To evaluate the performance of the model for fault detection, the receiver operating characteristics (ROC) curve and the resulting quantification metric area under the ROC curve (AUC) [6] are employed. The ROC curve is created by plotting the fault detection rate (FDR) on the Y-axis against false alarm rate (FAR) on the X-axis under different threshold levels. FDR is defined as the percentage of detected fault samples over the number of fault samples, and FAR is regarded as the percentage of detected fault samples over the number of normal samples [39]. Both the range of FDR and FAR fall in [0, 1].
An illustration of the ROC curve and AUC value are shown in Figure 3. On the ROC plot, the points situated at the top left corner have a higher FDR value and a lower FAR value, and hence are regarded having better performance than other points on the ROC curve. The ROC curve and the AUC metric provide a relative trade-off between FDR and FAR. A good fault detector should yield a high FDR and a low FAR on an ROC curve and, accordingly, a large AUC value.
The statistic (also called squared prediction error, SPE) is widely used in process control for condition monitoring data [34][35][36]. The traditional statistic contribution plot can be calculated by Equation (11) [37]: where is the reconstructed value of by the SAE model.
The contribution plot is a one-dimensional plot, which only examines the contributions at one time point (one observation), and multiple contribution plots are needed to illustrate multiple observations in time series data. In contrast, a two-dimensional contribution map [38] stacks multiple observations into one image to clearly illustrate the contribution of the variables over the entire faulty data times series, which enables the fast identification of faulty variables within large heterogeneous data sets. Therefore, a two-dimensional statistic contribution map is applied.

Performance Metrics
To evaluate the performance of the model for fault detection, the receiver operating characteristics (ROC) curve and the resulting quantification metric area under the ROC curve (AUC) [6] are employed. The ROC curve is created by plotting the fault detection rate (FDR) on the Y-axis against false alarm rate (FAR) on the X-axis under different threshold levels. FDR is defined as the percentage of detected fault samples over the number of fault samples, and FAR is regarded as the percentage of detected fault samples over the number of normal samples [39]. Both the range of FDR and FAR fall in [0, 1].
An illustration of the ROC curve and AUC value are shown in Figure 3. On the ROC plot, the points situated at the top left corner have a higher FDR value and a lower FAR value, and hence are regarded having better performance than other points on the ROC curve. The ROC curve and the AUC metric provide a relative trade-off between FDR and FAR. A good fault detector should yield a high FDR and a low FAR on an ROC curve and, accordingly, a large AUC value.

Experimental Data Description
High-pressure injection pumps are widely used in the oil and petrochemical industry for oil transportation, lift and injection. Centrifugal pumps are the most common pump type. They are typically operated under high rotating speed, high pressure, and high loading conditions and thus are likely subjected to performance degradations.

Experimental Data Description
High-pressure injection pumps are widely used in the oil and petrochemical industry for oil transportation, lift and injection. Centrifugal pumps are the most common pump type. They are typically operated under high rotating speed, high pressure, and high loading conditions and thus are likely subjected to performance degradations.
The pumps are used to move fluid through mechanical actions. To be specific, the pump's impeller rotates within the housing, reduces pressure at the inlet, and creates a vacuum. This motion then drives fluid to the outside of the pump's housing, which increases the pressure high enough to send it outside the discharge [40].
The working conditions of the pump is linked to its mechanical action and controlling methods. According to its mechanical action, several methods have been applied to control the pump to meet the end users requirements. The most common include throttling control, the rotational speed control, and using a sequence circuit [41]. These controlling methods adjust the flow, pressure or shaft speed of a pump to achieve expected performance. Consequently, in this paper, the flow, pressure and speed are used to evaluate the pump's working conditions. The training data are selected under the same working condition as the test data.
The data used in this paper were obtained from a multivariate condition monitoring system mounted on a high-pressure injection pump in an oil and petrochemical industry. The measurement variables are listed in Table 1. The data cover the period from 18 September 2012 to 08 September 2017, captured at a sampling rate of one sample per hour. In practice, the raw data from monitoring system cannot be directly used, as it often contains erroneous data or missing data due to communication system failure, sensors faults, maintenance or repair actions, etc. In order to filter out the errors in training data, the following rules are applied for data pre-processing: • Filter out all missing values; • Filter out downtime data, where speed is less than 10 rpm; • Filter out all data vectors where one or more parameters have a value higher/lower than a predefined threshold (viewed as erroneous data). This step aims to delete downtime data and erroneous data due to sensor fault. In this paper, the threshold values were decided based on the manufacturer specifications. For example, all measurements with a discharge pressure lower than 130 bar or greater than 250 bar were filtered out. After this process, there might still exist some outliers in the training data. The influence of such outliers was further eliminated by setting reference fault detection thresholds in training process of both SAE and the PCA models using Equation (7).

Case One: Detection of a Misalignment Fault
In order to demonstrate the effectiveness of the proposed approach, the traditional PCA method is also applied to process the data set. In this case, both models (PCA and SAE) for the pump anomaly detection were trained on the data acquired in a quarter of the year period from 10 March 2013 to 21 June 2013, where no abnormal events were recorded in the pump. The models were then used to assess the health condition after 22 June 2013. In this period, the pump worked under working conditions: speed around 90-105 rpm with suction pressure 10-12 bar, discharge pressure around 226-230 bar, and flow fluctuates between 850 to 1250 km 3 /d.
The MD thresholds for PCA and SAE are calculated based on the training data. The results are presented in Figure 4, in which the blue points under the threshold are healthy data, and the red points above the threshold are viewed as anomaly data. The reference thresholds (d PCA and d SAE ) for these two models were calculated based on Equation (7) with confidence level α = 0.01 [30]. For the PCA model, the number of principal components is determined when the cumulative variance contribution rates exceed 95%. For the SAE model, the number of nodes in the hidden layer was set as 11. Later in this case study, the influence of parameter selection is evaluated by ROC and AUC.
The MD thresholds for PCA and SAE are calculated based on the training data. The results are presented in Figure 4, in which the blue points under the threshold are healthy data, and the red points above the threshold are viewed as anomaly data. The reference thresholds ( and ) for these two models were calculated based on Equation (7) with confidence level = 0.01 [30]. For the PCA model, the number of principal components is determined when the cumulative variance contribution rates exceed 95%. For the SAE model, the number of nodes in the hidden layer was set as 11. Later in this case study, the influence of parameter selection is evaluated by ROC and AUC.  The anomaly detection results for the pump from 22 June 2013 to 6 July 2013 are shown in Figure  5. As can be seen, no anomalies were detected until 3 July for both SAE and PCA models, because no continuous anomalies were found during that period. It is in consistence with the fact that no failure was recorded in the historical maintenance log, and no obvious changes in original signals can be observed in bearing temperatures from 22 June to 3 July in Figure 6.
Moreover, it can be observed in Figure 5 that the SAE model generated continuous alarm after 4:00 pm on 3 July, and the PCA model generated continuous alarm after 6:00 am on 4 July. The maintenance log recorded a maintenance action on 7 July of misalignment fault in the pump system with four bearing temperatures changed altogether. The SAE anomaly detection models detected the fault four days before the maintenance actions being carried out. Contrastingly, in Figure 6, there was no obvious increasement in the bearings' measurements from 3 July to 4 July. After 4 July, four bearing temperatures increased altogether (see Figure 6), and both the SAE and PCA models showed an obvious increase in MD values. The anomaly detection results for the pump from 22 June 2013 to 6 July 2013 are shown in Figure 5. As can be seen, no anomalies were detected until 3 July for both SAE and PCA models, because no continuous anomalies were found during that period. It is in consistence with the fact that no failure was recorded in the historical maintenance log, and no obvious changes in original signals can be observed in bearing temperatures from 22 June to 3 July in Figure 6.
Moreover, it can be observed in Figure 5 that the SAE model generated continuous alarm after 4:00 pm on 3 July, and the PCA model generated continuous alarm after 6:00 am on 4 July. The maintenance log recorded a maintenance action on 7 July of misalignment fault in the pump system with four bearing temperatures changed altogether. The SAE anomaly detection models detected the fault four days before the maintenance actions being carried out. Contrastingly, in Figure 6, there was no obvious increasement in the bearings' measurements from 3 July to 4 July. After 4 July, four bearing temperatures increased altogether (see Figure 6), and both the SAE and PCA models showed an obvious increase in MD values.
The two-dimensional Q statistic contribution map of PCA from 22 July to 6 July is presented in Figure 7. A contribution map shows the contribution of each input variable to the anomaly that detected. By contrast, the contribution of each variable calculated by SAE from 22 July to 6 July is presented in Figure 8.

Normal stage
Small changes observed in bearings temperature. No failure record in maintenance log at this stage. The two-dimensional Q statistic contribution map of PCA from 22 July to 6 July is presented in Figure 7. A contribution map shows the contribution of each input variable to the anomaly that detected. By contrast, the contribution of each variable calculated by SAE from 22 July to 6 July is presented in Figure 8.

Normal stage
Small changes observed in bearings temperature. No failure record in maintenance log at this stage. The two-dimensional Q statistic contribution map of PCA from 22 July to 6 July is presented in Figure 7. A contribution map shows the contribution of each input variable to the anomaly that detected. By contrast, the contribution of each variable calculated by SAE from 22 July to 6 July is presented in Figure 8. In Figure 8, it clearly shows that it was the value changes in four bearings' measurements and discharge pressure that mainly contributed to the increase in the healthy indicator. The maintenance staff could check bearing related faults, including bearing faults, misalignment, rotor unbalance, etc. However, in Figure 7, the PCA model failed to show the abnormality in bearing 1 and 2 s temperatures.
To evaluate and compare the performance of PCA and SAE for fault detection, the ROC curves and AUC values are employed. The number of PCA components and the number nodes in hidden layers of SAE can influence the performance of the fault detection models; therefore, the ROC and AUC with different values of these parameters are calculated. The results are presented in Figure 9.
As explained in Section 2.3, a good fault detector should yield a high FDR and a low FAR in ROC plot and, accordingly, a high AUC value. As can be seen in Figure 9, these two models show similar performance for this case and the influence of the number of principle components and SAE nodes are not quite obvious.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 11 of 19 Contributuion map of input variables using PCA  In Figure 8, it clearly shows that it was the value changes in four bearings' measurements and discharge pressure that mainly contributed to the increase in the healthy indicator. The maintenance staff could check bearing related faults, including bearing faults, misalignment, rotor unbalance, etc. However, in Figure 7, the PCA model failed to show the abnormality in bearing 1 and 2s temperatures.
No fault being detected.
Fault being detected. Obvious changes in four bearing temperatures. To evaluate and compare the performance of PCA and SAE for fault detection, the ROC curves and AUC values are employed. The number of PCA components and the number nodes in hidden layers of SAE can influence the performance of the fault detection models; therefore, the ROC and  In Figure 8, it clearly shows that it was the value changes in four bearings' measurements and discharge pressure that mainly contributed to the increase in the healthy indicator. The maintenance staff could check bearing related faults, including bearing faults, misalignment, rotor unbalance, etc. However, in Figure 7, the PCA model failed to show the abnormality in bearing 1 and 2s temperatures.  To evaluate and compare the performance of PCA and SAE for fault detection, the ROC curves and AUC values are employed. The number of PCA components and the number nodes in hidden layers of SAE can influence the performance of the fault detection models; therefore, the ROC and AUC with different values of these parameters are calculated. The results are presented in Figure 9. As explained in Section 2.3, a good fault detector should yield a high FDR and a low FAR in ROC plot and, accordingly, a high AUC value. As can be seen in Figure 9, these two models show similar performance for this case and the influence of the number of principle components and SAE nodes are not quite obvious.

Case Two: Detection of a Misalignment Fault and Bearing Fault
From 1st September 2015 to 11 July 2016, the pump worked under another condition, with speed around 95-105 rpm, discharge pressure around 203-220 bar, and flow fluctuate around 1000-1400 km 3 /d. The training data were selected via setting thresholds for shaft speed, discharge pressure and flow, to make sure that the selected training data had the same working condition with monitoring data. Then, the trained models were applied to check the health condition after 12 July 2016.
The MD thresholds calculated during the training stage by PCA and SAE anomaly detection models, respectively, are presented in Figure 10a,b. As shown in the figure, the blue points under the threshold are regarded as healthy data, and the red points are viewed as abnormal data. Same as in case one, during the training stage, the reference thresholds ( and ) were calculated via Equation (7), with a confidence level = 0.01 [30]. For the PCA model, the number of principal components is determined when the cumulative variance contribution rates exceed 95%. For the SAE model, the number of nodes in the hidden layer was set as 11.

Case Two: Detection of a Misalignment Fault and Bearing Fault
From 1st September 2015 to 11 July 2016, the pump worked under another condition, with speed around 95-105 rpm, discharge pressure around 203-220 bar, and flow fluctuate around 1000-1400 km 3 /d. The training data were selected via setting thresholds for shaft speed, discharge pressure and flow, to make sure that the selected training data had the same working condition with monitoring data. Then, the trained models were applied to check the health condition after 12 July 2016.
The MD thresholds calculated during the training stage by PCA and SAE anomaly detection models, respectively, are presented in Figure 10a,b. As shown in the figure, the blue points under the threshold are regarded as healthy data, and the red points are viewed as abnormal data. Same as in case one, during the training stage, the reference thresholds (d PCA and d SAE ) were calculated via Equation (7), with a confidence level α = 0.01 [30]. For the PCA model, the number of principal components is determined when the cumulative variance contribution rates exceed 95%. For the SAE model, the number of nodes in the hidden layer was set as 11.
The fault detection results from 12 July 2016 to 23 August 2016 are shown in Figure 11, and, as a comparison, the measurements of bearings are shown in Figure 12. As can be seen in Figure 12, from the 12 July to 17 July, there was no obvious changes in original data, and, after 17 July, four bearings' temperature increased, resulting from a misalignment fault. In Figure 11a,b, both the PCA and SAE models indicated the pump worked healthily from the 12 July to 17 July. The PCA generated continuous alarms after 22:00 on 18 July. By contrast, the SAE discovered some anomalous and generated continuous alarms after 16:00 on 17 July. The fault detection results from 12 July 2016 to 23 August 2016 are shown in Figure 11, and, as a comparison, the measurements of bearings are shown in Figure 12. As can be seen in Figure 12, from the 12 July to 17 July, there was no obvious changes in original data, and, after 17 July, four bearings' temperature increased, resulting from a misalignment fault. In Figure 11a,b, both the PCA and SAE models indicated the pump worked healthily from the 12 July to 17 July. The PCA generated continuous alarms after 22:00 on 18 July. By contrast, the SAE discovered some anomalous and generated continuous alarms after 16:00 on 17 July.
Contribution maps for PCA and SAE models are presented in Figure 13 and Figure 14, respectively. In Figure 13, the anomalies in the PCA model are mainly contributed by bearing 1 and 2, and the contribution map failed to indicate changes in thrust bearings' temperatures. In Figure 14, it shows that the anomalies in the SAE model were caused by the temperature changes in four bearings, which was inconsistent with the observations in Figure 12.
From 20 July to 23 August 2016, it can be seen in Figure 11a,b that continuous anomalous points were detected with both PCA and SAE models. Both Figures 13 and 14 clearly show that it was bearing 2′s vibration amplitude that mainly contributed to the fault. This consisted of the performance showed in Figure 12b that the bearing 2′s vibration amplitude kept increasing in this period.   Contribution maps for PCA and SAE models are presented in Figures 13 and 14, respectively. In Figure 13, the anomalies in the PCA model are mainly contributed by bearing 1 and 2, and the contribution map failed to indicate changes in thrust bearings' temperatures. In Figure 14, it shows that the anomalies in the SAE model were caused by the temperature changes in four bearings, which was inconsistent with the observations in Figure 12.
From 20 July to 23 August 2016, it can be seen in Figure 11a,b that continuous anomalous points were detected with both PCA and SAE models. Both Figures 13 and 14 clearly show that it was bearing 2 s vibration amplitude that mainly contributed to the fault. This consisted of the performance showed in Figure 12b that the bearing 2 s vibration amplitude kept increasing in this period.
To evaluate the performance of PCA and SAE for fault detection, the ROC curves and AUC values are calculated and shown in Figures 15 and 16. The models' performance, influenced by the number of PCA components and the number nodes in hidden layers of SAE, was explored. Figure 15 is the ROC curves and average AUC values for the detection of the misalignment, and Figure 16 is for the detection of the bearing fault. As can be seen in both figures, the SAE model had higher AUC value than the PCA model, especially when detecting the misalignment fault.   To evaluate the performance of PCA and SAE for fault detection, the ROC curves and AUC values are calculated and shown in Figures 15 and 16. The models' performance, influenced by the number of PCA components and the number nodes in hidden layers of SAE, was explored. Figure 15 is the ROC curves and average AUC values for the detection of the misalignment, and Figure 16 is for the detection of the bearing fault. As can be seen in both figures, the SAE model had higher AUC value than the PCA model, especially when detecting the misalignment fault.    To evaluate the performance of PCA and SAE for fault detection, the ROC curves and AUC values are calculated and shown in Figures 15 and 16. The models' performance, influenced by the number of PCA components and the number nodes in hidden layers of SAE, was explored. Figure 15 is the ROC curves and average AUC values for the detection of the misalignment, and Figure 16 is for the detection of the bearing fault. As can be seen in both figures, the SAE model had higher AUC value than the PCA model, especially when detecting the misalignment fault.

Discussion
In summary, in these two case studies, the SAE model was able to detect the fault before the measured signals, showing obvious changes. The early detection of faults in pumps enables the operators to schedule maintenance at optimum time and thereafter increase the safety and reliability of the system, reduce energy consumption, service and maintenance costs. When compared with the PCA model, the SAE model can give a relatively lower FAR (around 0.1), higher FDR (over 0.9) and higher AUC value (over 0.98) in both cases. Especially when detecting the misalignment fault in case 2, the FAR for the PCA model is over 0.22 when FDR is above 0.8. In contrast, the SAE model kept its good performance with high FDR (over 0.9) and low FAR (lower than 0.1). The results indicate that a proposed fault detection scheme based on SAE is a good alternative for some conventional fault detection models in industrial fault detection systems, which are insensitive to incipient fault or have higher false alarm rates. Furthermore, the SAE performed better in isolating the relevant variables that are responsible for the fault. It can help to correctly isolate the abnormal variables of the detected fault, assisting operators and data analysts to trace back the root cause, making it easier for maintenance planning.

Conclusions
In this paper, a fault detection and isolation scheme based on SAE was developed and applied in an industrial multivariate monitoring data set. Robust Mahalanobis distance was applied to compose multiple variables into one system-wide health indicator. The two-dimensional contribution map was calculated to help isolate variables that are responsible for the fault. In summary, the contribution of this paper includes: (1) The application of a state-of-the-art unsupervised fault detection model, SAE, on industrial multivariate process data, with each step, data pre-processing, parameters setting, and model evaluation explained in detail. Our experience with processing the industrial data set can benefit relevant readers. (2) The proposed fault detection scheme based on SAE and Mahalanobis distance has been shown to be more effective in detecting faults than the traditional PCA method, with a lower false alarm rate, higher fault detection rate and higher AUC values, and therefore offers a good alternative solution for complicated industrial process systems that experience high false alarm rates. (3) The proposed fault detection and isolation scheme can detect faults at its incipient stage and have accurate performance in fault isolation, which can assist operators and data analysts to infer the fault cause, making it easier for maintenance planning. This scheme can also be applied in the fault detection and isolation of other process machines, such as compressors and turbines.