A Method for Aero-Engine Gas Path Anomaly Detection Based on Markov Transition Field and Multi-LSTM

: There are some problems such as uncertain thresholds, high dimension of monitoring parameters and unclear parameter relationships in the anomaly detection of aero-engine gas path. These problems make it difﬁcult for the high accuracy of anomaly detection. In order to improve the accuracy of aero-engine gas path anomaly detection, a method based on Markov Transition Field and LSTM is proposed in this paper. The correlation among high-dimensional QAR data is obtained based on Markov Transition Field and hierarchical clustering. According to the correlation analysis of high-dimensional QAR data, a multi-input and multi-output LSTM network is constructed to realize one-step rolling prediction. A Gaussian mixture model of the residuals between predicted value and true value is constructed. The three-sigma rule is applied to detect outliers based on the Gaussian mixture model of the residuals. The experimental results show that the proposed method has high accuracy for aero-engine gas path anomaly detection.


Introduction
Aero-engine is the heart of an aircraft, and is also the system with high failure rate and complex maintenance. Working condition of the aero-engine directly affects the reliability of aircraft, and even the safety of passengers [1]. According to incomplete statistics, more than 90% of aero-engine failures are related to gas path components, and the corresponding cost of these failures accounts for 60% of the total maintenance cost. Gas path anomaly detection is the top priority in aero-engine anomaly detection research [2]. Aero-engine gas path anomalies usually include intermittent gas path anomalies and persistent gas path anomalies. Intermittent gas path anomalies are often difficult to be detected because of their short duration, and the detection of such anomalies currently relies mainly on the oral reports of pilots. The Quick Access Recorder (QAR) data records the complete flight process of aircraft [3], and its sampling frequency is up to 1 Hz, which can be applied to detect anomalies of aero-engine gas path.
In recent years, there are more and more researches on anomaly detection of time series. For different manifestations of anomalies, there are four categories, including point anomaly, pattern anomaly, sequence anomaly, and subsequence anomaly. For different detection principles, there are five categories, including distribution-based methods, depthbased methods, clustering-based methods, distance-based methods, and density-based methods. Hao Sun [4] proposed a weakly supervised method based on mapping relationship mining and improved density peak clustering for gas-path anomaly detection of civil aero-engines. Chen, Jiusheng [5] proposed an adaptive weighted one-class SVM-based fault detection method coupled with incremental and decremental strategy. Pérez-Ruiz [6] proposed a hybrid fault-recognition technique based on regularized extreme learning machines and sparse representation classification for both fault detection and fault identification. Zaccaria, Valentina [7] proposed a probabilistic Bayesian network simulated by an adaptive performance model for fault detection and identification for engines. Gharoun, Hassan [8] applied the data-driven methods to analyze the relationship between engine exhaust gas temperature (EGT) and other parameters of the engine, and proposed one-class support vector machine to detect faults in each flight. XIE Ji-wei [9] proposed a method based on Mahalanobis distance for aero-engines health monitoring. Wen Ying [10] proposed an anomaly monitoring method based on self-adaptive kernel principal component analysis. The above methods have carried out research on the aero-engine anomaly detection and have made certain innovations and breakthroughs. However, the detection accuracy of the above methods needs a large number of samples for training. Furthermore, the above methods do not consider the correlation between different monitoring data, and the applicability of the anomaly detection is not strong. In this paper, different time series are transformed into Markov transfer matrix, and a hierarchical cluster is applied to obtain the correlation of time series. Compared with other time series correlation analysis methods, this method reduces the impact of information loss. Multi-LSTM is applied for anomaly detection of time series. As a special cyclic neural network, LSTM not only has the advantages of RNN dynamic memory, but also avoids the gradient disappearance of RNN and the lack of long and short memory ability. LSTM model has advantages of processing the nonlinear aero-engine gas path monitoring data, automatically selecting the optimal time interval and memorizing long-time historical data. Compared with other anomaly detection methods based on LSTM, the construction of multi-LSTM improves the accuracy of prediction and improves the accuracy of anomaly detection furtherly.
The rest of this paper is organized as follows. Aero-engine gas path and monitoring data are introduced briefly in Section 2. In Section 3, the correlation between gas path monitoring data is analyzed based on Markov Transition Field and hierarchical clustering. Section 4 constructs the anomaly detection model based on LSTM and Gaussian anomaly detection model. Section 5 discusses the experiment results and evaluates the accuracy of proposed anomaly detection model. Finally, Section 6 summarizes the conclusions.

Introduction of Aero-Engine Gas Path and Monitoring Data
The research object of this paper is the Airbus A330, Rolls-Royce Trent700 aero-engine gas path system. The aero-engine and its gas path structure are shown in Figure 1 [11,12]. The aero-engine gas path is composed of Bleed Monitoring Computer (BMC), Temperature Control Thermostat (ThC), Solenoid Thermostat (Ths), Regulated Pressure Transducer (Pr), Transferred Pressure Transducer (Pt), PreCooler Exchanger (PCE), OverPressure Valve (OPV), Fan Air Valve (FAV), Intermediate Pressure Valve (IPCV), High Pressure Valve (HPV), and so on.
The aero-engine flight phase includes take-off, climb, cruise, and landing. The QAR data of each phase has different characteristics. During the aero-engine flight phase, the QAR data of the gas path system includes temperature of precooler 1 (TMP1), temperature of precooler 2 (TMP2), speed of No.1 engine N1 (N11), speed of No.2 engine N1 (N12), speed of No.2 engine N1 (N21), speed of No.2 engine N2 (N22), standard atmospheric pressure height (ALT_STD), pressure of precooler 1 (PRS1), pressure of precooler 2 (PRS2), and so on. Some QAR monitoring data of aero-engine gas path is shown in Table 1. The aero-engine flight phase includes take-off, climb, cruise, and landing. The QAR data of each phase has different characteristics. During the aero-engine flight phase, the QAR data of the gas path system includes temperature of precooler 1 (TMP1), temperature of precooler 2 (TMP2), speed of No.1 engine N1 (N11), speed of No.2 engine N1 (N12), speed of No.2 engine N1 (N21), speed of No.2 engine N2 (N22), standard atmospheric pressure height (ALT_STD), pressure of precooler 1 (PRS1), pressure of precooler 2 (PRS2), and so on. Some QAR monitoring data of aero-engine gas path is shown in Table  1. The gas path monitoring data of the whole flight phase includes temperature of precooler, speed of engine N1, speed of engine N2, and pressure of precooler. The gas path monitoring data is shown in Figure 2.  The gas path monitoring data of the whole flight phase includes temperature of precooler, speed of engine N1, speed of engine N2, and pressure of precooler. The gas path monitoring data is shown in Figure 2.
Since the QAR data of different flight phases have different characteristics, this paper mainly conducts anomaly detection on the QAR monitoring data of aero-engine gas path in the cruise stage. The QAR data of cruise stage is shown in Figure 3. Since the QAR data of different flight phases have different characteristics, this paper mainly conducts anomaly detection on the QAR monitoring data of aero-engine gas path in the cruise stage. The QAR data of cruise stage is shown in Figure 3. Since the QAR data of different flight phases have different characteristics, this paper mainly conducts anomaly detection on the QAR monitoring data of aero-engine gas path in the cruise stage. The QAR data of cruise stage is shown in Figure 3.

Correlation Analysis of Gas Path Monitoring Data Based on Markov Transition Field and Hierarchical Clustering
In this section, the correlation of different time series is obtained by calculating the similarity of Markov Transition Probability Matrix. Based on the similarity of time series, hierarchical cluster analysis is applied to obtain the correlation of different time series.
Based on the time sequence of aero-engine QAR data, a Markov Transition Matrix is formed by calculating the Markov Transition Probability of the time series to represent the correlation of time series [13][14][15].
IFor a given time series X, the range of the time series is divided into Q parts, then any X i in the time series will be mapped to the corresponding q i . At this time, in the order of time axis, the transfer probability between data points can be calculated as first-order Markov chain to construct a weighted adjacency matrix W of Q × Q, where w i,j represents the frequency of quantile q j transferring to quantile q i . After standardization, the generated matrix W is the Markov Transition Matrix. The matrix is not sensitive to the distribution of X and the step size of time series t i . In order to reduce the impact of information loss, the Markov Transition Field is transformed on the Markov Transition Matrix.
Depending on the length of the original data, the data is converted into Q quantiles, and a Q × Q Markov Transition Matrix (MTF) is generated, which contains the step size (i.e., the time axis) i and j in the time series and the transition direction q j . In MTF, M ij is the probability of q i transferring to q j . MTF increases the time step and position relationship based on Markov Transition Matrix.
According to the different step-size probabilities of the selected time series point i to point j, it is represented by different pixels to generate M ij . MTF encodes the transfer probability of the time series. M ij represents the transition probability between points with time interval k. When k = 0, the matrix can be taken to the main diagonal M ij , which represents the probability of each quantile i shifting to itself. In order to reduce the size of the generated image, facilitate the preservation and improve the computational efficiency, the transition probability in the subsequence of length m can be added together. The conversion process of Markov change field is shown in Figure 4.

Correlation Analysis of Gas Path Monitoring Data Based on Markov Transition Field and Hierarchical Clustering
In this section, the correlation of different time series is obtained by calculating the similarity of Markov Transition Probability Matrix. Based on the similarity of time series, hierarchical cluster analysis is applied to obtain the correlation of different time series.
Based on the time sequence of aero-engine QAR data, a Markov Transition Matrix is formed by calculating the Markov Transition Probability of the time series to represent the correlation of time series [13][14][15].
For a given time series X , the range of the time series is divided into Q parts, then any i X in the time series will be mapped to the corresponding i q . At this time, in the order of time axis, the transfer probability between data points can be calculated as firstorder Markov chain to construct a weighted adjacency matrix W of Q Q × , where , i j w represents the frequency of quantile j q transferring to quantile i q . After standardization, the generated matrix W is the Markov Transition Matrix. The matrix is not sensitive to the distribution of X and the step size of time series i t . In order to reduce the impact of information loss, the Markov Transition Field is transformed on the Markov Transition Matrix.
Depending on the length of the original data, the data is converted into Q quantiles, and a Q Q represents the probability of each quantile i shifting to itself. In order to reduce the size of the generated image, facilitate the preservation and improve the computational efficiency, the transition probability in the subsequence of length m can be added together. The conversion process of Markov change field is shown in Figure 4.  The analysis results of QAR data with Markov Transition Field are as shown in Figure 5.
Based on the Markov Transition Probability Matrix of aero-engine QAR data, the similarity of the matrix is calculated, and the clustering analysis is carried out to obtain the correlation between different variables.
Cohesive hierarchical clustering algorithm is a bottom-up clustering algorithm [16,17]. The algorithm flow is shown as Algorithm 1. It takes each sample in the dataset as an initial class, then merges the two classes with the smallest distance between classes until the maximum distance within each class is less than the set threshold, or the number of classes reaches the set value.

Algorithm 1. Cohesive Hierarchical Clustering Algorithm
Input: candidate sample set, maximum distance threshold ε. Output: feature clustering result set C.
For Si ∈ D // Use each sample in D as an initial cluster. Ci ← empty set; Ci ← Ci∪Si; C ← C∪Ci; End for While Maximum sample distance less than e after merging two classes.
Calculate the distance between the two classes, merge the smallest distance classes, and select one of them to merge if there are many pairs of smallest distance classes with the same distance. Based on the Markov Transition Probability Matrix of aero-engine QAR data, the similarity of the matrix is calculated, and the clustering analysis is carried out to obtain the correlation between different variables.
Cohesive hierarchical clustering algorithm is a bottom-up clustering algorithm [16,17]. The algorithm flow is shown as Algorithm 1. It takes each sample in the dataset  The hierarchical clustering results are as shown as Figure 6.
End for While Maximum sample distance less than e after merging two classes.
Calculate the distance between the two classes, merge the smallest distance classes, and select one of them to merge if there are many pairs of smallest distance classes with the same distance.

End while
The hierarchical clustering results are as shown as Figure 6. According to the clustering results, there is a strong correlation between TMP, PRS, N1, and N2 for aero-engine gas path.

Prediction Based on Multi-LSTM
The inputs of LSTM are selected based on the correlation analysis of time series to construct multi-LSTM network. The LSTM prediction principle is shown in Figure 7 [18][19][20]. According to the clustering results, there is a strong correlation between TMP, PRS, N1, and N2 for aero-engine gas path.

Prediction Based on Multi-LSTM
The inputs of LSTM are selected based on the correlation analysis of time series to construct multi-LSTM network. The LSTM prediction principle is shown in Figure 7 [18][19][20].  In multi-LSTM model, the key parameters and formulas are shown as follows.
( ) In formula, W and b are the weight matrix and bias of forgetting gate. Formula (6) is In multi-LSTM model, the key parameters and formulas are shown as follows.
In formula, W and b are the weight matrix and bias of forgetting gate. Formula (6) is the output of the whole LSTM. The number of neurons in the input layer of LSTM is the number of input sequences, and the number of output neurons is the number of output sequences. When predicting the QAR data of aero-engine, one or more future time series of the aero-engine can be predicted by inputting the current or historical data in a period of time. Multi-LSTM prediction flow is shown in Figure 8. In multi-LSTM model, the key parameters and formulas are shown as follows.
In formula, W and b are the weight matrix and bias of forgetting gate. Formula (6) is the output of the whole LSTM. The number of neurons in the input layer of LSTM is the number of input sequences, and the number of output neurons is the number of output sequences. When predicting the QAR data of aero-engine, one or more future time series of the aero-engine can be predicted by inputting the current or historical data in a period of time. Multi-LSTM prediction flow is shown in Figure 8.

Gaussian Distribution Model Based on Prediction Error
Gaussian distribution method is a common anomaly detection method [21][22][23]. Its basic assumption is that the data set obeys a Gaussian distribution. The probability density of normal data in the Gaussian distribution is high, and the probability density of abnormal data in the Gaussian distribution is low. This method has a wide range of applications and performs well on non-Gaussian data sets. In addition, the Gaussian distribution method has the advantages of good anomaly detection effect, easy programming, and fast calculation speed, which can obtain better results for anomaly detection of gas path.
For a data set, the maximum likelihood estimation method can be used to estimate the mean and variance of each parameter. The specific calculation is shown as follows.
In formula, µ j represents the mean value of the j parameter. σ 2 j represents the variance of parameter j. x j i represents the j parameter of the i sample. After the mean and variance of each parameter are estimated, the probability density of each parameter in its parameter distribution can be calculated. Assuming that each parameter is independent of each other, the probability density of each sample in the Gaussian distribution is the product of the probability density of all parameters. The probability density calculation in the dataset distribution is shown below.
where p(x i ) represents the probability density of x i in the Gaussian distribution of the dataset. Those samples with low probability density are likely to be abnormal. Assuming that there is a critical value, normal and abnormal samples can be distinguished by this critical value. The division of normal and abnormal samples is as follows.
In the formula, y i represents the anomaly detection results of the i sample; ε represents threshold. The appropriate critical value is the key to ensure that the Gaussian model can accurately identify abnormal samples. The selection of the critical value is determined by the three-sigma rule [24]. The anomaly detection process based on LSTM and Gaussian model is shown in Figure 9.

Results and Discussions
Data in the experiment is derived from the real QAR data of civil aviation engines. The experimental environment configuration is shown in Table 2. The parameter settings of multiple-input and multiple-output LSTM network are shown in Table 3. Table 3. Parameters of multi-LTSM prediction model.

Model Parameter
Actual Value Input layer parameters 50 × 8 Hidden layer 1 Neuronal number 100 Prediction step 1 Figure 9. Anomaly Detection Process Based on LSTM and Gaussian Distribution.

Results and Discussion
Data in the experiment is derived from the real QAR data of civil aviation engines. The experimental environment configuration is shown in Table 2. The parameter settings of multiple-input and multiple-output LSTM network are shown in Table 3. The Root Mean Square Error (RMSE) is applied to estimate prediction accuracy.
where f i and y i represent the predicted value and true value. The prediction errors of different QAR data are shown in Table 4. After training, the multi-LSTM model is applied for prediction. The residuals between the predicted value and the actual value are calculated. The residual probability distribution functions based on the Gaussian mixture model is shown in Figure 10.
The three-sigma rule is applied for anomaly detection. The anomaly detection accuracy is calculated as follows.
TP (True Positives) indicates that the prediction result of an actual positive case is a positive case. FP (False Positives) indicates that the prediction result of an actual negative case is a positive case. FN (False Negatives) indicates that the prediction result of an actual positive case is a negative case. TN (True Negatives) indicates that the prediction result of an actual negative case is a negative case.
The accuracy of anomaly detection is shown in Table 5. After training, the multi-LSTM model is applied for prediction. The residuals between the predicted value and the actual value are calculated. The residual probability distribution functions based on the Gaussian mixture model is shown in Figure 10. The anomaly detection results are shown in Figures 11-18. The green curve represents the observed data, the yellow curve represents the curve predicted by multi-LSTM, and the red curve represents the abnormal value detected by the model. The anomaly detection results are shown in Figures 11-18. The green curve represents the observed data, the yellow curve represents the curve predicted by multi-LSTM, and the red curve represents the abnormal value detected by the model.    The anomaly detection results are shown in Figures 11-18. The green curve represents the observed data, the yellow curve represents the curve predicted by multi-LSTM, and the red curve represents the abnormal value detected by the model.                According to the anomaly detection results as shown in Table 5 and Figures 11-18, for anomaly detection of aero-engine gas path, the accuracy of proposed method is over 98%. The experiment results show the superiority of the method proposed in this paper, and it can be applied to solve similar time series anomaly detection problems.

Conclusions
In this paper, the correlation among QAR data is analyzed by Markov transition matrix and hierarchical clustering method. Based on correlation analysis, the key monitoring data is selected to construct the multi-LSTM network prediction. The residuals between the predicted value and the real value are calculated, and the probability distribution of  According to the anomaly detection results as shown in Table 5 and Figures 11-18, for anomaly detection of aero-engine gas path, the accuracy of proposed method is over 98%. The experiment results show the superiority of the method proposed in this paper, and it can be applied to solve similar time series anomaly detection problems.

Conclusions
In this paper, the correlation among QAR data is analyzed by Markov transition matrix and hierarchical clustering method. Based on correlation analysis, the key monitoring data is selected to construct the multi-LSTM network prediction. The residuals between According to the anomaly detection results as shown in Table 5 and Figures 11-18, for anomaly detection of aero-engine gas path, the accuracy of proposed method is over 98%. The experiment results show the superiority of the method proposed in this paper, and it can be applied to solve similar time series anomaly detection problems.

Conclusions
In this paper, the correlation among QAR data is analyzed by Markov transition matrix and hierarchical clustering method. Based on correlation analysis, the key monitoring data is selected to construct the multi-LSTM network prediction. The residuals between the predicted value and the real value are calculated, and the probability distribution of residuals is constructed based on the Gaussian mixture model. Based on the residual probability distribution, the three-sigma rule is applied for anomaly detection. Experiments show that the method proposed in this paper can detect the anomaly of aero-engine gas path accurately, and it is suitable for solving similar anomaly detection problems.
Author Contributions: L.C. proposed and designed the anomaly detection methods; L.C. and C.Z. contributed to the manuscript discussion and writing; Q.Z. and Y.S. gave comments on the manuscript and checked the writing; J.W. and Y.J. contributed to some results of the study. C.L. and Y.W. contributed QAR data for results and analysis. All authors have read and agreed to the published version of the manuscript.