A Prognostic and Health Management Framework for Aero-Engines Based on a Dynamic Probability Model and LSTM Network

Abstract: In this study, a prognostics and health management (PHM) framework is proposed for aero-engines, which combines a dynamic probability (DP) model and a long short-term memory (LSTM) neural network. A DP model based on the Gaussian mixture model-adaptive density peaks clustering (GMM-ADPC) algorithm, which offers an extremely short training time and sufficiently high precision, is employed to model engine fault development from the beginning of engine service, and principal component analysis is introduced to convert complex high-dimensional raw data into low-dimensional data. The model can be updated as engine data accumulate, so as to capture the occurrence and evolution of engine faults. To address the problems with commonly used data-driven methods, the DP + LSTM model is employed to estimate the remaining useful life (RUL) of the engine. Finally, the proposed PHM framework is validated experimentally using NASA's commercial modular aero-propulsion system simulation dataset, and the results indicate that the DP model has higher stability than the classical artificial neural network method in fault diagnosis, whereas the DP + LSTM model has higher accuracy in RUL estimation than other classical deep learning methods.


Introduction
Aero-engines are core machinery systems with complex structures, high levels of integration, and harsh working conditions, and their reliable and efficient operation is crucial to flight safety. Prognostics and health management (PHM) is an effective maintenance technique for achieving safe and reliable operation of machines and systems, and it plays a significant role in the operation of aero-engines [1][2][3]. Incomplete statistics show that failures of gas-path components account for more than 90% of all engine failures, and that 60% of aero-engine maintenance costs are spent on gas-path components [4]. However, due to the unique manufacturing technologies and special materials of the aero-engine, it is difficult to maintain and replace engine components frequently [5]. A PHM system is capable of determining whether a gas-path component has failed and deciding whether it needs to be repaired or replaced, and can thus reduce routine maintenance costs and time. Fault diagnosis and remaining useful life (RUL) estimation are the research emphases of PHM.
In general, PHM approaches can be categorized into model-based methods [6,7] and data-driven methods [8,9]. Model-based methods include physical models, structural analysis, contact analysis, cumulative damage models, cyclic fatigue, and crack propagation models [10]. They therefore need a detailed mathematical model of the aero-engine [11]. In addition, their reliability decreases as the system nonlinearities, complexity, [...] framework, fault diagnosis and RUL estimation are not deeply linked, and health indicators are not fully utilized in RUL estimation. Information must therefore be mined from the engine sensor data again before RUL estimation, which is inefficient. Li et al. [28] proposed a framework for deriving system requirements for PHM system development to provide a solution for predicting RUL. Similarly, that framework does not consider a technical route that combines fault diagnosis with RUL estimation.
Given the above, an aero-engine PHM framework based on the GMM-ADPC algorithm and an LSTM network is proposed in this study. A new GMM-ADPC algorithm is proposed to construct the probability distribution space of engine data. Based on the GMM-ADPC algorithm, a dynamic probability (DP) model is proposed for modeling engine fault development. This model has a solid mathematical foundation and can make full use of engine life cycle data. Principal component analysis (PCA) is used to convert complex high-dimensional raw data into low-dimensional data. To address the problems with commonly used data-driven methods, the DP + LSTM model is introduced for RUL estimation. Here, the engine fault probability distribution data constructed by the DP model are used as the input of the LSTM network, which passes information between the two modules, avoids sensor noise interference to a certain extent, and improves the stability and accuracy of the PHM framework.
The rest of this paper proceeds as follows. Section 2 introduces the DP model and LSTM algorithms. Section 3 details the architecture and the realization of the framework. Section 4 provides the validation results of the framework on NASA's dataset. Finally, the conclusion of this work is given in Section 5.

Probability Modeling
The core algorithm of the probability model is the GMM-ADPC algorithm, and the probability difference measuring method is used to quantify the difference between two probability models so as to generate fault detection indexes.

GMM-ADPC Algorithm
The GMM is an extension of the single Gaussian probability density function. It is a weighted sum of a finite number of Gaussian components (GCs). Let $X = [x_1, x_2, \cdots, x_i, \cdots, x_N]$ denote a feature sample set composed of $N$ FMFs, $i = 1, 2, \ldots, N$, where $x_i = [x_1, x_2, \cdots, x_D]$ represents an FMF with dimensionality $D$. Equation (1) expresses the probability density function of the GMM and Equation (2) expresses the GC.
$$\Phi(x \mid \zeta) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \quad (1)$$
$$\mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right) \quad (2)$$
where $\zeta = \{(w_1, \mu_1, \Sigma_1), \cdots, (w_k, \mu_k, \Sigma_k), \cdots, (w_K, \mu_K, \Sigma_K)\}$ is the parameter set that defines the GMM. The number of GCs is $K$ and $k = 1, 2, \cdots, K$. The parameters $w_k$, $\mu_k$ and $\Sigma_k$ denote the mixture weight, mean, and covariance matrix of the $k$-th GC, respectively, $|\cdot|$ is the determinant, and $T$ denotes the transpose. Usually, the Expectation-Maximization (EM) algorithm is used to construct the GMM [29]. However, the drawback of the EM algorithm is that the initial values of $\zeta$ greatly affect the result, which reduces the stability of the GMM. Some methods, such as the Bayesian non-parametric clustering approach and the enhanced dynamic GMM method, have been proposed to determine $\zeta$ [21,30]. However, the ideal approach is adaptive and not computationally intensive. In addition, since each sample set belongs to a different FMF, the selected method is required to have good generalization performance. In fact, the GMM represents the probability distribution of FMFs, with each GC representing a cluster. ADPC is an improved clustering algorithm based on probability density distribution [31,32].
Aerospace 2022, 9, 316
The main advantage of the ADPC algorithm is that it can effectively identify clustering centers and cut-off distances for low-dimensional or arbitrary data sets. The ADPC algorithm contains the following two main steps:
Step 1: Automatic identification of the cut-off distance. First, define a variable $H$ to represent the uncertainty of the system, expressed as Equations (3)-(5). The smaller the value of $H$, the smaller the uncertainty of the system, which favors clustering.
where $d_{ij}$ is the distance between FMF $x_i$ and FMF $x_j$, and $\rho_i$ is the local density of FMF $x_i$. The cut-off distance $d_c$ is increased gradually from 0 until $H$ reaches its minimum value, at which point $d_c$ is the most appropriate cut-off distance. For some samples, it is difficult to find a cut-off distance that meets the above requirement. In this case, $d_c$ can be set as the top 1% to 2% of the distances between all data points [31].
Step 2: Automatic identification of clustering centers. Clustering centers should have both large $\rho_i$ and large $\delta_i$ values. Define a variable $\gamma$, expressed as Equation (6). Sample points with larger $\gamma$ values are more suitable as clustering centers.
In addition, the number of cluster centers needs to be determined. First, calculate the $\gamma$ value of each FMF and sort the values. Let $\mathrm{tend}_i$ be a criterion for determining the number of cluster centers, expressed as Equation (7).
Then, select the $n$ FMFs with the largest $\gamma$ values and calculate the $\mathrm{tend}_i$ value for each of them. If the $\mathrm{tend}_i$ value of the $m$-th FMF is the largest, then the first $m - 1$ FMFs are taken as the clustering centers.
After the initial clustering is completed, the mean, covariance, and weight of the FMFs belonging to each cluster can be obtained and used as the initial values of $\zeta$ for the EM algorithm.
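The two ADPC steps and the seeding of the EM algorithm can be illustrated with a minimal Python sketch. It is not the implementation used in this study: Equations (3)-(7) are not reproduced here, so the Gaussian-kernel local density, the Shannon-entropy form of H, the definition of δ as the distance to the nearest point of higher density, and the product γ = ρδ are assumptions taken from common density-peaks formulations. The number of centers is passed in by the caller instead of being derived from the tend criterion, and all function names are hypothetical.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix d_ij between all FMFs."""
    return [[math.dist(p, q) for q in points] for p in points]

def local_density(dist, d_c):
    # Assumed Gaussian-kernel density; the self term (d_ii = 0) contributes 1.
    return [sum(math.exp(-(d / d_c) ** 2) for d in row) for row in dist]

def density_entropy(rho):
    # Assumed form of the uncertainty H: Shannon entropy of normalized densities.
    total = sum(rho)
    return -sum((r / total) * math.log(r / total) for r in rho)

def select_cutoff(dist, candidates):
    """Step 1: pick the candidate cut-off distance that minimizes H."""
    return min(candidates, key=lambda d_c: density_entropy(local_density(dist, d_c)))

def rank_by_gamma(dist, d_c):
    """Step 2: return point indices sorted by gamma = rho * delta, largest first."""
    rho = local_density(dist, d_c)
    gamma = []
    for i, row in enumerate(dist):
        # delta: distance to the nearest point of higher density (assumed definition);
        # the densest point falls back to its largest distance.
        higher = [row[j] for j in range(len(row)) if rho[j] > rho[i]]
        delta = min(higher) if higher else max(row)
        gamma.append(rho[i] * delta)
    return sorted(range(len(gamma)), key=gamma.__getitem__, reverse=True)

def initial_weights_and_means(points, centers):
    """Assign each point to its nearest center and derive initial weights and means."""
    members = {c: [] for c in centers}
    for p in points:
        nearest = min(centers, key=lambda c: math.dist(p, points[c]))
        members[nearest].append(p)
    weights = [len(members[c]) / len(points) for c in centers]
    means = [[sum(col) / len(members[c]) for col in zip(*members[c])] for c in centers]
    return weights, means
```

The per-cluster covariance matrices would be computed from the same membership lists; they are omitted here to keep the sketch short.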

Probability Difference Measuring Method
In this paper, two probability models are constructed, as detailed in the following sections. Appropriate rules for quantifying the difference between the two models need to be determined. Some methods, such as Rényi divergence and Kullback-Leibler divergence [33], have been proposed to measure the difference. However, these methods are neither symmetric nor normalized. Qiu et al. [21] used the Monte Carlo simulation method for probability similarity measuring and achieved good results [34]. First, let $X_{MC} = \{x_1, x_2, \cdots, x_R\}$ denote a large number of random samples generated by Monte Carlo sampling. Second, the posterior probability of $X_{MC}$ is denoted as $P(X_{MC} \mid \zeta) = \{\Phi(x_1 \mid \zeta), \Phi(x_2 \mid \zeta), \cdots, \Phi(x_R \mid \zeta)\}^T$, which can be calculated by Equations (1) and (2). Finally, the difference between the two probability models can be calculated by Equation (8). In this paper, $\mathrm{Diff}(\zeta_1, \zeta_2)$ denotes the fault detection index.
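As a rough illustration of the Monte Carlo procedure, the sketch below evaluates two 2D GMMs on a shared set of random samples and reduces the two density vectors to a single number. Equation (8) is not reproduced here, so the measure used (sum of absolute density differences divided by the sum of densities) is only an assumed stand-in that happens to be symmetric and bounded in [0, 1]. Drawing the samples uniformly from a square region is also an assumption, and all function names are hypothetical.

```python
import math
import random

def gaussian_pdf_2d(x, mean, cov):
    """Gaussian component density for D = 2; cov is [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def gmm_pdf(x, zeta):
    """Mixture density; zeta is a list of (weight, mean, covariance) tuples."""
    return sum(w * gaussian_pdf_2d(x, mu, cov) for w, mu, cov in zeta)

def monte_carlo_samples(r, low, high, seed=0):
    """Draw r points uniformly from the square [low, high]^2 (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(r)]

def probability_difference(zeta_1, zeta_2, samples):
    """Assumed symmetric, normalized difference between two GMMs (not Equation (8))."""
    p1 = [gmm_pdf(x, zeta_1) for x in samples]
    p2 = [gmm_pdf(x, zeta_2) for x in samples]
    total = sum(a + b for a, b in zip(p1, p2))
    if total == 0.0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(p1, p2)) / total
```

With this stand-in, two identical models give a difference of 0, and two models with essentially no overlap give a value close to 1.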

Long Short-Term Memory Networks
The LSTM model, which is based on the recurrent neural network (RNN), can adaptively learn representative information through multiple non-linear transformations [35][36][37]. Compared with the traditional artificial neural network (ANN), the LSTM can retain historical information from earlier inputs and is suitable for dealing with time-varying problems. Compared with the RNN, the LSTM has been improved in two main aspects. First, to overcome the limitation of information forgetting, the cell state is split into the short-term state $h_t$ and the long-term state $c_t$. Second, the cell states are regulated by three control gates: the forget gate, the input gate, and the output gate [38]. The architecture of the LSTM is shown in Figure 1.
A typical LSTM is illustrated in Figure 1, and the hidden layer contains three gates: forget gate, input gate, and output gate. The functions of these three gates are: information forgetting, long-term state updating, and short-term state updating.

1. Information forgetting. The states removed from the previous long-term state $c_{t-1}$ are controlled by the forget gate $f_t$, which can be described by Equation (9).
$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f) \quad (9)$$
where $\sigma$ is the sigmoid function, $h_{t-1}$ is the short-term state at the previous time step, $x_t$ is the input at the current time step, $w_f$ is the weight vector and $b_f$ is the bias term of the forget gate, and "·" means matrix multiplication.

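2. Long-term state updating. The input gate layer determines what values will be updated. The input gate $i_t$ and the candidate value vector $\tilde{c}_t$ are expressed by Equations (10) and (11).
$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i) \quad (10)$$
$$\tilde{c}_t = \tanh(w_c \cdot [h_{t-1}, x_t] + b_c) \quad (11)$$
where $(w_i, w_c)$ are the weight vectors and $(b_i, b_c)$ are the bias terms. Then, the new long-term cell state $c_t$ can be obtained by Equation (12).
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad (12)$$
where $\odot$ denotes element-wise multiplication.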

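3. Short-term state updating. The function of the output gate is to convert the long-term state into the short-term state. Equation (13) describes the output gate $o_t$.
$$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o) \quad (13)$$
where $w_o$ and $b_o$ are the weight vector and bias term of the output gate. Finally, the short-term state of the cell unit at time $t$ can be described as Equation (14), with the product taken element-wise.
$$h_t = o_t \odot \tanh(c_t) \quad (14)$$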
In this study, the LSTM performs sequence-to-point prediction, and the dimension of the input sequence is 10. The loss function is the mean squared error (MSE), and the optimizer is the Adam algorithm, which is an extension of the stochastic gradient descent algorithm.
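The three gate operations described for Equations (9)-(14) can be sketched as a single forward step in plain Python. This is a minimal illustration using the standard LSTM cell formulation, not the trained network of this study: the parameter dictionary layout, the function names, and the zero initial states in `run_sequence` are assumptions made for the example, and no training loop or output layer is included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(w, v, b):
    """Return w · v + b for a weight matrix w (list of rows), vector v, and bias b."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update; returns the new short-term and long-term states."""
    v = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["w_f"], v, params["b_f"])]      # forget gate
    i_t = [sigmoid(z) for z in affine(params["w_i"], v, params["b_i"])]      # input gate
    c_hat = [math.tanh(z) for z in affine(params["w_c"], v, params["b_c"])]  # candidate values
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]     # long-term state
    o_t = [sigmoid(z) for z in affine(params["w_o"], v, params["b_o"])]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                       # short-term state
    return h_t, c_t

def run_sequence(inputs, params, hidden_size):
    """Feed a sequence through the cell and return the final short-term state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, params)
    return h
```

For sequence-to-point prediction, the final short-term state returned by `run_sequence` would be passed to an output layer that produces the single predicted value; that layer is omitted here.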

DP Model for Fault Monitoring
Data collected by aero-engine sensors vary over time and contain noise [39]. Furthermore, aero-engines are designed to be failure-tolerant, which means that the engine remains in a healthy state during the early stage of service, when the influence of a fault is minimal and even far lower than the time-varying influence. As time goes on, the influence of the fault grows until the engine is unable to function properly. Therefore, it is necessary to find a method that can not only eliminate the influence of noise but also capture the accumulation of engine faults. Generally, the operating state of the engine cannot be directly reflected by sensor data at a single moment, and deriving the engine state from the physical meaning of the data itself is difficult and complex. Therefore, the core idea of the proposed framework is to construct a dynamically updated probability distribution of engine life cycle data. Different health states necessarily correspond to different probability distributions. Double probability models are constructed to represent the engine health state and the health monitoring state, respectively, and the monitoring probability model must be updatable so as to reveal the progressive variation trend. Once the double probability models are constructed, the engine fault can be quantified by comparing the difference between the two probability models. It should be noted that this method assumes that the time-varying influences of the two states are the same.
In addition, the DP model is designed as a standardized architecture. In the PHM field, model-driven methods such as Kalman filtering and particle filtering [40] are all aimed at fixed objects: when the engine model is different, the PHM model needs to be modified. Data-driven methods such as ANN and SVM also have limited generalization capability [14,15]. In contrast to these methods, the DP model can be designed as a standardized architecture that is suitable for different engine models, because it is updated dynamically as engine data accumulate and therefore does not require much prior knowledge or complex model parameter adjustment. Considering the inevitability of data transfer in the framework, the proposed PHM framework uses a generalized data interface between the parts. In addition, the input and output data of the framework are normalized. Another significant advantage of the DP model is that it does not need training as an ANN does, so it is more efficient than an ANN.

Combining the DP Model and LSTM for the PHM Framework
The modular hierarchical structure is a prominent feature of the proposed PHM framework, which contains four blocks, as shown in Figure 2. The first step of the framework is to obtain the sensor data for the entire life cycle and the RUL information of the engine; these data are large in volume and contain noise. Therefore, in this step, it is necessary to clean the sensor data and reduce their dimensions through the PCA technique [41]. Then, the data after dimension reduction are standardized. The preprocessed data are divided into baseline data and monitoring data, which are passed into Block 1 and Block 2, respectively. These two blocks constitute the double probability models. Block 1 constructs the baseline probability model based on the baseline data, that is, the data under the engine health state. For the same engine, the baseline probability model remains unchanged; it is rebuilt for each different engine. Block 2 constructs the monitoring probability model based on real-time monitoring data, which needs to be updated in real time. After the construction of the double probability models, the difference $\mathrm{Diff}(\zeta_B, \zeta_O)$ between the two models, where $\zeta_B$ and $\zeta_O$ are the parameters of the baseline and monitoring probability models, respectively, can be used to evaluate the degree of engine failure, which is the task of Block 3. In this paper, the normalized $\mathrm{Diff}(\zeta_B, \zeta_O)$ values are used as fault detection indexes. Block 4 is the second part of the PHM framework: RUL estimation. The large amount of fault detection index data generated by the fault diagnosis module is taken as the training sample of the LSTM network. In this way, the interference of sensor data noise can be reduced. In addition, since the probability model contains the information of the entire data set, the fault detection indexes are unlikely to be disturbed by a very small number of abnormal data. Therefore, the framework combining the two models has better stability.
RUL prediction can be started at any time for any engine. A threshold value can be selected to conduct RUL evaluation according to the fault detection index curve.
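The threshold-based RUL evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: the curve is a list of fault detection index values with the first entry belonging to cycle 1, it already contains both the observed values and the values predicted beyond the current cycle, and the RUL is taken as the number of cycles between the current cycle and the first threshold crossing. The function names are hypothetical.

```python
def first_crossing(index_curve, threshold):
    """Return the 1-based cycle at which the curve first reaches the threshold, or None."""
    for cycle, value in enumerate(index_curve, start=1):
        if value >= threshold:
            return cycle
    return None

def estimate_rul(index_curve, current_cycle, threshold):
    """RUL in cycles, measured from current_cycle to the predicted failure cycle."""
    failure_cycle = first_crossing(index_curve, threshold)
    if failure_cycle is None:
        return None  # the curve never reaches the threshold within the horizon
    return max(failure_cycle - current_cycle, 0)
```

For example, with the curve `[0.0, 0.1, 0.2, 0.4, 0.6, 0.7]` and a threshold of 0.55, the predicted failure cycle is 5, so an engine currently at cycle 3 has an estimated RUL of 2 cycles.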

Results and Discussions
In order to further evaluate the PHM model, a turbofan engine performance degradation dataset generated by the commercial modular aero-propulsion system simulation (C-MAPSS) [42] is utilized. Each example within the turbofan dataset is a time series of sensor data and operating-condition data measured periodically over the life cycle of the turbofan [43].

Data Sets Characterization
As shown in Figure 3, a turbofan engine normally includes a fan, low pressure compressor (LPC), low pressure turbine (LPT), high pressure compressor (HPC), high pressure turbine (HPT), combustor, and a nozzle. The C-MAPSS data sets are multiple multivariate time series. Each dataset has been partitioned into training and test sample sets. Each dataset (i.e., a 24-element vector) includes 21 characteristic sensors for engine health data recording. With the preprocessing method, 14 sensors that are currently available onboard for many commercial turbofan engines are selected for PHM in this study [44]. Table 1 shows the description of selected sensors.
After the raw data is selected, the Z-score method is used to standardize the 14 sensor parameters. The Z-score reflects the distance from an element to the mean, measured in standard deviations. It can be calculated as:
$$z = \frac{x - \mu}{\sigma}$$
where $z$ is the z-score, $x$ is the value of the element, $\mu$ is the population mean, and $\sigma$ is the standard deviation.
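A minimal Python sketch of the Z-score standardization for one sensor channel is given below. The function name is hypothetical, the population standard deviation is used to match the definition of σ above, and returning zeros for a constant channel is an assumption added to avoid division by zero.

```python
import statistics

def z_score(values):
    """Standardize one sensor channel to zero mean and unit standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0.0:
        # Assumed handling of a constant channel: no spread, so every score is zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]
```

Each of the 14 selected sensor parameters would be standardized separately in this way.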
In this paper, four datasets (Engine #1-#4) are selected to validate the DP model, and 80 datasets (60 datasets as training samples and 20 datasets as testing samples) are selected to validate the LSTM model.

Fault Diagnosis
This section corresponds to Block 1, Block 2, and Block 3 in the PHM framework diagram. In Section 4.1, a high-dimensional dataset containing 14 sensor parameters was obtained. Because of the limitation of the DP model in processing high-dimensional data, PCA is used to construct a two-dimensional FMF. The data of the four engines processed by PCA are shown in Figure 4. As can be seen in the figure, the red dots represent the data of the first 25% of cycles, which are very concentrated, and there is little difference in the first principal component among the data points. Based on experience, we can assume that the engine is in a healthy state for the first 25% of cycles, and the data of these cycles are considered to be baseline data. Here, 25% is a conservative estimate and does not mean that the engine will fail after 25%. Figure 4 also shows that the data for the remaining 75% of the engine's cycles are heavily dispersed, which means that the operating data of the engine during this period have gradually deviated from the data of the health state.
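The reduction of the standardized sensor data to a two-dimensional feature can be sketched with a small PCA routine. The paper does not state how the PCA is computed, so power iteration with deflation is used here purely as an assumed, dependency-free stand-in; in practice a linear algebra library would normally be used. All function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered

def top_eigenpair(mat, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    d = len(mat)
    # The start vector must not be orthogonal to the dominant eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[a] * sum(mat[a][b] * v[b] for b in range(d)) for a in range(d))
    return eigval, v

def pca_2d(rows):
    """Project each row onto the first two principal components."""
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(2):
        eigval, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the component just found before searching for the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]
```

The first 25% of the projected rows would then serve as baseline data and the remaining rows as monitoring data.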
After the preprocessing of the original data is completed, the initial classification of these data can be achieved by the ADPC algorithm, so as to obtain the initial values of ζ required by the GMM. Although the ADPC algorithm can adaptively identify clustering centers and the cut-off distance, this study finds that the method has limitations when dealing with small sample sets. In the early stage of engine operation, when the sample size is still small, a limiter is therefore added to the ADPC algorithm to keep the number of clustering centers and the cut-off distance unchanged: a fixed number of cluster centers and a fixed cut-off distance are used for the first 50% of the engine's full life cycle. In addition, the number of cluster centers is restricted to the interval [2,6], and the difference between the numbers of cluster centers of two adjacent samples cannot be more than two. The idea is to prevent violent oscillations in rare cases. These measures help ensure the accuracy and stability of the established model. The variation of the number of clustering centers over the full life cycle is shown in Figure 5. This figure shows that the number of GCs recognized by the ADPC changes adaptively with the changing monitoring feature space along the engine life cycle. After the initial clustering of the original data using the ADPC algorithm is completed, the initial values ζ can be determined, and the EM algorithm is used to build the GMM. The implementation process is shown in Figure 6.
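The limiter on the number of cluster centers can be expressed as a small clamping function. The sketch below is an interpretation of the rules stated above, not the code used in this study: the function and parameter names are hypothetical, life-cycle progress is assumed to be given as a fraction between 0 and 1, and treating exactly 50% as still inside the frozen stage is an assumption.

```python
def limit_cluster_count(proposed, previous, progress, fixed_count,
                        lower=2, upper=6, max_step=2, freeze_until=0.5):
    """Apply the limiter to the cluster count proposed by ADPC.

    proposed    -- number of centers suggested by ADPC for the current sample
    previous    -- number of centers used for the previous sample, or None
    progress    -- fraction of the engine life cycle elapsed (0.0 to 1.0)
    fixed_count -- number of centers held constant during the early stage
    """
    if progress <= freeze_until:
        return fixed_count
    # Keep the count inside the allowed interval [lower, upper].
    bounded = min(max(proposed, lower), upper)
    if previous is not None:
        # Adjacent samples may differ by at most max_step centers.
        bounded = min(max(bounded, previous - max_step), previous + max_step)
    return bounded
```

For example, if ADPC proposes 9 centers at 60% of the life cycle and the previous sample used 3, the interval rule first reduces the proposal to 6 and the step rule then reduces it to 5.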
The GMMs for the health state and the monitoring state need to be constructed. This kind of DP model is also called the dynamic double probability model. Data from the first 25% of the engine life cycle are used to construct the baseline probability model, which remains unchanged during engine fault diagnosis. Data from the remaining 75% of the engine life cycle are used to construct the monitoring probability model, which is continuously updated as the engine cycles accumulate. In the fault diagnosis stage, the most important task is to obtain the engine fault detection indexes, which are shown in Figure 7. In the probability difference measuring method, the number of Monte Carlo samples is R = 10,000. Table 2 shows the relevant parameters of the four engines and the fault detection indexes at engine failure. It can be seen from the figure that the fault detection indexes of the first 25% of cycles are zero. This is because the engine is in a healthy state at this stage and failure monitoring is not carried out. In the fault monitoring stage, the variation trend of the fault detection index is basically the same for the four engines. Since the initial value and total life cycles of each engine are slightly different (this is a characteristic of the C-MAPSS data set itself), the four curves do not completely coincide in the early stage, but they coincide very well in the later stage.
All four engines have almost the same fault detection index at the end of the cycle. These results are quite consistent with the real failure evolution law of the engine. In order to verify the superiority of the proposed model, BP and DBN models are used for comparison; the BP model is a classic algorithm, whereas the DBN model is a new and effective method used for engine fault diagnosis in recent years. Figure 8 shows the analysis results of five samples, which are also from the C-MAPSS data (the relevant data of the BP and DBN models are from reference [1]). The results show that the fault detection indexes obtained by the BP and DBN models oscillate violently.
In order to compare the models more specifically, the first-order difference of the predicted fault detection indexes and the corresponding variance are obtained, as shown in Figure 9. The variance of the proposed DP model is 0.015, whereas the variances of the BP and DBN models are 0.035 and 0.024, respectively, as shown in Table 3. The proposed DP model thus has a lower difference variance and better fault diagnosis results than the BP and DBN models. Unlike the DBN and other ANN methods, the key to the DP model is to construct the probability distribution of the engine data set in a specific space, which is the statistical result of a large amount of data. Therefore, the DP model is able to integrate historical data and current data, which explains its better stability.
It is expected that the dynamic double probability model is able to capture the manifold of the healthy state and map differences between degradation trajectories into different regions of the 2D FMF space. This visualization is given in Figure 10 using the first two principal components combined with the fault detection indexes. As can be seen from the figure, blue data points representing engine health status are mainly concentrated around PCA-1 = −3. As PCA-1 increases, the value of the fault detection indexes also increases. The fault monitoring index reaches its maximum at about PCA-1 = 10 for all four engines, which corresponds to engine failure. The DP model can thus identify the evolution process of engine failure well.
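Returning to the stability comparison reported in Figure 9 and Table 3, the variance of the first-order difference can be computed as in the sketch below. The function names are hypothetical, and the use of the population variance is an assumption, since the text does not state whether the population or the sample variance was used.

```python
import statistics

def first_difference(series):
    """Return the cycle-to-cycle change of a fault detection index curve."""
    return [b - a for a, b in zip(series, series[1:])]

def difference_variance(series):
    """Variance of the first-order difference; lower values indicate a smoother curve."""
    return statistics.pvariance(first_difference(series))
```

A curve that rises by the same amount every cycle has a difference variance of essentially zero, whereas a curve that oscillates around the same trend has a clearly larger value, which is the property used to compare the DP, BP, and DBN indexes.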

RUL Estimation
The DP + LSTM model is applied for RUL estimation. It is necessary to select appropriate parameters for the LSTM model to avoid local optima and fitting errors. Based on experience, the optimal parameter combinations of the LSTM model are shown in Table 4. The 80 representative engines in the C-MAPSS dataset are used to verify the reliability of the LSTM model, with the training and test subsets divided in a ratio of 3:1. The training data of the LSTM are the fault detection indexes of each engine.
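The preparation of LSTM training samples from the fault detection indexes can be sketched as follows. This is an illustration under stated assumptions: each sample consists of 10 consecutive index values as input (the input length given earlier) and the next index value as the target, which is one plausible reading of sequence-to-point prediction, and the 3:1 split is applied to a list of engine identifiers in their given order. The function names are hypothetical.

```python
def make_windows(series, window=10):
    """Turn one engine's fault detection index curve into (input, target) pairs."""
    samples = []
    for end in range(window, len(series)):
        # Input: the `window` values before position `end`; target: the value at `end`.
        samples.append((series[end - window:end], series[end]))
    return samples

def split_engines(engine_ids, train_ratio=0.75):
    """Split engines into training and test subsets (3:1 by default)."""
    n_train = int(len(engine_ids) * train_ratio)
    return engine_ids[:n_train], engine_ids[n_train:]
```

With 80 engines, `split_engines` yields 60 training engines and 20 test engines, and a curve of 15 cycles yields 5 training samples.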

Figure 10. Two-dimensional PCA plot of the fault detection index.
To verify the superiority of the proposed method, the RNN and the gated recurrent unit (GRU) network, which is a variant of LSTM, are implemented for comparison [1]. Mean absolute error (MAE) is used as the training loss function, and the MAE values of the three models are shown in Figure 11. The results show that the training loss of all three models decreases gradually as the number of epochs increases. When the epoch reaches 100, the training loss of LSTM is lower than that of RNN but higher than that of GRU. During the last 20 epochs, the mean loss is 0.028. Figure 12 plots the prediction results of four testing sets from 60% and 70% of the monitoring cycles. As can be seen from the figure, the predicted results are in good agreement with the actual results. Especially near the cycle of engine failure, the actual value closely coincides with the predicted value. High-precision prediction can be achieved whether the prediction starts from 60% or 70% of the monitoring period. LSTM is a time series prediction model, and the prediction ability it has learned does not include prediction after engine failure. Therefore, when the prediction curve tends to be stable, it means that the engine is about to fail. In addition, the prediction curve flattens out after the failure point and shows little growth. These results support the reliability and accuracy of the DP model and LSTM model proposed in this paper. A threshold needs to be set for RUL estimation. Since the initial state of each engine in the C-MAPSS data set is different, the threshold value varies slightly between engines. The threshold value of the four engines selected in Figure 12 can be set to about 0.55.
To verify the superiority of the proposed method, an RNN and a gated recurrent unit (GRU) network, which is a variant of LSTM, are implemented for comparison [1]. Mean absolute error (MAE) is used as the training loss function, and the MAE values of the three models are shown in Figure 11. The training loss of all three models decreases gradually as the number of epochs increases. When the epoch reaches 100, the training loss of LSTM is lower than that of RNN but higher than that of GRU. During the last 20 epochs, the mean loss is 0.028.
Figure 12 plots the prediction results of four testing sets, with prediction starting from 60% and 70% of the monitoring cycles. The predicted results are in good agreement with the actual results, especially near the cycle of engine failure, where the predicted value closely matches the actual value. High-precision prediction is achieved whether the prediction starts from 60% or 70% of the monitoring period. LSTM is a time series prediction model, and the prediction ability it has learned does not extend beyond engine failure. Therefore, when the prediction curve levels off, the engine is about to fail; after the failure point, the curve flattens out and shows little growth. These results support the reliability and accuracy of the DP model and the LSTM model proposed in this paper.
A threshold must be set for RUL estimation. Because the initial state of each engine in the C-MAPSS data set is different, the threshold value varies slightly from engine to engine. For the four engines selected in Figure 12, the threshold can be set to about 0.55. The engine cycle at which the prediction curve reaches the threshold is the cycle at which failure is predicted. To assess the model's accuracy in more detail, we calculated the relative error of prediction for 20 testing sets, as shown in Figures 13 and 14.
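The threshold rule described above can be illustrated with a minimal Python sketch. It assumes the predicted fault detection index rises toward the threshold, and it uses linear interpolation between the two bracketing cycles; the function names and the interpolation step are assumptions of this sketch and are not taken from the paper.

```python
from typing import Optional, Sequence


def predicted_failure_cycle(
    cycles: Sequence[float],
    predicted_index: Sequence[float],
    threshold: float = 0.55,
) -> Optional[float]:
    """Return the first cycle at which the predicted index reaches the threshold.

    Linear interpolation between neighbouring cycles is an assumption of this
    sketch. Returns None if the curve never reaches the threshold.
    """
    if len(cycles) != len(predicted_index):
        raise ValueError("cycles and predicted_index must have the same length")
    for i, value in enumerate(predicted_index):
        if value >= threshold:
            if i == 0:
                return float(cycles[0])
            prev_value = predicted_index[i - 1]
            prev_cycle = cycles[i - 1]
            # prev_value < threshold <= value, so the denominator is positive.
            fraction = (threshold - prev_value) / (value - prev_value)
            return prev_cycle + fraction * (cycles[i] - prev_cycle)
    return None


def remaining_useful_life(current_cycle: float, failure_cycle: float) -> float:
    """RUL in cycles, clipped at zero once the failure cycle has passed."""
    return max(0.0, failure_cycle - current_cycle)


# Hypothetical prediction curve (illustrative values, not C-MAPSS results).
example_cycles = [120, 130, 140, 150, 160]
example_curve = [0.30, 0.40, 0.50, 0.60, 0.62]
example_failure = predicted_failure_cycle(example_cycles, example_curve)
```

With these illustrative values the curve crosses 0.55 midway between cycles 140 and 150, so the predicted failure cycle is about 145 and the RUL seen from cycle 120 is about 25 cycles.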
When predicting from 60% of the cycles, the mean relative error on the testing sets is 0.024%. When predicting from 70% of the cycles, the mean relative error on the testing sets is 0.019%. The prediction accuracy is slightly higher when starting from 70% of the cycles, because time series prediction models generally accumulate error as the prediction horizon grows. Overall, the relative errors in both cases remain below 6%, which demonstrates the high accuracy of the proposed model.
Several classical RUL estimation methods are compared to verify the superiority of the proposed method, and the RUL prediction errors of the five models are listed in Table 5 (the data for the comparison models come from reference [1]). The average RUL estimation error of the DP + LSTM model is 4.4, which is 21%, 41%, 51%, and 48% lower than that of DBN + LSTM, LSTM, RNN, and GRU, respectively (the data of these five models are all from the C-MAPSS dataset).
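The two quantities used above, the relative error of a predicted failure cycle and the percentage reduction in error against a baseline, can be written out as a short Python sketch. The paper does not state the exact definitions, so the absolute-value form of the relative error and the function names are assumptions; the numbers in the example are hypothetical and are not taken from Table 5.

```python
from typing import Sequence


def relative_error(predicted_cycle: float, actual_cycle: float) -> float:
    """Absolute relative error of a predicted failure cycle, as a fraction."""
    if actual_cycle == 0:
        raise ValueError("actual_cycle must be non-zero")
    return abs(predicted_cycle - actual_cycle) / abs(actual_cycle)


def mean_relative_error(
    predicted: Sequence[float], actual: Sequence[float]
) -> float:
    """Mean of the per-engine relative errors over a collection of testing sets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [relative_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def percent_reduction(baseline_error: float, proposed_error: float) -> float:
    """Percentage by which proposed_error is lower than baseline_error."""
    if baseline_error == 0:
        raise ValueError("baseline_error must be non-zero")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical values for illustration only.
example_mre = mean_relative_error([192, 205], [200, 200])
example_gain = percent_reduction(8.0, 4.4)
```

Here the two hypothetical engines have relative errors of 4% and 2.5%, and a hypothetical baseline error of 8.0 compared with 4.4 corresponds to a reduction of about 45%.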
The results show that the proposed DP + LSTM model has higher accuracy than these classical time series prediction models. The other methods are all ANN models, which are often described as black-box models: they achieve prediction by learning the inherent patterns in a large amount of data. These methods are sensitive to the data, their hyper-parameters strongly affect model performance, and tuning those hyper-parameters is a complex process. The DP + LSTM method proposed in this study combines a probability model with an ANN model. The solid mathematical basis of the probability model is an important factor in the superior performance of the DP + LSTM model.

PHM Application Example
Standardizing the data processing flow of the PHM framework is one of the aims of this study. Algorithm 1 summarizes how the PHM framework carries out its functions. One engine from the C-MAPSS data set is selected to show the processing results of the proposed PHM framework. Figure 15a shows 7 of the 14 sets of raw sensor data for this engine. Noise interferes strongly with the sensor data, and the trends of the sensors are inconsistent over the whole life cycle of the engine. Figure 15b shows the data after dimension reduction. Figure 15c,d show the results of fault diagnosis and RUL estimation, respectively. The proposed framework thus carries out data analysis and mining from the raw engine data through to engine health monitoring and RUL estimation.
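The data flow through the four stages shown in Figure 15 (raw data, dimension reduction, fault diagnosis, RUL estimation) can be sketched as a chain of interchangeable functions. This is only a structural illustration: `run_phm`, `PHMResult`, and the toy stage functions below are hypothetical names, and the toy stages are simple stand-ins, not the PCA, DP, or LSTM models of the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

Series = List[float]
Matrix = List[List[float]]


@dataclass
class PHMResult:
    reduced: Matrix      # low-dimensional features (dimension reduction stage)
    fault_index: Series  # fault detection index per cycle (diagnosis stage)
    rul: float           # estimated remaining useful life (prognosis stage)


def run_phm(
    raw: Matrix,
    reduce_dim: Callable[[Matrix], Matrix],
    fault_index: Callable[[Matrix], Series],
    estimate_rul: Callable[[Series], float],
) -> PHMResult:
    """Pass raw sensor data through the three processing stages in order."""
    reduced = reduce_dim(raw)
    index = fault_index(reduced)
    return PHMResult(reduced=reduced, fault_index=index, rul=estimate_rul(index))


# Toy stand-ins only: keep two columns, average each row, count the cycles.
toy_raw = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
toy_result = run_phm(
    toy_raw,
    reduce_dim=lambda m: [row[:2] for row in m],
    fault_index=lambda m: [sum(row) / len(row) for row in m],
    estimate_rul=lambda idx: float(len(idx)),
)
```

Passing the stages in as functions keeps the flow fixed while allowing each stage to be replaced, for example when the monitoring model is updated as engine data accumulate.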

Conclusions
In this study, a PHM framework combining the DP model and the LSTM model is proposed for fault diagnosis and RUL estimation of aero-engines. Firstly, the DP model, consisting of a baseline probability model and a monitoring probability model, is constructed. The baseline probability model reflects the operating characteristics of the engine's healthy state, and the monitoring probability model reflects the occurrence and evolution of engine failure. A GMM-ADPC algorithm is employed for modeling engine fault development, and the PCA method is adopted to reduce the dimension of the input data. Secondly, the probability difference measuring method is used to quantify the difference between the two probability models and thereby obtain the fault detection indexes. Thirdly, the DP + LSTM model is introduced for time series prediction of the fault detection indexes in order to estimate the RUL of the engine. Finally, the PHM framework is established by integrating the aforementioned models. The experimental results on the C-MAPSS degradation datasets indicate that the proposed DP model captures the process of engine failure well and that the DP + LSTM model performs RUL estimation well. Comparison with several classical methods shows that the proposed method has better stability and accuracy.
To sum up, the PHM framework proposed in this study can adequately realize the functions of fault diagnosis and RUL estimation.
