Prediction of Aero-Engine Remaining Useful Life Combined with Fault Information

Abstract: Because the fault information of an aero-engine strongly affects its remaining useful life, this paper proposes to incorporate fault information into remaining useful life prediction. First, we preprocessed the signals of the dataset. Next, the preprocessed signals were used to train a CNN (convolutional neural network)-based fault diagnosis model, from which fault features were obtained. Then, we combined BIGRU (bidirectional gated recurrent unit) with the fault features to predict the remaining useful life of the aero-engine. We used the CMAPSS (commercial modular aero-propulsion system simulation) dataset to verify the effectiveness of the proposed method. Finally, comparison experiments with different parameters, structures, and models were conducted.


Introduction
Aero-engine accidents will lead to casualties and irreversible serious consequences. To prevent accidents, we must make timely and effective predictions of the remaining useful life of aero-engines.
RUL (remaining useful life) prediction methods are generally divided into model-based methods, data-driven methods, and hybrid methods (combinations of the former two). For example, Jiao [1] first used two LSTMs (long short-term memory networks) to extract features from monitoring data and maintenance data, respectively, then stacked the two features and fed them to a fully connected layer to obtain the health index, and finally built a state space model of the health index and obtained the RUL through extrapolation. PSW (phase space warping) describes the dynamic behavior of the tested bearing on the fast time scale. As a physics-based model, the Paris crack propagation model describes the defect propagation of the bearing on the slow time scale. Qian [2] completed the RUL prediction of the bearing by combining the enhanced PSW with the modified Paris crack propagation model, making comprehensive use of the information on both the fast and the slow time scales. Because complex working conditions and internal mechanisms hinder the construction of physical models, model-based RUL prediction is difficult to implement in practice. Since the data-driven method only needs historical monitoring data, it is receiving more and more attention. In recent years, owing to the rapid development of big data and computing power, artificial intelligence has been widely used in RUL prediction.
A variety of artificial intelligence methods have been applied to predict the remaining useful life. Manjurul Islam [3] defined a degree of defect (DD) metric in the frequency domain and inferred the health index of the bearing. Then, according to the health index and the least squares support vector machine, the time-to-start (TTS) point of RUL prediction was obtained, and the RUL of the bearing was obtained by using recurrent least squares support vector regression (recurrent LSSVR). Yu [4] used multi-scale residual temporal convolutional networks (MSR-TCN) to extract information at multiple scales for a more comprehensive analysis of the health status, and combined this with the attention mechanism to reduce the impact of low-correlation data on engine RUL prediction. Zhang [5] selected 14 sensor signals as the original signals and, through a multi-objective evolutionary ensemble learning method, evolved multiple DBNs (deep belief networks) simultaneously, taking accuracy and diversity as two conflicting objectives. The final diagnosis model was then obtained by combining the multiple DBNs and achieved better results than several other models.
Because the data used in RUL prediction are mostly time-series data and RUL is also time-series related, RNNs (recurrent neural networks), which have a stronger processing ability for time-series data, are widely used in RUL prediction. Zheng [6] obtained the health factors by feature selection and PCA, and then fed the health factors and labels into an LSTM to predict the remaining useful life. Wu [7] selected the sensor data by using monotonicity and correlation, and then predicted the remaining useful life of the aero-engine with an LSTM optimized by the grid search algorithm. Peng [8] used VAE-GAN (variational autoencoder-generative adversarial networks) to generate the health index of the current state, used BLSTM (bidirectional long short-term memory) to generate the future sensor data sequence, and then obtained the health index from the current and future states and extrapolated it to obtain the RUL.
A variety of methods have been adopted to improve the accuracy and speed of remaining useful life prediction. Some works achieve this by changing the network structure [9][10][11]. However, it is not easy to tailor the network structure to a specific problem in a way that improves performance. Enriching the state information is a simpler way to improve the prediction accuracy, and many methods take this route, such as extracting multiple features [12][13][14][15][16], extracting multi-channel features [17][18][19][20], extracting both spatial and temporal features, extracting multi-scale features [21][22][23], and considering the temporal and spatial dependence of sensors [24].
Since different faults lead to different degradation patterns, fault features are important state information for the accuracy of remaining useful life prediction. Considering that different faults lead to different degradation modes, Xia [25] established a model based on the state data under each fault state and then used the outputs of the models of multiple degradation modes to obtain the final result. Cheng [26] used two outputs of a transferable convolutional neural network (TCNN) to obtain the fault mode and RUL, respectively. Chen [27] proposed that the degradation pattern of bearings should be classified into slow degradation and fast degradation according to RMS. Then, the BLSTM and attention mechanism were used for remaining useful life prediction. At present, the prediction of the remaining useful life of the engine combined with fault information has not received enough attention. Moreover, the above methods do not directly extract the fault features and use them as independent information, so the information on the different degradation modes is not prominent, which reduces the accuracy of distinguishing different degradation modes.
In order to involve the fault features as independent information in the remaining useful life prediction, the paper first uses CNN as a fault diagnosis network to classify faults and obtain fault features from them. Then, a remaining useful life prediction model based on BIGRU and the attention mechanism is developed and combined with the fault features for remaining useful life prediction.

CNN
CNN is widely used in fault diagnosis due to its outstanding feature extraction ability and the possibility of transforming low-level features into high-level features through a multi-level structure. A typical CNN generally includes an input layer, a convolutional layer, a pooling layer, and a fully connected layer.
The input layer is used to receive raw data. Normalizing the raw data can improve the efficiency of the algorithm.

The convolution layer convolves the output of the previous layer and applies a nonlinear activation function to the convolved data to learn higher-level features. In standard form, its mathematical formula is shown in Formula (1):

$$x_k^{l+1} = f\left(\sum_{c=1}^{C} w_{k,c}^{l} * x_c^{l} + b_k^{l}\right) \tag{1}$$

where $x_k^{l+1}$ is the output corresponding to the k-th convolution kernel of the l-th convolution layer, $C$ represents the number of channels input to the convolution layer, $x_c^{l}$ is the c-th input channel, $w_{k,c}^{l}$ and $b_k^{l}$ are the weight and bias of the k-th convolution kernel, $*$ denotes the convolution operation, and $f$ is the activation function.

After passing through multiple convolution layers and pooling layers, the learned features are flattened into vectors, and then the full connection layer is used to connect the extracted features with the output layer. The calculation formula of the full connection layer is shown in Formula (3):

$$y^{l+1} = f\left(w^{l} x^{l} + b^{l}\right) \tag{3}$$

where $x^{l}$ and $y^{l+1}$ are, respectively, the input and output of the full connection layer of layer l, $f$ is the activation function, such as softmax, ReLU, etc., and $w^{l}$ and $b^{l}$ are, respectively, the weight and bias of the full connection layer of layer l.
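As a rough illustration of the convolution and full connection computations described above, the following Python sketch applies one multi-channel 1-D convolution kernel followed by ReLU, and then a fully connected layer. The function names (`conv1d_relu`, `dense`), the valid padding, and the stride of 1 are assumptions made for illustration; they are not taken from the paper's implementation.

```python
def relu(v):
    """ReLU activation for a single value."""
    return v if v > 0.0 else 0.0


def conv1d_relu(x, kernel, bias):
    """Apply one convolution kernel to a multi-channel 1-D signal.

    x:      list of C channels, each a list of T samples
    kernel: list of C channels, each a list of K weights
    bias:   scalar bias of this kernel
    Uses valid padding and stride 1 (assumed). As in most CNN libraries,
    the "convolution" is computed as a cross-correlation.
    """
    n_steps = len(x[0]) - len(kernel[0]) + 1
    out = []
    for t in range(n_steps):
        s = bias
        for x_c, k_c in zip(x, kernel):      # sum over the C input channels
            for j, w in enumerate(k_c):      # sum over the kernel width
                s += w * x_c[t + j]
        out.append(relu(s))
    return out


def dense(x, weights, biases, activation=relu):
    """Fully connected layer: y = f(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


# Usage: two input channels, one kernel of width 2, then a 2-unit dense layer.
signal = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
features = conv1d_relu(signal, [[-1.0, 1.0], [0.5, 0.5]], 0.0)
outputs = dense(features, [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]], [0.5, -1.0])
```

A real fault diagnosis network would use many kernels per layer and learn the weights by backpropagation; the sketch only shows the forward computation for fixed weights.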

BIGRU
In this paper, we used BIGRU to extract bidirectional temporal information from the features. BIGRU has been widely used in natural language processing and fault diagnosis [28,29]. BIGRU is composed of two independent GRU layers that receive the same input but transmit information in opposite directions. Compared with the standard GRU, BIGRU can take both historical and future information into account, thus enhancing the prediction ability. GRU is a variant of RNN. By introducing a gating mechanism to adjust the path of information flow, GRU can effectively alleviate the gradient vanishing and explosion problems of RNN. Moreover, compared with LSTM (another variant of RNN that addresses the same problems), GRU has been reported to achieve higher accuracy and efficiency in predicting the remaining useful life of aero-engines [30]. The structure of the GRU unit is shown in Figure 1.

The reset gate $r_t$ controls how much historical information is used when the candidate state is calculated. The calculation formula is as follows (4):

$$r_t = \sigma\left(W_r \left[h_{t-1}, x_t\right] + b_r\right) \tag{4}$$

where $W_r$ and $b_r$ are the weight and bias of the reset gate, $x_t$ is the input at the current time, $h_{t-1}$ is the hidden state at the previous time, $[\cdot,\cdot]$ is the concatenation of two vectors, and $\sigma(\cdot)$ is the sigmoid function.

The update gate $z_t$ is used to control the proportion of historical information used during the calculation. Similar to the reset gate $r_t$, the larger the value of the update gate $z_t$, the larger the amount of historical information of the loop block used. The calculation formula is as follows (5):

$$z_t = \sigma\left(W_z \left[h_{t-1}, x_t\right] + b_z\right) \tag{5}$$

where $W_z$ is the weight of $z_t$ and $b_z$ is the bias of the update gate.

The calculation formula of the hidden candidate state $\tilde{h}_t$ is as follows (6):

$$\tilde{h}_t = \tanh\left(W_{\tilde{h}} \left[r_t \odot h_{t-1}, x_t\right] + b_{\tilde{h}}\right) \tag{6}$$

where $W_{\tilde{h}}$ is the weight of $\tilde{h}_t$, $b_{\tilde{h}}$ is the bias of the hidden candidate state, and $\odot$ represents element-wise multiplication. The calculation formula of the hidden state at the current time is as follows (7):

$$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t \tag{7}$$

In BIGRU, the same input data are fed to the forward GRU and the backward GRU. BIGRU simultaneously calculates $\overrightarrow{h}_t$ (forward GRU hidden state) and $\overleftarrow{h}_t$ (backward GRU hidden state) at each time step, and then concatenates the two hidden states for the next calculation. The BIGRU structure is shown in Figure 2.
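The GRU and BIGRU computations described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the parameter dictionary layout (`W_r`, `b_r`, `W_z`, `b_z`, `W_h`, `b_h`), the zero initial hidden state, and the helper names are hypothetical. The hidden state update follows the convention stated in the text, in which a larger update gate keeps more historical information.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, v, b):
    """Compute W v + b for a matrix W given as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(x_t, h_prev, p):
    """One GRU time step; each weight has shape hidden x (hidden + input)."""
    concat = h_prev + x_t                                  # [h_{t-1}, x_t]
    r = [sigmoid(v) for v in affine(p["W_r"], concat, p["b_r"])]   # reset gate
    z = [sigmoid(v) for v in affine(p["W_z"], concat, p["b_z"])]   # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t   # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in affine(p["W_h"], gated, p["b_h"])]
    # Larger z keeps more of the previous hidden state.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(xs, p, hidden):
    """Run a GRU over a sequence, starting from a zero hidden state (assumed)."""
    h = [0.0] * hidden
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        states.append(h)
    return states


def run_bigru(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states at every time step."""
    fwd = run_gru(xs, p_fwd, hidden)
    bwd = run_gru(xs[::-1], p_bwd, hidden)[::-1]   # re-align to original order
    return [f + b for f, b in zip(fwd, bwd)]


# Usage: 1 input feature, 1 hidden unit, all-zero parameters.
zero = {"W_r": [[0.0, 0.0]], "b_r": [0.0],
        "W_z": [[0.0, 0.0]], "b_z": [0.0],
        "W_h": [[0.0, 0.0]], "b_h": [0.0]}
states = run_bigru([[1.0], [2.0], [3.0]], zero, zero, hidden=1)
```

Each output vector has twice the hidden size because the forward and backward states are concatenated, which is what allows the later layers to see both past and future context at every time step.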

Attention
The attention mechanism is a method to quickly select important information from a large amount of information by imitating human attention. Since the prediction of the remaining useful life of an aero-engine involves a large amount of information, we need the attention mechanism to select the important parts. When the input sequence of an LSTM/GRU model is long, it is difficult to obtain a reasonable final vector representation. The attention mechanism assigns different attention weights to the feature vectors to distinguish the importance of features and improve the accuracy of prediction. The attention mechanism calculation formula is shown in Formulas (8)-(11), where the two $W$ terms are the weights of the two fully connected layers.
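One common way to realize such an attention layer with two fully connected layers is additive attention over the time steps. The sketch below is an assumption about the general form only and may differ from the formulas used in the paper: the names `attention_pool`, `W1`, and `w2`, the tanh nonlinearity, and the omission of bias terms are all hypothetical choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, W1, w2):
    """Weight each time step and return (weights, context vector).

    hidden_states: list of T vectors (e.g. BIGRU outputs)
    W1: first fully connected layer, list of rows (assumed, no bias)
    w2: second fully connected layer mapping to one score (assumed, no bias)
    """
    scores = []
    for h in hidden_states:
        u = [math.tanh(sum(w * hi for w, hi in zip(row, h))) for row in W1]
        scores.append(sum(w * ui for w, ui in zip(w2, u)))
    alphas = softmax(scores)                       # attention weights sum to 1
    dim = len(hidden_states[0])
    context = [
        sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)
    ]
    return alphas, context


# Usage: with zero weights every time step receives the same attention.
alphas, context = attention_pool([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0]], [0.0])
```

With trained weights, time steps whose hidden states score higher contribute more to the context vector, which is the behavior the text attributes to the attention mechanism.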

Proposed Methodology
Since different fault states correspond to different degradation patterns, information on fault states is particularly important for remaining useful life predictions. Since previous studies did not directly involve the fault features as independent information in the remaining useful life prediction, the paper proposes to first construct a fault diagnosis model using CNN and obtain the fault features from it. Then, the remaining useful life prediction model based on BIGRU and attention is developed and combined with the fault features to predict the remaining useful life of the engine. The remaining useful life prediction steps combined with fault information are as follows:

• Data preprocessing: We selected 14 sensors related to the degradation trend from 21 sensor signals. Then, the selected 14 sensor signals were standardized so that the influence of the data on the results is on the same scale. After that, the standardized data were further processed by the sliding window method to obtain the sample signals. Following previous experience, the window length and the step size were set to 30 and 1, respectively. Following expert suggestions, the remaining useful life was converted into the piecewise remaining useful life, with a maximum remaining useful life of 125. The paper assumed that the fault is present once the engine enters the linear degradation stage, so only data from the linear degradation stage were used for fault diagnosis and remaining useful life prediction. The fault modes of FD001 and FD003 in the linear degradation stage were assigned different fault labels and combined with the corresponding data to obtain the original fault diagnosis model data. The corresponding sensor data in the linear degradation stage were combined with the remaining useful life labels to obtain the remaining useful life prediction model data.

The flow chart of the proposed method for predicting the remaining useful life of an aero-engine combined with fault information is shown in Figure 3.
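The sliding window step mentioned above (window length 30, step size 1) can be sketched as follows. The function name `sliding_windows` and the choice to label each window with the RUL of its last cycle are assumptions made for illustration.

```python
def sliding_windows(signal, rul, window=30, step=1):
    """Cut one engine's record into overlapping samples.

    signal: list of per-cycle feature vectors
    rul:    list of per-cycle RUL labels, same length as signal
    Returns (windows, labels); each window is labeled with the RUL of its
    last cycle (an assumed convention).
    """
    windows, labels = [], []
    for start in range(0, len(signal) - window + 1, step):
        end = start + window
        windows.append(signal[start:end])
        labels.append(rul[end - 1])
    return windows, labels


# Usage: a 35-cycle record with one feature per cycle yields 6 samples.
record = [[float(i)] for i in range(35)]
labels_per_cycle = list(range(34, -1, -1))
samples, sample_labels = sliding_windows(record, labels_per_cycle)
```

A record shorter than the window length produces no samples under this scheme, so short engine records would need padding or a shorter window.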

Dataset
The dataset used in this paper was the NASA CMAPSS (commercial modular aero-propulsion system simulation) dataset [31]. CMAPSS has been widely used in RUL prediction research of turbofan engines. There are four subsets (FD001, FD002, FD003, and FD004), and each subset records the degradation data of the turbofan engine under different fault modes. We verified the validity of the proposed method using subsets FD001 and FD003. The training sets of both FD001 and FD003 contain run-to-failure monitoring data streams for 100 engines of the same type. Their test sets contain the same type of data for the same number of engines. FD001 and FD003 contain one and two fault modes, respectively. The length of the condition monitoring data is inconsistent between one engine and another, and the data are polluted by sensor noise, making it a challenging task to predict the RUL (in units of operating cycles). The dataset description and the detailed description of the sensors are shown in Tables 1 and 2.

Sensor Selection
Each engine corresponds to a series of data points sampled by 21 sensors over its life cycle. Of the 21 sensors, some have a constant output over the life of the engine and do not provide any useful information for remaining useful life prediction. Therefore, as conducted in [5,32], we eliminated the outputs of these sensors from the C-MAPSS dataset. We finally selected 14 features in the C-MAPSS dataset, corresponding to the outputs of 14 sensors, with indexes of 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21.
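The selection above can be expressed as a small helper. The sketch assumes each row holds only the 21 sensor readings, with sensor 1 at position 0; the names `SELECTED_SENSORS` and `select_sensors` are hypothetical.

```python
# 1-based indexes of the 14 sensors kept for prediction (from the text).
SELECTED_SENSORS = [2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, 21]


def select_sensors(row, selected=SELECTED_SENSORS):
    """Keep only the selected sensor readings from a row of 21 readings."""
    if len(row) != 21:
        raise ValueError("expected 21 sensor readings per cycle")
    return [row[i - 1] for i in selected]


# Usage: a dummy row whose value equals the sensor index.
dummy_row = [float(i) for i in range(1, 22)]
kept = select_sensors(dummy_row)
```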

Piecewise Remaining Useful Life
In the early stage, the engine works stably and shows no obvious degradation; it then degrades approximately linearly until the system fails. Here, we used the usual label processing method, i.e., the piecewise linear label, which assigns a constant value to the target label of the early monitoring signal (for the C-MAPSS dataset, 125 was used as the constant RUL label) [33][34][35]. Zheng [36] limited the RUL of the engine from start-up to the onset of degradation to RULmax. The linear degradation of an aero-engine occurs once the RUL falls below RULmax. In this work, RULmax was set to 125.
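The piecewise linear label can be sketched as below. The sketch assumes a run-to-failure record in which the last recorded cycle has a true RUL of 0; the function name `piecewise_rul` is hypothetical.

```python
def piecewise_rul(total_cycles, rul_max=125):
    """Piecewise RUL labels for one run-to-failure engine record.

    The true RUL at cycle c (1-based) is total_cycles - c, assuming the
    engine fails at the last recorded cycle. Labels are capped at rul_max.
    """
    return [min(total_cycles - cycle, rul_max)
            for cycle in range(1, total_cycles + 1)]


# Usage: an engine that runs for 200 cycles.
labels = piecewise_rul(200)
```

For this 200-cycle example the label stays at 125 for the first 75 cycles and then decreases by one per cycle down to 0.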

Intercepting Linear Degradation Stage Data
Since a fault occurs in the dataset only after the operation reaches a certain point in time, the paper used only the data from the linear degradation stage, in which the engine is assumed to have experienced the fault described in the dataset. The fault modes of FD001 and FD003 were assigned two different fault labels to form the fault diagnosis data. The remaining useful life prediction data were then formed using the corresponding remaining useful life labels. Taking the first engine of FD001 as an example, the RUL of the linear degradation stage dataset is shown in Figure 4.
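A minimal sketch of this interception step is given below. It assumes that the linear degradation stage consists of the cycles whose uncapped RUL is strictly below 125, and that each subset receives one integer fault label; the function name `linear_stage` and the label value are hypothetical.

```python
def linear_stage(samples, rul, fault_label, rul_max=125):
    """Keep only cycles in the linear degradation stage.

    samples:     per-cycle data of one engine
    rul:         per-cycle uncapped RUL values, same length as samples
    fault_label: integer label of the subset's fault (hypothetical value)
    Returns a list of (sample, rul, fault_label) tuples with rul < rul_max.
    """
    return [(s, r, fault_label)
            for s, r in zip(samples, rul) if r < rul_max]


# Usage: only the last two cycles fall in the linear stage.
kept = linear_stage(["c1", "c2", "c3", "c4"], [126, 125, 124, 0], fault_label=1)
```

The same filtered cycles serve both tasks: the fault label is the target of the diagnosis model, and the RUL value is the target of the prediction model.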

Data Standardization
Standardizing the data before they are input into the model can improve the accuracy and efficiency of the model. To facilitate the use of the data in the model, the linear degradation stage data were standardized to remove the influence of the different dimensions. The standardization formula is shown in (12).

The dataset obtained after processing the collection of FD001 and FD003 training sets by the above steps was divided into training and validation sets in the ratio of 8:2. Meanwhile, the dataset obtained after processing the collection of FD001 and FD003 test sets according to the above steps was used as the test set of the proposed method.
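The standardization and the 8:2 split can be sketched as follows. The sketch assumes z-score standardization (zero mean, unit variance per feature) with statistics computed on the training data, and a random split with a fixed seed; both are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import math
import random


def fit_standardizer(rows):
    """Per-feature mean and population standard deviation."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n)
            for d in range(dims)]
    return means, stds


def standardize(rows, means, stds, eps=1e-8):
    """Apply z-score standardization; eps guards against constant features."""
    return [[(v - m) / (s + eps) for v, m, s in zip(r, means, stds)]
            for r in rows]


def train_val_split(samples, train_ratio=0.8, seed=0):
    """Shuffle and split samples into training and validation sets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_ratio)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]


# Usage: two features, the second one constant.
data = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(data)
scaled = standardize(data, means, stds)
train, val = train_val_split(list(range(10)))
```

Reusing the training statistics when standardizing the validation and test sets avoids leaking information from those sets into the model.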

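Evaluation Indicators
RMSE (root mean square error) is used to measure the deviation of the predicted value from the true value. The smaller the RMSE value, the closer the predicted value is to the true value. Since it is often used for evaluation on the CMAPSS dataset, RMSE was used as the final evaluation metric in the paper. The formula of RMSE is shown in (13):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \tag{13}$$

where $N$ is the number of samples, $y_i$ is the true remaining useful life of the i-th sample, and $\hat{y}_i$ is the corresponding predicted value.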
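For reference, the RMSE metric can be computed with a few lines of Python. The function name `rmse` and the input checks are illustrative choices.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted RUL values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Usage: every prediction is off by exactly 3 cycles.
error = rmse([100.0, 50.0], [103.0, 47.0])
```

Because the errors are squared before averaging, a few large prediction errors raise the RMSE more than many small ones.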

Fault Diagnosis Model Results
In order to extract fault information for remaining useful life prediction, the paper constructed a CNN-based fault diagnosis network to extract fault features. The output features of the fault diagnosis model flatten layer were used as fault information for the next step of remaining useful life prediction.
The parameters of the fault diagnosis model are shown in Table 3. We validated the performance of the model using the test sets. The confusion matrix of the test results of the fault diagnosis model on the test set is shown in Figure 5. The horizontal axis represents the predicted labels of the test sets, and the vertical axis represents their real labels. The main diagonal represents the number of samples correctly predicted by the model. It can be seen that the test accuracy of the model reaches 100%, which verifies the performance and generalization ability of the model.

Comparative Experimental Analysis of the Number of Convolution Kernels
To verify the reasonableness of the number of convolutional kernels, comparison experiments were conducted on four numbers of convolutional kernels, 2, 4, 8, and 16.
The results in Table 4 show that the highest testing accuracy of the model was achieved when the number of convolutional kernels is 16. It is also clear from the data in the table that as the number of convolutional kernels increases, both the variety of features extracted and the testing accuracy improve. The comparison of the experimental results shows that the number of convolutional kernels chosen in the paper is reasonable.

Comparative Experimental Analysis of Convolution Layers
To verify the reasonableness of the number of convolutional layers, comparison experiments were conducted on three numbers of convolutional layers: 1, 2, and 3.
From the results in Table 5, it can be seen that the test accuracy of the fault diagnosis model reaches the highest when the number of convolutional layers is 1. The reasonableness of the number of convolutional layers selected in the paper is verified.

Comparative Experimental Analysis of Convolution Activation Function
To verify the rationality of the activation function of the convolutional layer, a comparison experiment between two activation functions, 'tanh' and 'ReLU', was conducted.
From the results in Table 6, it can be seen that the highest testing accuracy of the model was achieved when the activation function is 'ReLU'. The reasonableness of the activation function selected in the paper is verified.

Prediction Results of Remaining Useful Life
Since different fault states lead to different degradation patterns, the paper constructed a CNN-based fault diagnosis model and extracted fault information from it. Then, the remaining useful life prediction model based on BIGRU and the attention mechanism was combined with the fault information for remaining useful life prediction.
Figure 6 shows the actual degradation curves and the model-predicted degradation curves for two engines selected from the test sets of FD001 and FD003, respectively. The overall RMSE of the model on the dataset was 11.046, and the minimum MSE of the model reached 0.911.

Comparison Test and Analysis of Influencing Factors of Prediction Model
The parameters of the remaining useful life prediction model are shown in Table 7.

Necessity Analysis of Bidirectional Network and Attention Mechanism
In this section, to verify the necessity of the bidirectional network and the attention mechanism for improving the accuracy of remaining useful life prediction, we conducted comparative experiments on A-GRU, A-BIGRU, and BIGRU (Table 8). The comparison of the RMSE results of A-GRU and A-BIGRU shows that the RMSE is lower when the model is a bidirectional network, which means that the predicted value is closer to the real value. It can be concluded that a bidirectional network allows the remaining useful life prediction model to combine the information from both time directions and thus predict the remaining useful life of the engine more accurately. The comparison of the RMSE results of A-BIGRU and BIGRU also shows that the prediction accuracy is higher when the attention mechanism is added to the model. The percent improvement in the table is the percentage improvement of the method proposed in the paper compared to the corresponding method. The experimental results indicate that both the bidirectional network and the attention mechanism are necessary to improve the accuracy of the model.
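The percent improvement reported in the comparison tables can be computed as the relative RMSE reduction. The exact formula is not given in the text, so the definition below is an assumption, and the numbers in the usage example are placeholders rather than results from the paper.

```python
def percent_improvement(rmse_baseline, rmse_proposed):
    """Relative RMSE reduction of the proposed method, in percent.

    Assumed definition: (baseline - proposed) / baseline * 100.
    """
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return (rmse_baseline - rmse_proposed) / rmse_baseline * 100.0


# Usage with placeholder values (not taken from the paper's tables).
gain = percent_improvement(12.5, 10.0)
```

A positive value means the proposed method has the lower RMSE; a negative value would mean the comparison method is more accurate.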

Comparison of the Number of Hidden Cells of A-BIGRU Network
To verify the rationality of the number of hidden units of the A-BIGRU network selected in this paper, we conducted a comparative experiment on four numbers of hidden units: 16, 32, 64, and 128.

It can be seen from Table 9 that the accuracy of the model is highest when the number of hidden units is 32. When the number of hidden units is too small, the network cannot provide rich information for model analysis. When the number of hidden units is too large, the redundancy of information is not conducive to network prediction.

To verify the necessity of fault information for improving the accuracy of remaining useful life prediction, this paper conducted a comparison experiment on eight different network structures, with the presence or absence of fault information as the independent variable. The RMSE results of the remaining useful life prediction model are shown in Table 10. It can be seen that the accuracy of all eight network structures increased after the fault information was added, which is most directly reflected in the decrease in the RMSE value. The model used in the paper has the largest reduction in RMSE value, 1.254, after combining the fault information, which indicates that the predicted remaining useful life is closer to the actual remaining useful life. The percent improvement in the table is the percentage improvement of the method with fault information relative to the method without fault information. Therefore, it can be concluded that the fault information is important for the remaining useful life prediction, and the validity of the method in the paper is verified.

To better show the advantages of the proposed method, a comparison with several current RUL prediction methods was performed. Since the dataset used in the paper was the collection of FD001 and FD003, the results of the comparison methods were taken as the average of their experimental results on FD001 and FD003. The comparison results are shown in Table 11.

Conclusions
Predicting the remaining useful life of an aero-engine is particularly important to prevent and mitigate risks and improve the safety of life and property. Maximizing the accuracy of the prediction can also provide more reasonable opinions for engine health management and thus take more reasonable maintenance measures.
Different faults correspond to different degradation patterns. However, previous work on the remaining useful life prediction of aero-engines did not sufficiently consider the fault information or involve it as independent information in the prediction. To solve this problem, the paper first classified the engine fault data by CNN to obtain the fault features. Then, the remaining useful life prediction was performed by combining the fault features with the remaining useful life prediction model based on BIGRU and the attention mechanism. After that, comparative experiments on different network structures, parameters, and methods were carried out. The experimental results show that the accuracy of the model is higher when fault information is included and that the parameters used are reasonable. This indicates that fault information is necessary for predicting the remaining useful life of aero-engines.
The data used in the paper were experimental data under the same working conditions. We did not analyze transfer learning across different operating conditions. In the future, we will therefore work on remaining useful life prediction that involves fault information under different operating conditions.
Author Contributions: C.W. and Z.P. collected and analyzed the data; C.W. and Z.P. wrote the manuscript. R.L. edited and revised the paper. All authors have read and agreed to the published version of the manuscript.