Convolution and Long Short-Term Memory Hybrid Deep Neural Networks for Remaining Useful Life Prognostics

Abstract: Reliable prediction of remaining useful life (RUL) plays an indispensable role in prognostics and health management (PHM) because of the increasing safety requirements placed on industrial equipment. Meanwhile, data-driven methods for RUL prognostics have attracted widespread interest. Deep learning, a promising data-driven approach, has been applied to RUL prediction owing to its ability to handle large volumes of complex data. In this paper, a novel scheme based on a health indicator (HI) and a hybrid deep neural network (DNN) model is proposed to predict RUL by analyzing equipment degradation. Specifically, an HI obtained by polynomial regression is combined with a convolutional neural network (CNN) and a long short-term memory (LSTM) neural network to extract spatial and temporal features for effective prognostics. First, valid data selected from the raw sensor data are transformed into a one-dimensional HI. Next, both the preselected data and the HI are fed sequentially into the CNN layer and the LSTM layer to extract high-level spatial features and long-term temporal dependency features. A fully connected neural network is then employed as the regression model for RUL prognostics. Finally, the scheme is validated with numerical and graphical results on an equipment RUL dataset from the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS), where it outperforms four existing models in accuracy and effectiveness.


Introduction
The complexity of the equipment used in modern industry has increased rapidly over the past decades [1]. Any equipment failure may have catastrophic consequences [2,3], so reliability and maintenance are key concerns [4]. Therefore, an effective strategy that coordinates scheduling and maintenance is essential to ensure productivity, personal safety, and manufacturing development [5].
Prognostics and health management (PHM) is a key technology for ensuring equipment safety and reducing maintenance costs [6]. As a crucial component of PHM, remaining useful life (RUL) prediction has become an active research field because it helps determine when maintenance should be performed [7]. The RUL of equipment is defined as the length of time from the current time to the end of its useful life [8]. RUL prognostics approaches fall into three categories: model-based, data-driven, and fusion prognostics [9]. Model-based methods describe the equipment health state by modeling the degradation evolution of the physical structure. Unfortunately, model-based methods are difficult to adapt to complex equipment structures [10]. In contrast, data-driven methods use sensor measurement data or operational data to predict RUL without prior knowledge of the physical structure or the degradation mechanism [11]. Fusion prognostics methods combine the model-based and data-driven approaches [9]. However, very few studies on fusion prognostics methods have been reported, owing to the poorly understood complexity of the physical structure [12,13]. Data-driven methods, in turn, have been shown to be capable of predicting RUL in extensive research [14,15].
Because sensor measurement data are intrinsically time series, mainstream data-driven research on RUL prognostics has focused on prediction techniques based on sequence learning [8]. A large number of machine learning approaches have been proposed, most of which construct prognostics models by analyzing correlated sequential sensor data and associating the discovered hierarchical patterns with a specific prognostics task [16]. These prediction models provide effective evidence to manufacturers [17,18] and include, for instance, autoregressive integrated moving average (ARIMA) models [19,20], hidden Markov models (HMMs) [21][22][23], support vector regression (SVR) models [24][25][26], artificial neural networks (ANNs) [27,28], radial basis functions (RBFs) [28], and random forest (RF) regression [29], among others [12,30].
Nevertheless, increasingly demanding prediction requirements make it infeasible for these traditional data-driven methods to handle growing volumes of complex data [31]. In pursuit of better RUL prognostics performance, a family of deep learning models has emerged that offers automatic feature extraction and high prognostics accuracy [32]. In RUL estimation, convolutional neural networks (CNNs) have been used to obtain high-level spatial features from raw sensor signals [33,34]. In addition, long short-term memory (LSTM) neural networks have been applied to extract temporal information from sensor data [1,8,35,36]. However, each of these single deep learning models considers only spatial or only temporal characteristics. More recently, schemes combining CNN and LSTM (CNN-LSTM) have been suggested [37][38][39][40][41], mostly for natural language processing, speech processing, video processing, and similar tasks. Applications of CNN-LSTM ensembles to prediction remain few [42]. A representative study applying this scheme to predict residential energy consumption is [43], where raw sensor data are used directly for prediction. For RUL prognostics, more features should be taken into account to describe equipment degradation; it has been shown that adding a health indicator (HI) can produce accurate prediction results [13].
In this paper, a novel scheme based on an HI and a hybrid deep neural network (DNN) model is proposed to predict equipment RUL. First, redundant features are removed by preprocessing. Second, the true RUL of the equipment is replaced with a piece-wise linear function, and the preselected data are transformed into an HI by polynomial regression. Third, spatial-temporal features are extracted sequentially from the preselected data and the HI by a convolutional neural network and a multilayer LSTM. Next, RUL is predicted by a multilayer fully connected neural network. Finally, the effectiveness and accuracy of the proposed scheme are validated on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) equipment degradation dataset [12,44].
The main contributions are summarized as follows: (1) We apply variance threshold detection to select valid features from raw sensor data in the field of RUL prognostics.
(2) We design an input strategy that coordinates the preselected sensor data and the HI, which depicts equipment degradation.
(3) We propose a novel scheme based on an HI and a hybrid DNN to predict equipment RUL effectively, using data from the Turbofan Engine Degradation Simulation.
(4) We achieve higher accuracy than four existing models across various metrics.
The remainder of the paper is organized as follows. Section 2 defines the prognostics problem. In Section 3, we introduce the methodology, including the health indicator, hybrid deep neural networks, data preprocessing methods, and evaluation metrics. In Section 4, we describe the implementation of the proposed scheme in detail. Then in Section 5, we describe the dataset and present the test results thoroughly by conducting groups of experiments. The last section concludes this paper.

Problem Definition
Prognostics, defined in 2004 by the International Organization for Standardization (ISO), is "an estimation of time to failure and risks of one or more existing or future failure modes". In this work, we aim to build a learning model that predicts equipment RUL from sensor measurement data, e.g., vibration sensor data. The collected multivariate time series data are expressed as

$$\left\{\left(x_i^{(j)}, y_i^{(j)}\right)\right\}, \tag{1}$$

where $x_i^{(j)} \in \mathbb{R}^n$ represents the sensor measurement data at the ith time-step of the jth sequence, and $y_i^{(j)}$ is the corresponding number of equipment operation cycles.
The equipment performance variation tendency describes the pattern of variation over time found in the raw data [1]; it is formalized in Equation (2). Assume that the total number of operation cycles of the jth sequence is $T_j$. The RUL $RUL_i^j$ is written as

$$RUL_i^j = \min\left(R_{th},\; T_j - y_i^{(j)}\right), \tag{3}$$

where $R_{th}$ represents the RUL threshold, which is explained in Section 3.1. Equations (2) and (3) indicate that $RUL_i^j$ depends on the equipment's previous state and on the variation in the sensor data. Our goal is to learn a non-linear function (Equation (4)) that maps the sensor data to the RUL and to minimize the error (Equation (5)) between the predicted value and the true RUL.

Methodology
In this section, a piece-wise linear RUL function expressing the equipment's true RUL and an HI expressing the equipment state are introduced. Then, CNN and LSTM are described in detail, and the hybrid prognostics scheme is proposed. Finally, data preprocessing methods are discussed and the evaluation metrics are presented.

Piece-Wise Linear RUL and HI
Performance degradation of equipment may change the sensor measurements, which makes these measurements useful for prognostics techniques. As shown in Figure 1, sensor readings tend either to rise or to fall over time within a sequence. In practice, the entire operation cycle of equipment comprises two phases: the first is normal performance, showing a relatively flat trend; the second is degradation, showing an approximately exponential downward trend.
Consequently, the RUL is difficult to predict in the early, healthy state. We therefore assume that the RUL remains at a constant value $R_{th}$ throughout the first phase, until the true RUL falls below the critical point $R_{th}$. In the second phase, the RUL is represented by a linearly decreasing function. Hence, the entire RUL curve is a piece-wise linear function, as shown in Figure 2. Furthermore, there are two methods to characterize the equipment health condition: physical parameters [45] and stall margins [44]. Physical parameters are usually too complex to obtain. Stall margins, in contrast, better reveal the equipment failure state by transforming the sensor measurement data into a one-dimensional HI.
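As an illustration of the piece-wise linear target, the sketch below builds RUL labels for one run-to-failure sequence. It assumes cycles are numbered from 1 to $T_j$ and that the label is the true number of remaining cycles capped at $R_{th}$. The threshold of 130 and the sequence length of 200 are hypothetical values chosen for the example, not values taken from this paper.

```python
def piecewise_rul(total_cycles, r_th):
    """Piece-wise linear RUL labels for cycles 1..total_cycles.

    The true remaining life at cycle c is total_cycles - c. Capping it at r_th
    keeps the label constant in the healthy phase and lets it fall linearly
    to 0 in the degradation phase.
    """
    return [min(r_th, total_cycles - c) for c in range(1, total_cycles + 1)]


# Hypothetical run-to-failure sequence of 200 cycles with R_th = 130.
labels = piecewise_rul(total_cycles=200, r_th=130)
print(labels[:3], labels[68:72], labels[-3:])
# [130, 130, 130] [130, 130, 129, 128] [2, 1, 0]
```

The label stays flat until cycle 70, the point at which the true remaining life drops below the threshold, and then decreases by one per cycle until failure.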
In general, the health indicator satisfies $\delta \in [0, 1]$, with 0 and 1 corresponding to equipment failure and an intact state, respectively. The RUL threshold is determined from the predefined critical point $R_{th}$ according to the distribution of RUL values in the collected dataset, and the health indicator δ of the jth sequence is formulated accordingly. The sensor data and the HI of the dataset are then processed by polynomial regression to obtain the HI estimate $\hat{\delta}_{HI}$ at each time-step of the entire dataset, where $[\alpha, \beta] = [\alpha, \beta_1, \beta_2, \cdots, \beta_m]$ denotes the (m + 1)-dimensional parameter vector consisting of the bias α and the weights $\beta_1, \dots, \beta_m$. The HI trajectories of the dataset are denoted by $\hat{\delta}_i^j$, the HI estimate at the ith time-step of the jth sequence.
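The exact HI regression model is not reproduced in the text, so the following is only a sketch under stated assumptions. It assumes a model that is linear in its m + 1 parameters, $\hat{\delta} = \alpha + \sum_{k=1}^{m} \beta_k x_k$, where the $x_k$ are preselected (normalized) sensor features; polynomial terms of the sensors could be appended as extra columns. It also assumes, hypothetically, a target HI of δ = RUL / R_th built from the piece-wise linear RUL, so that δ = 1 in the healthy phase and δ = 0 at failure. The fit is ordinary least squares solved through the normal equations, and all helper names are illustrative.

```python
def hi_targets(rul_labels, r_th):
    """Hypothetical target HI: 1 while RUL >= r_th, falling linearly to 0 at failure."""
    return [min(r, r_th) / r_th for r in rul_labels]


def fit_linear_hi(features, targets):
    """Least-squares fit of delta_hat = alpha + sum_k beta_k * x_k.

    Returns [alpha, beta_1, ..., beta_m] (m + 1 parameters).
    """
    rows = [[1.0] + list(x) for x in features]  # leading 1.0 carries the bias alpha
    p = len(rows[0])
    # Normal equations (A^T A) w = A^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (aug[i][p] - sum(aug[i][j] * w[j] for j in range(i + 1, p))) / aug[i][i]
    return w


def predict_hi(w, x):
    """Evaluate delta_hat = alpha + beta . x for one time-step."""
    return w[0] + sum(b * xi for b, xi in zip(w[1:], x))


# Toy two-sensor data generated from delta = 0.2 + 0.5*x1 - 0.3*x2.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [0.2 + 0.5 * a - 0.3 * b for a, b in X]
w = fit_linear_hi(X, y)
print(w)
```

Because the toy targets are generated exactly from a linear rule, the fit recovers α = 0.2, β1 = 0.5, β2 = -0.3 up to rounding. On real sensor data, predict_hi applied at each time-step would give a trajectory of estimates $\hat{\delta}_i^j$.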

Convolutional Neural Network
The convolutional neural network (CNN) was proposed by Fukushima [46] and is mainly used for pattern recognition and image processing. CNNs also have great potential to identify prominent patterns in sensor measurement data and to extract high-level spatial features.
Normally, the CNN architecture is composed of two types of layers [47]: convolution layers and pooling layers, shown in Figure 3. CNNs are developed to extract abstract features by alternating operations of convolution and pooling [48].
Convolution layer: In this layer, the previous layer's feature maps are convolved with convolutional kernels, which are used for feature extraction and feature mapping. The feature map for the next layer is then computed through a non-linear function. The output feature maps of the convolution filter are calculated by

$$z^l = U^l * x^{l-1} + b^l, \qquad x^l = g\left(z^l\right),$$

where * denotes the convolution operation, l denotes the lth layer of the neural network, and $x^{l-1}$ and $x^l$ are the input and output of the convolution filter, respectively. $g(\cdot)$ is the activation function and $z^l$ is its input. $U^l$ and $b^l$ represent the weight matrices and additive bias vectors, respectively.

Pooling layers (also known as subsampling layers): In this layer, the output feature maps of the convolution layer are subsampled by suitable factors to reduce the feature map resolution. Average pooling and max pooling are the most common pooling methods. In this paper, max pooling is adopted:

$$x^l = \mathrm{down}\left(x^{l-1}\right),$$

where $\mathrm{down}(\cdot)$ represents the max-pooling subsample function.
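The two layer types can be illustrated on a single-channel sequence. This sketch assumes one input channel, one kernel, "valid" padding (no zero padding), ReLU as the activation $g(\cdot)$, and non-overlapping max pooling. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped), which is what "convolution" usually means in CNNs. The kernel and input values are made up for illustration.

```python
def conv1d_valid(x, kernel, bias):
    """'Valid' 1-D convolution as used in CNNs (cross-correlation, no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(z):
    """Element-wise activation g(z) = max(0, z)."""
    return [max(0.0, v) for v in z]


def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 1.0]
z = conv1d_valid(signal, kernel=[1.0, -1.0], bias=0.0)  # z^l = U^l * x^{l-1} + b^l
a = relu(z)                                             # x^l = g(z^l)
p = max_pool1d(a, size=2)                               # down(.) with factor 2
print(z)  # [-2.0, 1.0, -3.0, 1.0, 4.0, -1.0]
print(a)  # [0.0, 1.0, 0.0, 1.0, 4.0, 0.0]
print(p)  # [1.0, 1.0, 4.0]
```

The kernel [1, -1] responds to drops between consecutive readings, and pooling halves the resolution while keeping the strongest response in each window.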

Recurrent Neural Network and LSTM
A recurrent neural network (RNN) is a neural network with recurrent connections that has been applied successfully owing to its capacity to model highly non-linear data of a sequential nature [49]. A standard RNN is composed of a series of recurrent units to handle time series problems. The unfolded RNN cell architecture is shown in Figure 4. Unlike a general fully connected neural network, an RNN feeds its hidden state at the current time-step back as an input at the next time-step.
In Figure 4, the same weight matrices U, V, and W are reused at every computation step throughout the sensor data, i.e., the weights are shared. L(t) is the loss function and y(t) is the actual output. The symbol in parentheses denotes the time-step of the recurrent unit. The forward propagation of the RNN cell is formulated as follows:

$$h(t) = \tanh\left(U x(t) + W h(t-1) + b_h\right),$$

$$o(t) = \sigma\left(V h(t) + b_o\right),$$

where x(t) is the input, o(t) is the prognostics output of the RNN cell, h(t) is the hidden layer, σ(·) is the element-wise sigmoid activation function, tanh(·) is the element-wise hyperbolic tangent activation function, and $b_h$ and $b_o$ are the bias vectors. During training, the back-propagation through time (BPTT) algorithm is executed [50]. However, BPTT accumulates products of activation-function derivatives across time-steps, so the gradients may vanish or explode. Hence, improving the internal structure of the RNN cell is worthwhile.
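A minimal sketch of the unrolled forward pass, assuming the common Elman-style form $h(t) = \tanh(U x(t) + W h(t-1) + b_h)$ and $o(t) = \sigma(V h(t) + b_o)$, with U as input-to-hidden, W as hidden-to-hidden, and V as hidden-to-output weights. It shows weight sharing: the same U, W, and V are applied at every time-step. The weights are toy values, and matrices are plain nested lists.

```python
import math


def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rnn_forward(xs, U, W, V, b_h, b_o, h0):
    """Unrolled RNN forward pass; U, W and V are shared across all time-steps."""
    h = h0
    outputs = []
    for x in xs:
        pre = [u + w + b for u, w, b in zip(matvec(U, x), matvec(W, h), b_h)]
        h = [math.tanh(p) for p in pre]                                      # h(t)
        outputs.append([sigmoid(q + b) for q, b in zip(matvec(V, h), b_o)])  # o(t)
    return outputs, h


# One input feature, one hidden unit, one output, two time-steps.
outputs, h_last = rnn_forward(
    xs=[[1.0], [0.0]],
    U=[[1.0]], W=[[0.5]], V=[[1.0]],
    b_h=[0.0], b_o=[0.0], h0=[0.0],
)
print(outputs, h_last)
```

Even though the second input is zero, the second hidden state is non-zero because it carries information from the first step through W. Repeated multiplication by W and tanh derivatives during back-propagation is also where vanishing or exploding gradients originate.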
The LSTM neural network was proposed by Hochreiter and Schmidhuber in 1997 [49]. LSTM is intended to address two issues that standard RNNs cannot handle. The first is "fading memory": once the number of time-steps becomes large, later time-steps retain virtually no memory of the first inputs, because the standard recurrent layer has no structure that separately controls the flow of memory itself. The second is that an RNN model requires the window length to be fixed in advance, which is difficult to determine automatically in practical applications [51].
LSTM is a special RNN architecture. As shown in Figure 5, each recurrent unit in the chain is an LSTM cell rather than a simple RNN cell. An LSTM cell possesses long-term memory, which is attributable to three gates that modulate the flow of information through the cell: the input gate, the forget gate, and the output gate.

Input gate i(t): It controls what information is passed to the memory cell, based on the previous output and the current sensor measurement data.
Forget gate f(t): It controls how the memory cell is updated, i.e., how much of the previous cell state is retained.
Output gate o(t): It controls which part of the cell state is output as the hidden state and carried to the next time-step.
The LSTM forward propagation algorithm is given by Algorithm 1.
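The three gates can be made concrete with one forward step of a standard LSTM cell. This sketch uses the widely used formulation with a forget gate, which may differ in notation from Algorithm 1. It assumes each gate has its own weight matrix acting on the concatenation [h(t−1), x(t)] plus a bias. The parameter layout and names are illustrative.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, v):
    """weights @ v + bias for nested-list weights."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params maps "input", "forget", "output", "cell" to (weights, bias),
    where the weights act on the concatenation [h_prev, x].
    """
    z = h_prev + x                                           # list concatenation
    i = [sigmoid(v) for v in affine(*params["input"], z)]    # input gate i(t)
    f = [sigmoid(v) for v in affine(*params["forget"], z)]   # forget gate f(t)
    o = [sigmoid(v) for v in affine(*params["output"], z)]   # output gate o(t)
    g = [math.tanh(v) for v in affine(*params["cell"], z)]   # candidate memory
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # new cell state
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                    # new hidden state
    return h, c


# Toy cell with one hidden unit and one input feature.
zero = ([[0.0, 0.0]], [0.0])
params = {"input": zero, "forget": zero, "output": zero, "cell": zero}
h, c = lstm_step([1.0], [0.0], [1.0], params)  # all gates = 0.5, candidate = 0

# A strongly positive forget bias and negative input bias keep the old memory.
keep = dict(params, forget=([[0.0, 0.0]], [20.0]), input=([[0.0, 0.0]], [-20.0]))
h_keep, c_keep = lstm_step([1.0], [0.0], [1.0], keep)
print(c, c_keep)
```

The second call shows the mechanism that addresses "fading memory": when the forget gate saturates near 1, the cell state passes through almost unchanged.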

Hybrid Scheme
Prior investigations suggest that CNNs are effective at extracting spatial features, while LSTMs are effective at retaining long-term temporal memory. When high-frequency sensor measurements or many sensors are involved, it is advantageous to add a convolutional layer before the LSTM layer. Convolution layers are interspersed with pooling layers to reduce computation time and to gradually build up spatial and configural invariance. LSTM layers then discover long-term temporal dependency features. The fully connected neural network is good at mapping features to outputs, so the hybrid DNN can learn an effective regression model to predict equipment RUL. The proposed scheme is shown in Figure 6. The raw sensor data pass through feature extraction and nonlinear regression by the multilayer networks, from which the equipment RUL is obtained. We use a one-layer convolutional neural network composed of one convolution layer and one pooling layer, followed by two LSTM layers. For the fully connected neural network, we use a multilayer perceptron with two hidden layers of 50 neurons each. The mean square error (MSE) is used as the cost function:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2,$$

where $\hat{y}_i$ and $y_i$ are the predicted RUL and the true RUL, respectively, and N is the total number of samples. The Adam optimizer (shown in Algorithm 2), an adaptive learning rate optimization algorithm [52], is employed to train the model.
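The MSE cost and the Adam update can be sketched on a toy problem. The example below fits a straight line y ≈ w·x + b with full-batch Adam updates, using the customary defaults β1 = 0.9, β2 = 0.999, and ε = 1e-8 from the original Adam paper. These defaults are assumptions, since the text does not state them. The two parameters stand in for the network weights ω only to show the update rule. The learning rate is set to 0.01 rather than the 0.001 used for the hybrid network, so that the toy fit converges quickly.

```python
import math


def mse(pred, target):
    """Mean square error cost."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def adam_fit_line(xs, ys, lr=0.001, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y ~ w*x + b by minimising MSE with the Adam update rule."""
    params = [0.0, 0.0]   # [w, b]
    m = [0.0, 0.0]        # first-moment (mean) estimates of the gradient
    v = [0.0, 0.0]        # second-moment (uncentred variance) estimates
    n = len(xs)
    for t in range(1, steps + 1):
        errors = [params[0] * x + params[1] - y for x, y in zip(xs, ys)]
        grads = [2.0 / n * sum(e * x for e, x in zip(errors, xs)),  # dMSE/dw
                 2.0 / n * sum(errors)]                             # dMSE/db
        for k in range(2):
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)   # bias-corrected moments
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = adam_fit_line(xs, ys, lr=0.01, steps=10000)
print(w, b, mse([w * x + b for x in xs], ys))
```

The per-parameter division by the square root of the second-moment estimate is what makes the learning rate adaptive. Parameters with consistently large gradients take proportionally smaller steps.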

Variance Threshold
The measurement data are complex and multidimensional. Features with low variance barely change over a sequence, i.e., they are not clearly distinguishable, so they contribute little to prognostics performance [53]. Features whose variance over the dataset is lower than a threshold are removed. The variance of a feature is formulated as

$$s^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2,$$

where m is the sample size, $x_i$ is the ith sample of the feature, and $\bar{x}$ represents the mean of the feature.
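A minimal sketch of the variance-threshold filter is shown below. It assumes the population variance (division by m) and removes features whose variance is strictly below the threshold. The threshold of 1e-6 and the three-row matrix (rows are time-steps, columns are sensors) are illustrative values only.

```python
def feature_variances(rows):
    """Population variance (1/m) of each column of a list of equal-length rows."""
    m = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / m
        variances.append(sum((x - mean) ** 2 for x in col) / m)
    return variances


def variance_threshold(rows, threshold):
    """Drop columns whose variance is below threshold.

    Returns (filtered_rows, kept_column_indices).
    """
    keep = [j for j, s2 in enumerate(feature_variances(rows)) if s2 >= threshold]
    return [[row[j] for j in keep] for row in rows], keep


data = [[518.67, 641.8, 1.3],
        [518.67, 642.2, 1.3],
        [518.67, 643.1, 1.3]]
filtered, kept = variance_threshold(data, threshold=1e-6)
print(kept)      # [1]
print(filtered)  # [[641.8], [642.2], [643.1]]
```

Constant columns carry no information about degradation, so only the second sensor survives the filter in this toy example.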

Normalization
Sensor data may have very different value scales. To accelerate convergence, the data of each sensor are normalized before use. Two data normalization methods are widely used [54]: z-score normalization (Equation (19)) and min-max normalization (Equation (20)). Specifically, z-score normalization rescales each sensor's data to zero mean and unit standard deviation, and min-max normalization scales each sensor's data to the range [0, 1].

$$x' = \frac{x - \bar{x}}{x_{std}}, \tag{19}$$

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}, \tag{20}$$

where $\bar{x}$, $x_{std}$, $x_{max}$, and $x_{min}$ are the mean, standard deviation, maximum, and minimum with respect to each sensor, respectively.
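Both normalizations are applied column by column, one sensor at a time. This sketch assumes the population standard deviation and assumes constant sensors have already been removed by the variance threshold, since they would otherwise cause a division by zero. In practice, the statistics are usually computed on the training set and reused for the testing set. The sensor values are illustrative.

```python
import statistics


def z_score(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Linear rescaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_columns(rows, method):
    """Apply a per-sensor normalization to each column of a row-major matrix."""
    cols = [method(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


sensor = [641.8, 642.2, 643.1]
print(z_score(sensor))
print(min_max([1.0, 2.0, 3.0]))  # [0.0, 0.5, 1.0]
print(normalize_columns([[1.0, 10.0], [3.0, 30.0]], min_max))  # [[0.0, 0.0], [1.0, 1.0]]
```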

Metrics
To evaluate the performance of the proposed scheme, several prediction metrics are adopted: root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and the penalty score function. The first three metrics are widely used in prognostics tasks. In contrast, the penalty score function was introduced in the PHM 2008 competition [44] specifically for RUL prognostics evaluation. The penalty score function is asymmetric because early predictions are preferred over late predictions: as Figure 7 shows, for the same penalty score, a larger error is tolerated for an early prediction than for a late one. With $d_i = \hat{y}_i - y_i$, where $\hat{y}_i$ and $y_i$ are the predicted and true RUL of the ith sample, the evaluation metrics are formulated as follows:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2},$$

$$MAPE = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{d_i}{y_i}\right|,$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|d_i\right|,$$

$$score(i) = \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \geq 0 \end{cases}$$
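The four metrics can be sketched as plain functions. The code assumes the error convention d = predicted RUL − true RUL and uses the constants 13 and 10 of the PHM 2008 scoring function as commonly reported for C-MAPSS. The prediction values in the example are made up.

```python
import math


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mape(pred, true):
    """In percent; undefined when a true RUL equals 0."""
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(pred, true)) / len(true)


def phm08_score(pred, true):
    """Asymmetric PHM 2008 score summed over samples, with d = pred - true.

    Late predictions (d >= 0) use the steeper exp(d/10) - 1,
    early predictions (d < 0) use exp(-d/13) - 1.
    """
    total = 0.0
    for p, t in zip(pred, true):
        d = p - t
        total += math.exp(-d / 13.0) - 1.0 if d < 0 else math.exp(d / 10.0) - 1.0
    return total


def average_penalty_score(pred, true):
    return phm08_score(pred, true) / len(true)


true_rul = [50.0, 50.0]
pred_rul = [40.0, 60.0]  # one early and one late prediction, both off by 10 cycles
print(rmse(pred_rul, true_rul), mae(pred_rul, true_rul))  # 10.0 10.0
print(phm08_score([40.0], [50.0]), phm08_score([60.0], [50.0]))
```

The last line illustrates the asymmetry. For an error of 10 cycles in both directions, the late prediction scores e − 1 ≈ 1.72 while the early one scores e^(10/13) − 1 ≈ 1.16.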

Implementation
The flowchart of the proposed scheme of RUL prognostics is presented in Figure 8. Firstly, the raw C-MAPSS dataset is processed to select proper input data using the variance threshold, and the corresponding data are normalized by the z-score normalization method. Next, the preselected data are transformed into a one-dimensional health indicator. The processed datasets are split into the training set and the testing set.
Next, the CNN-LSTM-NN hybrid network is built. The network parameters are then initialized, including the number of hidden layers, the number of neurons, the batch size, and so forth. The hybrid scheme takes the training set as input, and the true RUL values of the training set are used as the target outputs. The Adam optimizer is used to train the network, with the learning rate set to 0.001 to achieve stable convergence. The number of training epochs is 500, and the batch size is 250.
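The training settings stated in this section can be collected in one place. The sketch below records only the values given in the text. Kernel size, number of filters, pooling size, and LSTM units are not stated, so they are deliberately left out. It also adds two hypothetical helpers that count mini-batches per epoch and total parameter updates. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters stated in the text; unstated ones are omitted."""
    optimizer: str = "adam"
    learning_rate: float = 0.001
    epochs: int = 500
    batch_size: int = 250
    conv_layers: int = 1          # one convolution layer followed by one pooling layer
    lstm_layers: int = 2
    dense_hidden_layers: int = 2
    dense_units: int = 50         # neurons per fully connected hidden layer
    loss: str = "mse"


def num_batches(n_samples, batch_size):
    """Mini-batches per epoch, counting a final partial batch."""
    return -(-n_samples // batch_size)  # ceiling division


def total_updates(n_samples, cfg):
    """Number of optimizer steps over the whole training run."""
    return num_batches(n_samples, cfg.batch_size) * cfg.epochs


cfg = TrainingConfig()
print(total_updates(1000, cfg))  # 2000
```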
Finally, the testing set is fed into the trained model for RUL prognostics, and the evaluation values are obtained.

Experiments
In this section, we present our experimental setup, including the dataset details and a brief introduction to the compared prognostics models. To empirically evaluate the effectiveness of the proposed method for equipment RUL prognostics, we conducted a series of experiments on the dataset and compared it with several existing methods. First, we introduce the C-MAPSS dataset; then we discuss the experimental results and their analysis.

C-MAPSS Dataset
The dataset used to verify the proposed model is from NASA [44]. The NASA Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset of turbofan engine degradation simulations is a widely used benchmark [12]. It consists of multiple multivariate time series (shown in Table 1).
The engines in the dataset are assumed to come from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the researchers. This wear and variation are considered normal, i.e., they are not considered a fault condition. Three operational settings are given in the dataset; however, we only consider the effect of the sensor measurement data in this paper. The measurement data are contaminated with sensor noise. Each engine operates normally at the beginning of its time series, and a fault develops at some point during the series. In the training set, the fault grows in magnitude until system failure. In the testing set, the time series ends some time before system failure. The goal is to predict the number of remaining operational cycles before failure in the testing set, i.e., the number of operational cycles after the last recorded cycle during which the engine will continue to operate. The true RUL values for the testing set are also provided.
The dataset contains four sub-datasets with different numbers of time series, shown in Table 1, each divided into a training set and a testing set. Each row in the data is a snapshot taken during a single operating cycle and includes 26 columns: the 1st column is the engine ID, the 2nd column is the current cycle number, the 3rd-5th columns are the three operational settings, and the 6th-26th columns are 21 sensor measurements that have a substantial effect on engine performance. A detailed description of the dataset can be found in [44].
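The 26-column row layout can be parsed as follows, assuming the whitespace-separated text files distributed by NASA. The sketch also derives the true RUL of each training row as the engine's last recorded cycle minus the current cycle, which holds because training series run until failure. The helper names are illustrative, and the row strings in the example are synthetic, not real C-MAPSS data.

```python
def parse_cmapss_row(line):
    """Split one whitespace-separated C-MAPSS row (26 columns) into named fields."""
    values = line.split()
    if len(values) != 26:
        raise ValueError(f"expected 26 columns, got {len(values)}")
    return {
        "unit": int(values[0]),                       # column 1: engine ID
        "cycle": int(values[1]),                      # column 2: current cycle
        "settings": [float(v) for v in values[2:5]],  # columns 3-5: operational settings
        "sensors": [float(v) for v in values[5:26]],  # columns 6-26: 21 sensors
    }


def training_rul(rows):
    """True RUL per training row: the engine's last cycle minus the current cycle."""
    last_cycle = {}
    for r in rows:
        last_cycle[r["unit"]] = max(last_cycle.get(r["unit"], 0), r["cycle"])
    return [last_cycle[r["unit"]] - r["cycle"] for r in rows]


def synthetic_line(unit, cycle):
    """Build a fake 26-column row for the example (not real C-MAPSS data)."""
    return " ".join([str(unit), str(cycle), "0.0", "0.0", "100.0"] + ["1.0"] * 21)


pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
rows = [parse_cmapss_row(synthetic_line(u, c)) for u, c in pairs]
print(training_rul(rows))  # [2, 1, 0, 1, 0]
```

Capping these values at the RUL threshold would then give the piece-wise linear training targets.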

Results
To empirically evaluate the effectiveness of the proposed scheme for RUL prognostics, the scheme was tested on the testing set. Figure 9 shows four testing trajectories, one selected from each of the four sub-datasets, as examples. Although some obvious errors exist between the predicted and true RUL values, the prognostic performance is good, especially when the equipment is close to failure. This is desirable: accurate RUL prognostics in the late phase improve maintenance management and reduce maintenance costs, because equipment most needs to be maintained or replaced in the last phase of its lifetime.
To further demonstrate the performance improvement of the proposed scheme, we compared it with four existing models on the same dataset:
Model 1: Multilayer perceptron (MLP). The MLP is a multilayer neural network for regression problems. In this paper, the MLP is constructed with two hidden layers of 50 neurons each, and the ReLU function is used as the activation function in each hidden layer.
Model 2: Support vector regression (SVR). SVR is a machine learning method for time series prognostics. SVR computes a linear regression function in a high-dimensional feature space into which the input data are mapped via a nonlinear function [55]. In this paper, the Gaussian radial basis function (RBF) is used as the kernel function.
Model 3: Convolutional neural network. The standard convolutional neural network is introduced in Section 3.2.1. In this paper, we adopt CNN to extract spatial features and the fully connected neural network to obtain a regression model [33].
Model 4: LSTM neural network. The standard LSTM neural network is introduced in Section 3.2.2. In this paper, we adopt two LSTM layers to extract long-term temporal dependencies and a fully connected neural network to achieve RUL prognostics [8]. All of these methods were simulated and applied to the C-MAPSS dataset. Tables 2 and 3 show the results of the performance comparison on the four sub-datasets. Figure 10 shows the model comparison in terms of the RMSE, MAPE, MAE, and average penalty score over the entire testing set. The average penalty score is defined as follows:

$$\bar{s} = \frac{1}{N}\sum_{i=1}^{N} score(i),$$

where score(i) is the penalty score function defined in Section 3.5, and N is the number of samples in the testing set.

It is evident from Tables 2 and 3 and Figure 10 that the proposed scheme is more accurate than the MLP, SVR, CNN, and LSTM models on the metrics defined in Section 3.5. In particular, the RMSE values of the MLP and SVR are 31.864 and 26.9618, respectively, and the RMSE of the proposed scheme is lower by 4.8072 and 1.5522 than those of the CNN and LSTM models, respectively. Moreover, the MAPE values of the MLP and SVR are 24.1433% and 20.4548%, respectively, and the MAPE of the proposed scheme is lower by 2.6775% and 0.878% than those of the CNN and LSTM models, respectively. The MAE values of the MLP and SVR are 23.6265 and 17.7835, respectively, and the MAE of the proposed scheme is lower by 3.8365 and 0.8 than those of the CNN and LSTM models, respectively. In Figure 10, the average penalty scores of the MLP and SVR are much larger than those of the CNN, LSTM, and proposed scheme, because the automatic feature extraction ability of deep learning methods yields higher prediction accuracy than traditional data-driven methods. Furthermore, the proposed scheme is slightly better than the CNN and LSTM, since the addition of the HI and the hybrid DNN structure extract both high-level spatial features and long-term temporal dependency features.
In summary, the proposed scheme achieves higher prediction accuracy than the other models, which is encouraging for RUL prognostics tasks.

Conclusions
Accurate and reliable RUL prognostics are conducive to equipment management and operational safety. Because sensor measurement data are highly non-linear and contaminated with noise, RUL prognostics is a complex problem. In this paper, we proposed a scheme based on an HI and a hybrid DNN model to predict equipment RUL. Unlike previous methods, the hybrid DNN takes full advantage of both CNN and LSTM: the CNN extracts high-level spatial features, and the LSTM learns long-term temporal dependencies. Additionally, a fully connected neural network performs the non-linear regression. Meanwhile, the HI describes the equipment's health state, and feeding it together with the preprocessed sensor measurement data into the hybrid DNN further improves RUL prognostics performance. The proposed scheme was trained on the C-MAPSS equipment degradation dataset, and the RMSE, MAPE, MAE, and penalty score metrics were adopted to evaluate the models. The simulation results show that the proposed scheme predicts RUL satisfactorily, especially in the late phase close to failure. Compared with the MLP, SVR, CNN, and LSTM models on the same dataset, the proposed scheme achieves higher accuracy. In fields such as industrial production lines, aerospace, military equipment, lathe maintenance, and computer hardware service, the proposed scheme can help ensure equipment safety and reduce maintenance costs.
Author Contributions: Z.K. proposed the prognostics scheme and drafted the paper. Y.C. collected the dataset and wrote the paper. Z.X. performed the simulation and optimized the models. H.L. analyzed the results and modified the paper.

Conflicts of Interest:
The authors declare no conflict of interest.