A Double-Channel Hybrid Deep Neural Network Based on CNN and BiLSTM for Remaining Useful Life Prediction

In recent years, prognostic and health management (PHM) has played an important role in industrial engineering. Efficient remaining useful life (RUL) prediction can ensure the development of maintenance strategies and reduce industrial losses. Recently, data-driven based deep learning RUL prediction methods have attracted more attention. The convolution neural network (CNN) is a kind of deep neural network widely used in RUL prediction. It shows great potential for application in RUL prediction. A CNN is used to extract the features of time-series data according to the spatial feature method. This way of processing features without considering the time dimension will affect the prediction accuracy of the model. On the contrary, the commonly used long short-term memory (LSTM) network considers the timing of the data. However, compared with CNN, it lacks spatial data extraction capabilities. This paper proposes a double-channel hybrid prediction model based on the CNN and a bidirectional LSTM network to avoid those drawbacks. The sliding time window is used for data preprocessing, and an improved piece-wise linear function is used for model validating. The prediction model is evaluated using the C-MAPSS dataset provided by NASA. The predicted results show the proposed prediction model to have a better prediction performance compared with other state-of-the-art models.


Introduction
In recent years, mechanical equipment is becoming more and more complicated, and mechanical reliability is strictly required, so the method of prognostic and health management (PHM) has attracted more and more attention than before [1][2][3][4][5][6]. The prediction of remaining useful life (RUL) is an important part of PHM. Efficient RUL prediction can ensure the development of maintenance strategies and reduce industrial losses. The methods of RUL prediction can be roughly divided into three categories: (1) a physical model approach, (2) data-driven approach, and (3) hybrid method approach.
The physical model approach is to establish a mathematical degradation physical failure model based on the failure mechanism or performance degradation process of the system [7,8]. At present, this method is mainly used in systems with explicit models and requires researchers to have professional knowledge. Moreover, it is challenging to build dynamic mathematical and physical models accurately for multiple failure modes in the system. this paper proposes a double-channel hybrid prediction model based on CNN and a BiLSTM network. The parallel channels are independent of each other and do not affect each other in the training process, thus improving the prediction accuracy of the model. Based on the C-MAPSS dataset, the model proposed in this paper is compared with the RUL prediction model proposed in recent years, which shows that the double-channel hybrid model has better prediction capability.
The rest of the paper is organized as follows: In Section 2, it introduces the relevant literature review. In Section 3, the CNN and the LSTM network are introduced, and a double-channel hybrid model is proposed. Section 4 presents the data preprocessing method and model evaluation method. Section 5 shows the experimental results, and Section 6 gives the conclusions.

Literature Review
CNNs have been widely used in RUL prediction due to their excellent property of extracting spatial features from the raw data. Babu et al. [34] proposed a novel regression method based on a deep CNN to estimate RUL. The convolution and pooling filters are applied to multichannel sensor data along the time dimension. The features learned from the raw sensor signal are automatically combined systematically. Through the deep architecture, the model determines the high-level abstract representation of the raw sensor data. Moreover, feature learning and RUL estimation are mutually enhanced through supervised feedback. Ren et al. [35] proposed a method to predict the RUL of a bearing based on a deep CNN. For the first time, a smooth method was innovatively applied to deal with the problem of discontinuous prediction results. Experimental results showed that this method improved the accuracy of the RUL prediction of bearings. A CNN fails to consider the correlation between the time-series of the raw sensor data. For the far-away time data, it is easy to lose its characteristic information, resulting in poor long-term prediction ability and insufficient prediction accuracy. In this paper, bidirectional long short-term memory (BiLSTM) and the CNN are combined to solve this problem effectively.
LSTM networks can capture long-term dependencies between features. They can improve the accuracy of RUL prediction. Zhang et al. [36] used the LSTM neural network to predict the RUL of lithium-ion batteries. The elastic mean square backpropagation method is used to optimize the model. Wu et al. [37] proposed using a vanilla LSTM neural network to obtain better prediction accuracy under complex operation, working conditions, model degradation, and intense noise. Furthermore, a dynamic difference technique was proposed to extract interframe information to enhance the cognitive ability of the model degradation process.

Convolutional Neural Network
The CNN was firstly proposed by Lecun et al. [38] for image processing. The CNN is a class of feed-forward neural network that contains a convolutional operation and deep structure. Compared with the early backpropagation (BP) neural network, the essential characteristics of CNN are local perception and parameter sharing. It can extract the spatial features from the raw input data and realizing a high-dimensional feature representation of raw data. Figure 1 shows the convolution and pooling operation. The convolutional layer uses convolutional kernels to convolve local regions of the input data to generate corresponding features. The parameter sharing of a CNN decreases the parameters in the calculation process of the neural network. This effectively avoids the phenomenon of having too many parameters, which leads to overfitting the neural network. In this way, the overfitting of the neural network is resolved. Moreover, the system memory occupied in the training process of the neural network significantly decreases. Formula (1) shows the convolution operation of the i-th convolution kernel. where represents the input of the i-th layer CNN; represents the i-th filter kernel; and represent the bias term and the nonlinear activation function, respectively. The final output of one convolution layer is to connect the results of the convolution operation of n convolution kernels together, which can be represented as: In this paper, the maximum pooling layer is used to downsample the output of the CNN and extract critical local features. Moreover, the parameters and amount of calculation of the model decreased. Consequently, the operation speed is improved, and the overfitting problem is avoided.
The 2-dimensional convolution network was used in this paper. It is a 1-dimensional operation on the 2-dimensional input data, and it was used to extract the spatial features of the time-series from the input sample.

Bidirectional Long Short-Term Memory Neural Network
In this paper, a BiLSTM was used to extract the long-term dependency characteristics of input sample data. The BiLSTM was composed of two independent LSTM neural networks, and it has two directions of forward and backward propagation. The LSTM network proposed by Hochreiter and Schmidhuber [39] is a unique recurrent neural network (RNN) neural network. It has a particular network structure consisting of an input gate structure and an output gate structure. LSTM affects the state of an RNN at each time through the gate structure. The gates only limit directions of the information flow. The activation function of the gate's structure is sigmoid. Figure 2 shows the cell of LSTM.  The final output of one convolution layer is to connect the results of the convolution operation of n convolution kernels together, which can be represented as: In this paper, the maximum pooling layer is used to downsample the output of the CNN and extract critical local features. Moreover, the parameters and amount of calculation of the model decreased. Consequently, the operation speed is improved, and the overfitting problem is avoided.
The 2-dimensional convolution network was used in this paper. It is a 1-dimensional operation on the 2-dimensional input data, and it was used to extract the spatial features of the time-series from the input sample.

Bidirectional Long Short-Term Memory Neural Network
In this paper, a BiLSTM was used to extract the long-term dependency characteristics of input sample data. The BiLSTM was composed of two independent LSTM neural networks, and it has two directions of forward and backward propagation. The LSTM network proposed by Hochreiter and Schmidhuber [39] is a unique recurrent neural network (RNN) neural network. It has a particular network structure consisting of an input gate structure and an output gate structure. LSTM affects the state of an RNN at each time through the gate structure. The gates only limit directions of the information flow. The activation function of the gate's structure is sigmoid. Figure 2 shows the cell of LSTM. where represents the input of the i-th layer CNN; represents the i-th filter kernel; and represent the bias term and the nonlinear activation function, respectively. The final output of one convolution layer is to connect the results of the convolution operation of n convolution kernels together, which can be represented as: In this paper, the maximum pooling layer is used to downsample the output of the CNN and extract critical local features. Moreover, the parameters and amount of calculation of the model decreased. Consequently, the operation speed is improved, and the overfitting problem is avoided.
The 2-dimensional convolution network was used in this paper. It is a 1-dimensional operation on the 2-dimensional input data, and it was used to extract the spatial features of the time-series from the input sample.

Bidirectional Long Short-Term Memory Neural Network
In this paper, a BiLSTM was used to extract the long-term dependency characteristics of input sample data. The BiLSTM was composed of two independent LSTM neural networks, and it has two directions of forward and backward propagation. The LSTM network proposed by Hochreiter and Schmidhuber [39] is a unique recurrent neural network (RNN) neural network. It has a particular network structure consisting of an input gate structure and an output gate structure. LSTM affects the state of an RNN at each time through the gate structure. The gates only limit directions of the information flow. The activation function of the gate's structure is sigmoid. Figure 2 shows the cell of LSTM.   The core of the LSTM structure is the forget gate and the input gate, which can save long-term memory effectively. The forget gate makes the neural network abandon the previous useless information by the current input x t and the output h t of the last moment. The network overcomes gradient disappearance or gradient explosion during training by importing a gate structure and memory cells. The input gate determines the information to be added to the state c t−1 to generate a new state c t according to the parameters x t and h t−1 . Through the forget gate and input gate structure, the LSTM can effectively determine which information is forgotten and which information is retained. Therefore, LSTM extracts useful information; the output gate determines the output of the h t according to the output h t−1 at the moment of the latest state c t and the current input x t . The LSTM specific formula is defined as follows: BiLSTM has two directions in propagation, and both directions are simultaneously transmitted to the output unit. This can capture past and future information, as shown in Figure 3. At each time t, the input is provided to both the forward and backward LSTM networks. The result of the BiLSTM output can be represented as: where → h t and ← h t represent the forward and backward results of BiLSTM, respectively.
Sensors 2020, 20, x FOR PEER REVIEW 5 of 16 The core of the LSTM structure is the forget gate and the input gate, which can save long-term memory effectively. The forget gate makes the neural network abandon the previous useless information by the current input and the output ℎ of the last moment. The network overcomes gradient disappearance or gradient explosion during training by importing a gate structure and memory cells. The input gate determines the information to be added to the state to generate a new state according to the parameters and ℎ . Through the forget gate and input gate structure, the LSTM can effectively determine which information is forgotten and which information is retained. Therefore, LSTM extracts useful information; the output gate determines the output of the ℎ according to the output ℎ at the moment of the latest state and the current input . The LSTM specific formula is defined as follows: BiLSTM has two directions in propagation, and both directions are simultaneously transmitted to the output unit. This can capture past and future information, as shown in Figure 3. At each time , the input is provided to both the forward and backward LSTM networks. The result of the BiLSTM output can be represented as: where ℎ ⃗ and ℎ ⃖ represent the forward and backward results of BiLSTM, respectively.
Output Forward Backward Input Figure 3. Bidirectional long short-term memory (BiLSTM) neural network.

The CNN-BiLSTM Neural Network Structure
In order to fully extract the spatial and temporal features of the input data for RUL prediction, this paper proposes a double-channel hybrid neural network model based on a CNN and a BiLSTM. The structure of the model is shown in Figure 4. The CNN and BiLSTM respectively extract features from the raw data, then concatenate them together and map them to a fully connected layer. The combination of CNN and BiLSTM increases the amount of data, which can then improve the accuracy and stability of prediction.

The CNN-BiLSTM Neural Network Structure
In order to fully extract the spatial and temporal features of the input data for RUL prediction, this paper proposes a double-channel hybrid neural network model based on a CNN and a BiLSTM. The structure of the model is shown in Figure 4. The CNN and BiLSTM respectively extract features from the raw data, then concatenate them together and map them to a fully connected layer. The combination of CNN and BiLSTM increases the amount of data, which can then improve the accuracy and stability of prediction.
The input data samples adopt a 2-dimensional structure of W l × n. Here, W l is the length of the time-series, and n is the number of features. In the process of data processing, a sliding time window was used to extract data samples, and the sliding step size of the window is 1. For different datasets of the C-MAPSS, different time-window lengths were used.  The input data samples adopt a 2-dimensional structure of × . Here, is the length of the time-series, and is the number of features. In the process of data processing, a sliding time window was used to extract data samples, and the sliding step size of the window is 1. For different datasets of the C-MAPSS, different time-window lengths were used.
The first channel of the hybrid model is composed of three layers of CNN, a layer of maximum pooling, and a layer of flatten. The CNN structure of the first channel was used to extract the spatial features from the sensor's raw data; however, the depth of time features extraction of the raw highdimensional data is far from enough. Therefore, the BiLSTM structure was adopted to complete the extraction of the data time-series features and extract the long-term dependencies between data features. Meanwhile, the BiLSTM structure can avoid gradient disappearance and gradient explosion during training the model. The first convolutional layer used 10 filters of size (1 × 10), and the maximum pooling filter of size (1 × 2) is adopted. The second layer of convolution used 10 filters with filter size of (1 × 3). The last layer of the convolutional used a filter of size (1 × 3). If the data dimension is too small, the first channel CNN will fail. In order to prevent failure, the stacked CNN adopted the zero-filling method. The convolutional data were finally input to the flatten layer. The purpose of the flatten layer is to convert the features extracted from the first channel convolution neural network into a 1-dimensional structure, which is convenient for concatenating with the features extracted from the BiLSTM path. The second channel stacked two layers of BiLSTM. Moreover, the cells of the two layers were set to 16 and 8, respectively. The input data were converted to 1-dimensional data before they were inputted into the BiLSTM path.
The feature data matrices extracted from the first channel and the second channel were concatenated, inputting to a fully connected layer containing 100 neurons. To prevent overfitting, the dropout technique was used. The dropout technology stops the hidden layer neurons with selfdefined probability numbers from working in the forward propagation of the training process. In doing so, it improves the generalization ability of the model and enhances the robustness of the model. In this paper, the dropout rate is 0.5. Hence, a part of neurons in the network will be hidden with a self-defined probability of 0.5 in each iteration during the training process. A subnetwork will be obtained by randomly sampling the network parameters, and only the subnetwork will be updated in this iteration. All the above layers use "ReLU" as the activation function. Finally, a regression layer containing a neuron was used to predict the target RUL.
The BP algorithm was used to train the model, and the model weights and biases were updated continuously. The Adams optimizer algorithm was used to monitor the training error "MSE" in order to update the model parameters continually. The learning rate was set to 0.001. The mini-batch size was 512, and the maximum training epoch was set as 200. The first channel of the hybrid model is composed of three layers of CNN, a layer of maximum pooling, and a layer of flatten. The CNN structure of the first channel was used to extract the spatial features from the sensor's raw data; however, the depth of time features extraction of the raw high-dimensional data is far from enough. Therefore, the BiLSTM structure was adopted to complete the extraction of the data time-series features and extract the long-term dependencies between data features. Meanwhile, the BiLSTM structure can avoid gradient disappearance and gradient explosion during training the model. The first convolutional layer used 10 filters of size (1 × 10), and the maximum pooling filter of size (1 × 2) is adopted. The second layer of convolution used 10 filters with filter size of (1 × 3). The last layer of the convolutional used a filter of size (1 × 3). If the data dimension is too small, the first channel CNN will fail. In order to prevent failure, the stacked CNN adopted the zero-filling method. The convolutional data were finally input to the flatten layer. The purpose of the flatten layer is to convert the features extracted from the first channel convolution neural network into a 1-dimensional structure, which is convenient for concatenating with the features extracted from the BiLSTM path. The second channel stacked two layers of BiLSTM. Moreover, the cells of the two layers were set to 16 and 8, respectively. The input data were converted to 1-dimensional data before they were inputted into the BiLSTM path.
The feature data matrices extracted from the first channel and the second channel were concatenated, inputting to a fully connected layer containing 100 neurons. To prevent overfitting, the dropout technique was used. The dropout technology stops the hidden layer neurons with self-defined probability numbers from working in the forward propagation of the training process. In doing so, it improves the generalization ability of the model and enhances the robustness of the model. In this paper, the dropout rate is 0.5. Hence, a part of neurons in the network will be hidden with a self-defined probability of 0.5 in each iteration during the training process. A subnetwork will be obtained by randomly sampling the network parameters, and only the subnetwork will be updated in this iteration. All the above layers use "ReLU" as the activation function. Finally, a regression layer containing a neuron was used to predict the target RUL.
The BP algorithm was used to train the model, and the model weights and biases were updated continuously. The Adams optimizer algorithm was used to monitor the training error "MSE" in order to update the model parameters continually. The learning rate was set to 0.001. The mini-batch size was 512, and the maximum training epoch was set as 200.

C-MAPSS Dataset
In this part, the C-MAPSS dataset was used to verify the predictive performance of the proposed double-channel hybrid deep neural network model. The C-MAPSS dataset has four subsets (FD001, FD002, FD003, and FD004), recording 21 different types of sensor data respectively. The monitoring Sensors 2020, 20, 7109 7 of 15 data of all sensors reflect the well-being status of the engine. These four subsets contain different operating conditions and fault modes. Each subset possesses its training dataset and test dataset. The train dataset records the data of all sensors in the whole life cycle from the start of the run to the time of the failure. The test dataset records all sensor data from the start of engine operation to a random cycle point in time. In datasets, the first column shows the serial number of the engine, with the second column representing the number of cycles of the engine, the third to fifth column showing different operating settings, and the sixth to the twenty-sixth column displaying the data recorded by 21 various sensors. The details of the four datasets are shown in Table 1.

Data Preprocessing
The 21 sensors in each subset record different data, and the data range between different sensors varies significantly. The unprocessed data affected the performance of predicting the engine RUL, so it is essential to standardize the data. Data normalization is an essential step in data processing. Additionally, the output value of Sensors 1, 5, 6, 10, 16, 18, and 19 are constant, without providing useful information for RUL prediction. Therefore, the other 14 sensors' data were selected for normalization processing for RUL prediction.
There are two ways to normalized the data, the Z-score normalization as given in Formula (10), and the min-max standardization as given in Formula (11). Min-max normalization is employed to process the data in this paper.
where x i j is the j-th output of the i-th sensor; x i is the mean value of all outputs of the i-th sensor; x i min is the minimum value of all output of the i-th sensor; and x i max is the maximum value of all output of the i-th sensor.
In the model of RUL prediction, the cycle number of multivariate time-series included in the input sample data has a direct impact on the prediction performance. The larger the cycle number of samples, the more data information is monitored by sensors, and the better the model's prediction effect. In this paper, a sliding time window was used to extract samples of standardized data. According to the minimum cycle number of all engines in the four datasets, the time-window lengths of the four datasets are 30, 21, 36, and 18 cycles, respectively, and the sliding step size is one cycle. Figure 5 shows that using a sliding time window to extract input data samples from a multivariate time-series normalized the data. The RUL of the last cycle of the time window was used as the label for the sample. Meanwhile, n is the number of engine cycles represented, l is the length of the time window, then n − l + 1 samples can be extracted from the multivariate time-series dataset of the engine.

Evaluation Methods
In this paper, a scoring function and the root mean square error (RMSE) are used to evaluate the performance of the proposed model of RUL estimation. The detailed introduction is shown as follows.
The scoring function is as follows: where N is the number of test samples; h i means the error between the RUL prediction and RUL actual ; and h i = RUL prediction − RUL actual . The scoring function was proposed at the International Conference on Prognostics and Health Management Data Challenge (2008). The root mean square error (RMSE) is as follows: Figure 6 shows two indicators of the evaluation model, i.e., the RMSE and scoring function, giving different penalties for early and late prediction. Value of h (h = 0) means that the RUL prediction is equal to RUL actual , and the two indicators receive a zero. When the error increases, the values of the RMSE and the score also increase correspondingly. RMSE can directly reflect the overall RUL prediction error and is also one of the commonly used indexes. For the scoring function, if the predicted value is less than the actual value, it is predicted early, and the penalty coefficient is small. In contrast, late prediction induced a large penalty coefficient. The advance prediction of the RUL can make maintenance plans and reduce losses in industries. RMSE has the same penalty for early and late prediction.

RUL Target Function
The RUL in the C-MAPSS dataset was considered to decrease linearly with time when predicting the RUL. In practice, the performance of the machine is stable at the beginning of work, and the degradation of the machine can be ignored. However, the RUL of mechanical parts decreases linearly with time due to various working conditions such as wear and vibration. A piece-wise linear RUL method is presented in [29]. Therefore, this study also used a piece-wise linear function to represent the RUL, as shown in Figure 7, and the maximum RUL was set to 125.

RUL Target Function
The RUL in the C-MAPSS dataset was considered to decrease linearly with time when predicting the RUL. In practice, the performance of the machine is stable at the beginning of work, and the degradation of the machine can be ignored. However, the RUL of mechanical parts decreases linearly with time due to various working conditions such as wear and vibration. A piece-wise linear RUL method is presented in [29]. Therefore, this study also used a piece-wise linear function to represent the RUL, as shown in Figure 7, and the maximum RUL was set to 125.

Results and Analysis
In this section, the double-channel hybrid prediction model based on the CNN and BiLSTM is evaluated by the C-MAPSS datasets. For the four datasets of the C-MAPSS, a different sliding timewindow length is used to process the data samples, and then we input the model for training. The model is evaluated by the RMSE and scoring function. The last time-window data of all engines in the four test datasets are selected to predict the RUL. The prediction results are shown in Figure 8. The solid black line indicates the actual RUL of the engine, and the solid red line represents the predicted RUL. The predicted values of the proposed model are very close to the actual values. Figure  9 shows the error frequency distribution between the predicted and actual RUL of the engine for the C-MAPSS datasets.

RUL Target Function
The RUL in the C-MAPSS dataset was considered to decrease linearly with time when predicting the RUL. In practice, the performance of the machine is stable at the beginning of work, and the degradation of the machine can be ignored. However, the RUL of mechanical parts decreases linearly with time due to various working conditions such as wear and vibration. A piece-wise linear RUL method is presented in [29]. Therefore, this study also used a piece-wise linear function to represent the RUL, as shown in Figure 7, and the maximum RUL was set to 125.

Results and Analysis
In this section, the double-channel hybrid prediction model based on the CNN and BiLSTM is evaluated by the C-MAPSS datasets. For the four datasets of the C-MAPSS, a different sliding timewindow length is used to process the data samples, and then we input the model for training. The model is evaluated by the RMSE and scoring function. The last time-window data of all engines in the four test datasets are selected to predict the RUL. The prediction results are shown in Figure 8. The solid black line indicates the actual RUL of the engine, and the solid red line represents the predicted RUL. The predicted values of the proposed model are very close to the actual values. Figure  9 shows the error frequency distribution between the predicted and actual RUL of the engine for the C-MAPSS datasets.

Results and Analysis
In this section, the double-channel hybrid prediction model based on the CNN and BiLSTM is evaluated by the C-MAPSS datasets. For the four datasets of the C-MAPSS, a different sliding time-window length is used to process the data samples, and then we input the model for training. The model is evaluated by the RMSE and scoring function. The last time-window data of all engines in the four test datasets are selected to predict the RUL. The prediction results are shown in Figure 8. The solid black line indicates the actual RUL of the engine, and the solid red line represents the predicted RUL. The predicted values of the proposed model are very close to the actual values. Figure 9 shows the error frequency distribution between the predicted and actual RUL of the engine for the C-MAPSS datasets.
In previous work [32,34,40,41], different deep learning models were established, and predictable the performance through the C-MAPSS dataset was analyzed. Tables 2 and 3 show the RMSE and score comparison between the proposed hybrid network model and other reported models. The result indicated that the proposed model shows a smaller RMSE and score in the four datasets. The improved ratio is the degree to which the RMSE/score is improved compared with the other papers' models. Compared with other models, the apparent improvement of the score index shows that the hybrid network model can better realize RUL prediction ahead of time, thus realizing predictive maintenance of industrial production. The significant improvement of the RMSE index indicates that the model has a more accurate prediction ability. Sensors 2020, 20, x FOR PEER REVIEW 10 of 16    The prediction capability of the model proposed in this paper is significantly improved. Compared with the traditional MLP, SVR, and RVR models, the proposed model does not need artificial feature extraction. It can realize independent feature extraction in the time dimension and space dimension, with better applicability, smaller error, and better prediction performance. Compared with CNN, LSTM, and BiLSTM networks, the double-channel neural network extracts features independently of each other and then combines the features extracted by the two channels, making full use of the advantages of the CNN and BiLSTM networks, increasing the amount of data and improving the prediction accuracy.
The prediction results of the FD001 and FD003 datasets with smaller errors are dramatically better than the FD002 and FD004 datasets. Because the operating conditions of the FD002 and FD004 datasets are more complicated than other datasets, the operation mode and failure mode affected the working process of the system and increased the difficulty of prediction. The FD002 and FD004 datasets contain six different operating conditions. In contrast, FD001 and FD003 have only one operating condition. The scores predicted by the FD002 and FD004 datasets are large, caused by a large number of engines included in these two datasets. Figure 10 shows the comparison of the predicted and actual RUL values for the four engines selected from the test datasets of FD001, FD002, FD003, and FD004 with the double-channel hybrid CNN-BiLSTM model, single-channel CNN model, and single-channel BiLSTM model. The 135th engine data in the FD004 data set has 435 cycles. In order to facilitate observation, a set of data is drawn from every four points in this paper. It can be seen from the figure that compared with the CNN and BiLSTM model, the prediction effect of most engines using the double-channel CNN-BiLSTM model in this paper is better than other methods. Moreover, the predicted RUL fluctuates around a constant value in the early periods. With the increase in the number of engine cycles, the actual and predicted RUL values decrease linearly in line with the reality of the engine degradation phenomenon. However, the expected value is close to the real value in the last few cycles. With the increase of engine cycles, the model's ability to capture engine degradation is significantly improved. In the latter stage, the model's predictive ability becomes even better. The model proposed in this paper shows good predictive ability. selected from the test datasets of FD001, FD002, FD003, and FD004 with the double-channel hybrid CNN-BiLSTM model, single-channel CNN model, and single-channel BiLSTM model. The 135th engine data in the FD004 data set has 435 cycles. In order to facilitate observation, a set of data is drawn from every four points in this paper. It can be seen from the figure that compared with the CNN and BiLSTM model, the prediction effect of most engines using the double-channel CNN-BiLSTM model in this paper is better than other methods. Moreover, the predicted RUL fluctuates around a constant value in the early periods. With the increase in the number of engine cycles, the actual and predicted RUL values decrease linearly in line with the reality of the engine degradation phenomenon. However, the expected value is close to the real value in the last few cycles. With the increase of engine cycles, the model's ability to capture engine degradation is significantly improved. In the latter stage, the model's predictive ability becomes even better. The model proposed in this paper shows good predictive ability.  Figure 11 shows the effect of the sliding time-window length on the model's predictive performance when extracting samples in the FD001 dataset. In this paper, the length of 7 timewindows with different sizes was selected from the FD001 dataset, i.e., 2, 5, 10, 15, 20, 25, and 30. Other parameters of the model were set in the same way, and the box plot drawn by the RMSE and score was obtained after ten runs. Figure 10 shows the better performance of the model after training  Figure 11 shows the effect of the sliding time-window length on the model's predictive performance when extracting samples in the FD001 dataset. In this paper, the length of 7 time-windows with different sizes was selected from the FD001 dataset, i.e., 2, 5, 10, 15, 20, 25, and 30. Other parameters of the model were set in the same way, and the box plot drawn by the RMSE and score was obtained after ten runs. Figure 10 shows the better performance of the model after training the data samples extracted with a large time-window length. In a certain range, the larger the window length, the more information contained in the extracted samples, which is more conducive to the extraction of data features by the model. The minimum cycle number of the engine in the dataset in the FD001 dataset is 31 cycles, and the time-window length is 30 cycles.
Sensors 2020, 20, x FOR PEER REVIEW 13 of 16 the data samples extracted with a large time-window length. In a certain range, the larger the window length, the more information contained in the extracted samples, which is more conducive to the extraction of data features by the model. The minimum cycle number of the engine in the dataset in the FD001 dataset is 31 cycles, and the time-window length is 30 cycles.  Table 4 shows the performance comparison between the only CNN channel model, the only BiLSTM channel model, and the hybrid model on the FD001 dataset. The only CNN channel model is mainly composed of three convolution layers, a maxpooling layer, a fully-connection layer, and a regression layer. The only BiLSTM channel model is mainly composed of two BiLSTM network layers, a fully-connection layer and a regression layer. All parameter settings are consistent with those of the hybrid model proposed in this paper. It was observed that the performance of the proposed hybrid model is significantly better than that of the other two models. The hybrid model combines the advantages of the CNN and BiLSTM to extract the temporal and spatial features of the raw sensor data; the fully-connection layer increases the output of the nonlinearity. Finally, the regression layer was used to implement the RUL prediction.

Conclusions
This paper proposed a double-channel hybrid prediction model based on a CNN and BiLSTM network. The sliding time-window was used for data preprocessing, and an improved piece-wise linear function was used for model validating. The prediction model was evaluated using the C-MAPSS dataset provided by NASA. This paper can be concluded as follows: 1. Compared with other state-of-the-art models, the hybrid model gets a better RMSE and score.
The results show that the proposed double-channel hybrid model has a better predictive capability. The data of four engines in the dataset were extracted for prediction. The model can predict the degradation trend of engine performance. 2. Within a specific time-window length range, increasing the length of the time window will improve the RUL prediction accuracy of the double-channel hybrid model.  Table 4 shows the performance comparison between the only CNN channel model, the only BiLSTM channel model, and the hybrid model on the FD001 dataset. The only CNN channel model is mainly composed of three convolution layers, a maxpooling layer, a fully-connection layer, and a regression layer. The only BiLSTM channel model is mainly composed of two BiLSTM network layers, a fully-connection layer and a regression layer. All parameter settings are consistent with those of the hybrid model proposed in this paper. It was observed that the performance of the proposed hybrid model is significantly better than that of the other two models. The hybrid model combines the advantages of the CNN and BiLSTM to extract the temporal and spatial features of the raw sensor data; the fully-connection layer increases the output of the nonlinearity. Finally, the regression layer was used to implement the RUL prediction.

Conclusions
This paper proposed a double-channel hybrid prediction model based on a CNN and BiLSTM network. The sliding time-window was used for data preprocessing, and an improved piece-wise linear function was used for model validating. The prediction model was evaluated using the C-MAPSS dataset provided by NASA. This paper can be concluded as follows: