1. Introduction
In recent years, with technological progress, the complexity of turbofan engines, which are core components of an aircraft, has increased. Nearly 60% of total breakdowns pertain to the turbofan engines of an aircraft [
1]. Varying operating hours and conditions affect the performance of turbofan engines. Therefore, it is necessary to determine and implement strategies to ensure safety and reliability. Maintenance strategies such as corrective or preventive maintenance have been insufficient to meet reliability and safety requirements. The prognostics and health management (PHM) concept, which is also referred to as condition-based maintenance (CBM) or predictive maintenance (PdM), has gained more attention thanks to its ability to overcome emerging problems [
2]. PHM ensures systems work optimally as designed, prevents failure, reduces maintenance costs, and monitors and manages the health of the systems and equipment [
3]. As stated in the 2004 International Organization for Standardization [
4], prognostics mainly focuses on providing the health state of the system and predicting failure modes and the remaining useful life (RUL) of components within that system [
5]. The RUL is the time between the current time and when a component or a system no longer maintains its healthy state and useful life [
6]. RUL prediction allows for reducing costs, scheduling appropriate maintenance and replacement plans, and preventing occupational accidents.
PHM plays a critical role in industries like aviation, where reliability is paramount due to the potentially devastating impact of accidents in terms of human life and economic losses. In this study, we focused on turbofan engines because a turbofan engine is the most critical component of an aircraft. It is one of the most complex components that require high reliability and quality [
7]. RUL prediction is essential for developing maintenance strategies that reduce the cost and improve the reliability of turbofan engines [
8].
Existing RUL prediction methods can generally be categorized into model-based, data-driven, and hybrid approaches [
9,
10]. Model-based approaches are helpful methods that accurately describe system degradation using mathematical models [
11]. However, building mathematical models of complex systems can be difficult and costly, requiring expertise and domain knowledge about physical systems. Data-driven approaches can reveal the characteristics of system degradation by converting historical sensor data into useful information. With advanced sensor and computer technologies, data-driven approaches have been widely used in industry and academia to learn about complex system degradation with less domain expertise. Hybrid approaches combine the two aforementioned approaches to take advantage of both. However, hybrid approaches also inherit the limitations of both methods, which makes them challenging to use. In this study, data-driven approaches were considered for the RUL prediction of turbofan engines.
Data-driven approaches are based on statistical techniques and artificial intelligence [
12]. Statistical methods use models with various assumptions to probabilistically predict the RUL [
6]. Statistical techniques, such as the Wiener process [
13] and gamma process [
14] are used for degradation modeling, and Markovian-based models [
15] are used as an RUL prediction model for turbofan engines.
Artificial intelligence utilizes machine learning methods, mainly deep learning techniques [
16]. For supervised learning, signal data needs target values that indicate the degradation process. The piecewise linear (PwL) function is mainly used to construct a degradation model [
17]. This model assumes that the RUL is constant until a certain change point, at which degradation begins. After this change point, it decreases linearly until failure. For generalization and reducing the complexity of RUL prediction problems, all engines use the same constant RUL value during the normal/healthy operation of the engines [
7,
9,
18,
19].
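The PwL labeling described above can be illustrated with a minimal sketch. The function name, the engine length of 200 cycles, and the cap of 125 are illustrative assumptions rather than values taken from the cited studies; the sketch only shows how a common constant value caps the linearly decreasing RUL.

```python
def piecewise_rul_labels(n_cycles, rul_max):
    """Return one RUL label per cycle for an engine that ran n_cycles until failure.

    The label equals rul_max while the true remaining life exceeds rul_max
    (healthy stage) and then decreases linearly to 0 at the last cycle.
    """
    labels = []
    for cycle in range(1, n_cycles + 1):
        true_rul = n_cycles - cycle            # cycles left before failure
        labels.append(min(true_rul, rul_max))  # cap the healthy stage
    return labels


# Hypothetical engine that fails after 200 cycles, labeled with a cap of 125.
labels = piecewise_rul_labels(n_cycles=200, rul_max=125)
print(labels[0], labels[74], labels[75], labels[-1])  # 125 125 124 0
```

With a common cap, the change point of every engine is fixed at the cycle where the true RUL drops below the cap, regardless of when that engine actually starts to degrade.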
Deep-learning-based RUL prediction methods have produced significant results for turbofan engines. Several studies on turbofan engines used deep learning techniques, such as recurrent neural networks (RNNs), to analyze time series data. An RNN can successfully capture short-term dependencies but has problems learning long-term dependencies. A long short-term memory (LSTM) network controls the flow of time-dependent information through gates to overcome the long-term dependency problem. Zheng et al. [
11] proposed an LSTM model to make RUL predictions using the time-sequence data from the sensors. They achieved better RUL prediction accuracy compared with the hidden Markov and traditional RNN models. Wang et al. [
20] used sensor data sequence with the bidirectional long short-term memory (BiLSTM) network for RUL prediction and stated that the BiLSTM network outperformed the traditional machine learning approaches. Wu et al. [
21] used sensor smoothing and selection and proposed a deep LSTM network for RUL prediction. They tested the proposed approach using the C-MAPSS sub-datasets FD001 and FD003 [
22]. To solve the high-dimensionality problem of C-MAPSS data, an autoencoder (AE) was used to compress the sensor measurements, and a BiLSTM model was proposed with the aim of better prediction [
6]. Dual-channel LSTM [
19] was used with the first-order difference of sensor measurements and extracted more information about the degradation process of the engines. By classifying the degradation phases, multi-scale LSTM was applied to predict the RUL after the constant value degradation phase [
23]. This approach achieved a 40% reduction in the root-mean-square error (RMSE) for the FD001 dataset.
Another deep learning technique, namely, the convolutional neural network (CNN), has been broadly used in turbofan RUL prediction due to its superiority in feature extraction. A deep CNN approach with a time window [
9] was applied to normalized raw C-MAPSS data. Muneer et al. [
24] combined deep CNN with an attention mechanism to extract highly abstract degradation and trend features. Temporal convolution expanded the receptive field for long sequences to improve the prediction performance [
25]. Tan et al. [
26] also used temporal convolution with an attention mechanism for selecting relevant information from a series of sensor measurements. The multi-scale deep CNN method with different-sized filters was proposed to achieve complex features for the degradation process and had a high prognostic performance [
27]. Also, the spatio-temporal attention mechanism with position encoding was applied to capture the temporal relation between features [
28].
Various types of CNNs and RNNs have been used in the literature in a hybrid fashion to improve generalization and consider temporal and spatial features of sensor measurements. A CNN and LSTM were combined within a deep network in a parallel manner for complicated prognostic scenarios [
29]. The health index (HI), which was calculated from raw sensor measurements, was used as a new time series and fed to the network, which consisted of serial CNN-LSTM layers [
30]. Furthermore, a double-channel hybrid deep neural network containing CNN and BiLSTM layers was applied for better prediction performance [
18]. A lower-dimensional projection of the sensor measurements obtained from a CNN-based autoencoder method was used as input for CNN-LSTM serial layers in [
31]. The features from a one-dimensional fully convolutional neural network and LSTM network were fused and then fed into the following CNN network to improve the effectiveness of the prediction model [
1].
A group of studies used an attention mechanism, which converts the original input into weighted input to represent the features better. For instance, an attention mechanism was used in the bidirectional gated recurrent units (BiGRUs)-CNN hybrid neural network in [
32]. Gong et al. [
33] used an attention-based temporal convolutional network (TCN) and BiGRUs for RUL prediction. A random forest (RF) algorithm was used to rate the importance of the features, and variational mode decomposition was then applied to these features.
Another group of studies focused on constructing new features to contribute to the learning of degradation processes. By taking the first-order differences for the time series of sensor measurements, new features were created and added to the dataset [
19,
34]. Correlation-based degradation features were used in [
35], while mean and trend coefficients were used in [
36]. A complete ensemble empirical mode decomposition method was used to create sensor measurement trend features [
37]. Bae and Xi [
38] transformed the time-based cycle feature into a physical health time step to provide meaningful features.
The studies presented above used a PwL function for target RUL labeling. The constant and maximum part of the piecewise RUL, mostly named RULmax, is common for all the engines in the dataset. This approach is helpful for generalization but does not accurately reflect the situation of turbofan engines in the real world. The degradation process of each engine differs from the other. The studies on engine-specific PwL target labeling are less frequent than studies using a common RULmax for all engines. The main reason is the lack of sufficient data to differentiate engines from each other while predicting the RUL in the healthy state of the engine. Also, incomplete sensor data in the test dataset hinders precise change point determination. Most studies using the engine-specific RULmax focused only on the training dataset. In general, health state (HS)-division-based specific RUL labeling was studied in a dual-task manner for both RUL prediction and HS evaluation simultaneously [
39,
40]. A dual-LSTM framework was used for change point detection in RUL labeling and HI-based RUL prediction [
41]. The RUL prediction in the healthy state is difficult due to the lack of information about the degradation process, and thus, the RUL prediction was made using the sensor measurements after the change point [
42,
43].
The challenge of predicting the RUL is not confined solely to turbofan engines but spans across diverse domains, encompassing applications in areas such as Li-ion batteries and bearings, where RUL prediction plays a prominent role. In [
44], an LSTM and gradient boosting machine (GBM) were utilized to analyze Li-ion batteries, combined with explainable artificial intelligence techniques for feature selection. The study in [
45] introduces a novel approach for improving the accuracy of bearings’ RUL prediction, combining a multi-branch convolutional network (MBCNN) with global attention and a BiLSTM network, utilizing both spatial and timing features from vibration signals, which were ultimately tested on a public bearing degradation dataset. Also, the significance of RUL labeling on RUL prediction is demonstrated in the study by [
46], which assessed bearings using load calculations.
Considering the studies conducted on RUL prediction for turbofan engines and for other applications, there is a clear need for further research on target labeling approaches, feature engineering processes, and the development of effective network models.
This study primarily investigated data-driven approaches, specifically deep learning techniques, for turbofan engine RUL prediction. Various deep learning methods, including long short-term memory (LSTM) and convolutional neural networks (CNNs), were examined, and this paper proposes a novel approach that combines change-point-detection-based target labeling and feature construction for RUL prediction. This research aimed to improve the accuracy of RUL prediction in different stages of the engine’s life and used engine-specific RUL labels for a more realistic representation of engine behavior. This paper presents a comprehensive experimental study using turbofan data and the results were compared with existing methods to demonstrate the proposed method’s effectiveness. 
The main contributions of this paper are as follows:
- To the best of our knowledge, this study was the first to combine change-point-detection-based target labeling and feature construction in RUL prediction. 
- In line with the current trend in RUL prediction, we embraced a flexible target-labeling approach, employed innovative feature engineering strategies, and introduced an efficient hybrid network to enhance prediction accuracy while minimizing computational complexity, as validated through comparisons. 
- Most studies used PwL target labels for RUL prediction with a constant RULmax value, such as 120, 125, or 130. However, this study used an engine-specific PwL target label for each engine. 
- Previous studies used the original training dataset for both training and testing by splitting it into two disjoint sets. Different from them, this study focused on predicting the RUL using the original test dataset to ensure a fair evaluation. 
- We fitted continuous PwL functions to the one-dimensional data that was fused using an autoencoder-based feature extraction method for engine-specific target RUL labeling. We used the Python library pwlf, in which the unknown breakpoints are determined with a differential-evolution-based optimization algorithm. The number of line segments was increased until there was no significant improvement in finding the first change point. The earliest first change point was selected for piecewise target RUL labeling. 
- A handcrafted feature, which consists of the difference between the sensor measurements in every cycle and the first sensor measurement, was constructed and added to the dataset. Thus, RUL prediction was improved in the middle- and early-life stages. 
The remainder of this paper starts with the theoretical background of the deep learning methods and feature engineering approaches in 
Section 2, along with the proposed architecture of the neural networks used in this study. The experimental study was performed using the FD001 sub-dataset of the C-MAPSS dataset. The results are compared with the studies using similar labeling methods in 
Section 3. Finally, 
Section 4 concludes the paper and proposes future work.
  2. Methodology
This section introduces the theoretical background of the proposed study for RUL prediction. First, the deep learning techniques CNN and LSTM are defined. Next, the conceptual framework of the autoencoders employed in the feature engineering process is described. Then, the feature engineering approaches and the change point detection method are explained. Finally, the proposed deep network structure is given.
  2.1. Convolutional Neural Networks
A CNN is a deep learning neural network commonly used for image classification, object detection, and other computer vision tasks [
47]. The main component of a CNN is the convolution layer, which applies multiple filters/kernels to the input image to produce a set of feature maps. The convolution operation slides the filter over the input data and computes the dot product between the filter weights and the input data at each step. This operation produces a feature map. The size of the output feature map depends on the size of the input data, the size of the filter, and the stride and padding used during the convolution. After the convolution operation, it applies an activation function like rectified linear unit (ReLU) or sigmoid. This step introduces nonlinearity to the model. The feature maps obtained are then fed into other layers within the CNN, such as pooling and fully connected layers. These layers utilize the extracted features to perform classification or regression. The following equation is used to perform a convolution operation:
$$z_i = x * k_i + b_i \qquad (1)$$
where $x$ is the input tensor, $k_i$ is the $i$th convolution kernel, $b_i$ is the bias vector, $z_i$ is the $i$th obtained feature map, and ∗ denotes the convolution operation. These feature maps are passed through an activation function $f$, as expressed
$$a_i = f(z_i) \qquad (2)$$
A subsequent pooling operation downsamples the feature maps produced by the convolutional layers, reducing the spatial size of the data and the number of parameters in the model. Max pooling is one of the most commonly used pooling types in CNNs and is functionally denoted as
$$p_j^{l} = \operatorname{maxpool}\left(a_j^{l-1};\, m,\, s\right) \qquad (3)$$
yielding $p_j^{l}$ as the $j$th feature map in the pooling layer $l$, where its parameter $a_j^{l-1}$ is the $j$th feature map in layer $l-1$, $m$ is the pooling size, and $s$ is the stride. The fully connected layers connect every neuron in one layer to every neuron in the next layer, and they are typically used toward the end of the CNN to perform classification or regression on the extracted features.
One-dimensional CNNs, which are variations of CNNs, are designed to process sequential data, such as time series [
48]. The number of dimensions of the CNN refers to the number of dimensions over which the filter/kernel slides. The input and output of a 1D-CNN are two-dimensional. The first dimension is timesteps and the other is features. The basic architecture of a 1D-CNN is similar to a regular CNN, but the filters are applied only along the timesteps dimension as stated in [
49]. The 1D convolution operation for multivariate time series data is depicted in 
Figure 1, where the height represents the number of time steps, its width is one, and the number of input features is equivalent to the depth or the number of channels. The lines represent filtering operations with 1D kernels.
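The 1D convolution, activation, and max pooling steps described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the toy input, and the all-ones kernel are hypothetical, and, as in most deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
def conv1d_same(x, kernels, biases):
    """1D convolution along time with zero padding ("same" output length).

    x:       list of T time steps, each a list of C feature values
    kernels: list of F kernels, each a list of K rows with C weights
    biases:  list of F bias values
    Returns a list of T time steps, each with F feature-map values.
    """
    T, C = len(x), len(x[0])
    K = len(kernels[0])
    pad_left = (K - 1) // 2
    out = []
    for t in range(T):
        row = []
        for kernel, bias in zip(kernels, biases):
            z = bias
            for k in range(K):
                src = t + k - pad_left      # time index covered by kernel row k
                if 0 <= src < T:            # indices outside the sequence act as zeros
                    z += sum(kernel[k][c] * x[src][c] for c in range(C))
            row.append(max(0.0, z))         # ReLU activation
        out.append(row)
    return out


def max_pool1d(feature_maps, pool_size, stride):
    """Max pooling along time, applied to each feature map independently."""
    T, F = len(feature_maps), len(feature_maps[0])
    pooled = []
    for start in range(0, T - pool_size + 1, stride):
        window = feature_maps[start:start + pool_size]
        pooled.append([max(step[f] for step in window) for f in range(F)])
    return pooled


# Toy input: 6 time steps, 2 features; one kernel of size 3 that sums both features.
x = [[1, 0], [2, 1], [0, 3], [4, 1], [1, 1], [0, 2]]
kernels = [[[1, 1], [1, 1], [1, 1]]]
feature_maps = conv1d_same(x, kernels, biases=[0.0])
pooled = max_pool1d(feature_maps, pool_size=2, stride=2)
print(len(feature_maps), len(pooled))  # 6 3
```

The kernel slides only along the time axis while covering all input features at once, so zero padding keeps the six time steps and pooling with size 2 and stride 2 halves them. Without padding, a kernel of size K and stride s would yield floor((T - K) / s) + 1 output steps.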
One of the main advantages of using 1D-CNNs is that they can capture local patterns and features within the sequential data, which can be helpful for tasks such as detecting time-dependent anomalies. Additionally, 1D-CNNs require less computational resources than other types of deep learning models, which can be helpful in applications with limited resources [
48,
50]. 
Overall, CNNs and 1D-CNNs are powerful deep learning models that have been shown to achieve successful results on a wide range of computer vision and sequential data tasks.
  2.2. Long Short-Term Memory Network
An LSTM network is a type of RNN architecture introduced by [
51]. LSTM was designed to handle the vanishing gradients issue for long-time sequences in traditional RNNs. These vanishing gradients make it difficult for the RNN to learn long-term dependencies effectively.
LSTMs can keep information over a longer period by incorporating a memory cell. An LSTM controls the short-term and long-term cell states via three gates. These are the forget, input, and output gates, operating in the specified order. The basic structure of an LSTM cell is shown in 
Figure 2. The input of the current cell is $x_t$, and the short-term and long-term states of the previous cell are $h_{t-1}$ and $c_{t-1}$, respectively. During the operation of the gates, the forget, input, and output gate signals are generated and denoted as $f_t$, $i_t$, and $o_t$, respectively, together with the candidate cell state $\tilde{c}_t$.
The forget gate controls how much of the previous long-term cell state $c_{t-1}$’s information is retained or forgotten. The forget gate signal is generated through a sigmoid function $\sigma$ using the short-term state of the previous cell, namely, $h_{t-1}$, and input of the current cell, namely, $x_t$:
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \qquad (4)$$
where $W_f$ and $U_f$ are the weights and $b_f$ is the bias.
The input gate controls the influence of the new information on the current cell with Equation (5):
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \qquad (5)$$
where $W_i$ and $U_i$ are the weights and $b_i$ is the bias. The current cell state candidate values $\tilde{c}_t$ are created, and the previous long-term cell state is updated into a new state using the following equations:
$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \qquad (6)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (7)$$
where $W_c$ and $U_c$ are the weights, $b_c$ is the bias, $\odot$ is element-wise multiplication, $\tanh$ is the tangent hyperbolic activation function, and $c_t$ is the updated long-term cell state. The output of the LSTM cell is obtained with the following two equations:
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \qquad (8)$$
$$h_t = o_t \odot \tanh\left(c_t\right) \qquad (9)$$
where $W_o$ and $U_o$ are the weights and $b_o$ is the bias.
The updated long-term cell state $c_t$ and the short-term cell state $h_t$ are then passed to the next time step as input. This process of updating the cell and hidden state is repeated at each time step of the input sequence.
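The gate operations described above can be sketched as a single LSTM time step in plain Python. The sketch assumes the standard LSTM formulation with separate input and recurrent weight matrices per gate; the parameter layout (`params` mapping a gate name to a weight tuple) and the zero-valued weights in the example are hypothetical choices made only for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, U, b) tuple."""
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]          # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]          # input gate
    c_tilde = [math.tanh(v) for v in affine(*params["c"], x, h_prev)]  # candidate state
    # Long-term state: keep part of the old state, add part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, c_tilde)]
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]          # output gate
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]               # short-term state
    return h, c


# Two input features, one hidden unit, all weights zero: every gate outputs 0.5.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero for name in ("f", "i", "c", "o")}
h, c = lstm_step(x=[0.3, -0.1], h_prev=[0.0], c_prev=[2.0], params=params)
print(h, c)
```

With all weights at zero, the forget gate halves the previous long-term state and the candidate state contributes nothing, which makes the roles of the individual gates easy to check by hand.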
  2.3. Autoencoders
Autoencoders are a type of neural network architecture, where the output is the same as the input, which can be used for unsupervised learning tasks, such as data compression, feature extraction, and dimensionality reduction [
7,
52]. The main components of an auto-encoder include an encoder network, a decoder network, and a bottleneck layer, as shown in 
Figure 3.
The encoder takes the condition monitoring input data $x$ and maps it to a lower-dimensional latent space representation $z$. The output of the encoder is
$$z = f_{\mathrm{enc}}(x) \qquad (10)$$
Encoding is typically done through a series of hidden layers that gradually reduce the number of nodes in the network. The bottleneck is the layer in the middle of the network where the input data is compressed into the lower-dimensional latent space representation. The size of the bottleneck layer determines the dimensionality of the latent space representation. The decoder takes the latent space representation $z$ and maps it back to the reconstructed input data $\hat{x}$:
$$\hat{x} = f_{\mathrm{dec}}(z) \qquad (11)$$
Like the encoder, the decoder typically consists of a series of hidden layers that gradually increase the number of nodes in the network. The autoencoder is trained to minimize the difference between the original input data and the reconstructed input data by the decoder. A loss function, such as the mean squared error (MSE) or binary cross-entropy, is minimized.
Autoencoders can capture complex nonlinear relationships between variables, making them suitable for representing high-dimensional data with nonlinear structures [
7].
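To make the encoder, bottleneck, and decoder roles concrete, the following sketch trains a deliberately tiny linear autoencoder that compresses two correlated signals into one latent value. It is an illustration under strong assumptions and not the autoencoder used in this study: there are no hidden layers or nonlinear activations, the initial weights, learning rate, and toy data are arbitrary, and training uses plain gradient descent on the MSE.

```python
def train_linear_autoencoder(data, epochs=2000, lr=0.02):
    """Train a 2 -> 1 -> 2 linear autoencoder with gradient descent on the MSE."""
    w = [0.5, 0.5]   # encoder weights (assumed initial values)
    v = [0.5, 0.5]   # decoder weights (assumed initial values)
    n = len(data)
    history = []     # reconstruction loss before each update
    for _ in range(epochs):
        grad_w, grad_v, loss = [0.0, 0.0], [0.0, 0.0], 0.0
        for x in data:
            z = w[0] * x[0] + w[1] * x[1]              # bottleneck (latent) value
            err = [v[k] * z - x[k] for k in range(2)]  # reconstruction error
            loss += (err[0] ** 2 + err[1] ** 2) / n
            back = err[0] * v[0] + err[1] * v[1]       # error propagated back to z
            for k in range(2):
                grad_v[k] += 2 * err[k] * z / n
                grad_w[k] += 2 * back * x[k] / n
        history.append(loss)
        w = [w[k] - lr * grad_w[k] for k in range(2)]
        v = [v[k] - lr * grad_v[k] for k in range(2)]
    return w, v, history


def encode(w, x):
    """Map a two-sensor reading to the one-dimensional latent value."""
    return w[0] * x[0] + w[1] * x[1]


# Two perfectly correlated toy "sensors": the second always reads twice the first.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, v, history = train_linear_autoencoder(data)
fused = [encode(w, x) for x in data]   # one-dimensional fused signal
print(history[0], history[-1])
```

Because the two toy signals carry the same information, a single latent value is enough to reconstruct both, so the reconstruction loss shrinks toward zero during training. Real sensor data is only partly redundant, which is why a nonlinear, multi-layer autoencoder is typically preferred.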
  2.4. Feature Engineering
Feature engineering comprises selecting, processing, and transforming raw data into features that can be used as inputs to a machine learning algorithm [
53]. It is a crucial step in the machine learning pipeline, as the quality of features can significantly impact the accuracy and performance of the model.
Feature selection, scaling, extraction, and encoding are the standard techniques used in feature engineering [
53]. Feature selection involves selecting a subset of the most relevant features from the available data. Feature scaling helps to normalize or standardize the values of features to ensure they are on a similar scale. One of the scaling operations is min–max normalization:
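$$\tilde{x}_{i,j} = \frac{x_{i,j} - x_j^{\min}}{x_j^{\max} - x_j^{\min}} \qquad (12)$$
where $\tilde{x}_{i,j}$ represents the normalized value of the $i$th data point for the $j$th feature. $x_{i,j}$ represents the raw value of the data before normalization, and $x_j^{\max}$ and $x_j^{\min}$ are the maximum and minimum values of the $j$th feature, respectively.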
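As a brief illustration of min–max normalization, the sketch below scales one feature column to the range [0, 1]. The function name and the sample readings are hypothetical, and mapping a constant column to zeros is an assumption made here to avoid division by zero; the source does not state how constant sensors are handled.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # assumption: constant sensors become zeros
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


readings = [10.0, 15.0, 20.0, 12.5]    # hypothetical raw values of one sensor
normalized = min_max_normalize(readings)
print(normalized)  # [0.0, 0.5, 1.0, 0.25]
```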
Feature extraction involves constructing and transforming raw data into meaningful features to be used as inputs of a machine learning model. Dimensionality reduction and decomposition are some of the feature extraction techniques. In this study, an autoencoder-based dimensionality reduction method is employed for feature extraction.
When the relationships between the features and the target variables are not straightforward, the feature construction technique is valuable for creating new features from the existing data. In order to improve the prediction performance, this work hypothesized that the engine’s degradation tendency can be represented by a new feature. The difference between the first sensor measurement, taken when the engine starts to operate, and the measurement at the current cycle is calculated, and the process is illustrated in Algorithm 1.
In order to show the relationship between the feature $D$ and the hypothesized engine’s degradation tendency, the correlation coefficient for the difference and the target value is calculated with the following formula: 
$$r = \frac{\sum_{i} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i} \left(x_i - \bar{x}\right)^2}\,\sqrt{\sum_{i} \left(y_i - \bar{y}\right)^2}} \qquad (13)$$
where $x_i$ and $y_i$ are the individual sample points for the input feature and the engine’s degradation, respectively, and $\bar{x}$ and $\bar{y}$ are the mean values of those samples, respectively. Thus, based on the hypothesized correlated results, a new difference feature is intended to be added to the training dataset for RUL prediction.
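The correlation check described above can be sketched with the standard Pearson correlation coefficient. The sample values below are hypothetical and only illustrate the expected behavior: a difference feature that grows as the RUL shrinks yields a strongly negative coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


difference = [0.0, 0.1, 0.3, 0.6]   # hypothetical difference-feature values
target_rul = [3, 2, 1, 0]           # hypothetical RUL labels for the same cycles
r = pearson(difference, target_rul)
print(round(r, 3))
```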
        
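| Algorithm 1 Constructing and adding new features | 
| 1: input: Data sequence $X = \{x_1, \ldots, x_T\}$ | 
| 2: parameters: $T$, the number of cycles in the sequence | 
| 3: output: $X'$, the data sequence with the added difference feature | 
| 4: begin | 
| 5:   $D \leftarrow \emptyset$ | 
| 6:   for $t = 1$ to $T$ do | 
| 7:     $d_t \leftarrow x_t - x_1$ | 
| 8:     $D \leftarrow D \cup \{d_t\}$ | 
| 9:   end | 
| 10:  $X' \leftarrow X + D$ | 
| 11: end | 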
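The feature construction step of Algorithm 1 can be sketched as follows. This is a minimal interpretation based on the description in the text, namely that each cycle is extended with the difference between its measurements and those of the first cycle; the function name and the toy engine data are hypothetical, and the sketch handles any number of sensor columns.

```python
def add_difference_feature(sequence):
    """Append, to every cycle, the difference to the first cycle's measurements.

    sequence: list of cycles for one engine, each a list of sensor values.
    Returns a new list in which each cycle holds the original values followed
    by (value - first-cycle value) for every sensor.
    """
    first = sequence[0]
    augmented = []
    for cycle in sequence:
        diff = [value - base for value, base in zip(cycle, first)]
        augmented.append(list(cycle) + diff)
    return augmented


# Hypothetical engine with three cycles and two sensor channels.
engine = [[10.0, 5.0], [10.5, 4.0], [12.0, 2.5]]
augmented = add_difference_feature(engine)
print(augmented[-1])  # [12.0, 2.5, 2.0, -2.5]
```

The added columns start at zero for every engine and drift as the engine degrades, which gives the model a degradation cue that does not depend on the absolute sensor level of an individual engine.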
  2.5. Change Point Detection
Change point detection is crucial in detecting the early signs of deterioration to prevent industrial equipment from unexpected disruptions [
54].
The PwL fitting method [
55] is used for change point detection in generating the target labels. It fits PwL functions to predict nonlinear or multiple trends in the data. By optimizing segment positions and associated slopes, PwL fitting provides a broad perspective and capacity to model changes in data patterns. This makes it particularly valuable in applications such as change point detection and trend analysis.
The Python library pwlf [
55] performs a least squares fit, which solves for the β parameters that minimize the sum-of-squares error of the residuals for any given set of breakpoint locations $\mathbf{b}$ if the change points are known. The sum of the squares of the residuals can be expressed as a function dependent on the change point locations, $SSR(\mathbf{b})$, and the optimization problem is as follows:
$$\min_{\mathbf{b}} \; SSR(\mathbf{b}), \quad \mathbf{b} = \left[b_2, \ldots, b_{n_b - 1}\right]^{T}, \quad \text{subject to } x_1 \le b_k \le x_n, \; k = 2, \ldots, n_b - 1 \qquad (14)$$
where $x$ is the 1D data with first value $x_1$ and last value $x_n$, $n_b$ is the number of change points, and $b_{n_b}$ is the last breakpoint. $n_b - 1$ is the number of line segments. The library assumes that $b_1$ is the first value and $b_{n_b}$ is the last value of the one-dimensional dataset. Differential-evolution-based global optimization is used to find the best change point locations when the change points are unknown but the desired number of segments is known.
Although PwL fitting is not a change point detection algorithm per se, it follows the same principle as the PwL target labeling approach. Because the number of line segments is a parameter of the fit, the pwlf-based change point detection method supports the early detection of the first change point. It also has the potential to address the challenges related to the HS division problem of engine degradation processes.
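The idea of locating a change point by minimizing the sum of squared residuals can be illustrated with a simplified stand-in for the pwlf-based procedure. The sketch below is not the pwlf algorithm: it assumes only two segments, a flat first segment followed by a linear one, and it replaces differential evolution with a brute-force search over the observed positions. The function names and the synthetic signal are hypothetical.

```python
def fit_flat_then_linear(xs, ys, breakpoint):
    """Least-squares fit of y = c + m * max(x - breakpoint, 0); returns (c, m, ssr)."""
    hinge = [max(x - breakpoint, 0.0) for x in xs]
    n = len(xs)
    mean_h, mean_y = sum(hinge) / n, sum(ys) / n
    var_h = sum((h - mean_h) ** 2 for h in hinge)
    cov_hy = sum((h - mean_h) * (y - mean_y) for h, y in zip(hinge, ys))
    m = cov_hy / var_h                 # slope of the degradation segment
    c = mean_y - m * mean_h            # level of the healthy segment
    ssr = sum((y - (c + m * h)) ** 2 for h, y in zip(hinge, ys))
    return c, m, ssr


def first_change_point(xs, ys):
    """Search the candidate breakpoint with the smallest sum of squared residuals."""
    candidates = xs[1:-1]              # keep at least one point after the breakpoint
    return min(candidates, key=lambda b: fit_flat_then_linear(xs, ys, b)[2])


# Synthetic fused signal: flat for 30 cycles, then a linear decline.
cycles = list(range(60))
signal = [1.0 if t < 30 else 1.0 - 0.05 * (t - 30) for t in cycles]
change_point = first_change_point(cycles, signal)
print(change_point)  # 30
```

For engine-specific labeling, the detected change point would mark where the constant part of the PwL target ends for that engine. On noisy signals, a brute-force search over two segments is less flexible than fitting an increasing number of segments, which is the reason for relying on a global optimizer such as the one in pwlf.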
  2.6. Proposed Network Structure
The structure of the deep network proposed in this study is illustrated in 
Figure 4. First, a dimension-reduction-based autoencoder is used as a preprocessing step that helps to detect the change point by reducing the multi-sensor measurements of the turbofan engines to a single signal. The input sample is shaped into a 2-dimensional tensor as $N_t \times N_f$, where $N_t$ and $N_f$ denote the time sequence length and the number of the features, respectively. The details of the input data preparation are described in 
Section 3.1. 
A 1D-CNN is employed in the first layer to extract spatial features from the fused sensor measurement and the newly added feature. Then, a max-pooling layer is used to reduce complexity and overfitting. The 1D-CNN layer configuration is $(F_N, F_S)$, where $F_N$ is the number of filters and $F_S$ is the filter size. In order to keep the size of the feature map fixed, zero-padding is used. The pooling layer resizes each feature map independently by utilizing the max operation, and $P_S$ is the pooling size. An LSTM is used for revealing temporal information from the extracted features, and $N_L$ denotes the number of LSTM cells. Next, a fully connected layer smooths the feature matrix and maps these obtained feature vectors to the target labels of the samples. $N_D$ is used to represent the number of neurons used in the fully connected layer. In the end, a fully connected output layer with one neuron is placed to make RUL predictions.
ReLU is the activation function, and the Adam algorithm is the optimization algorithm for the training network. 
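The flow of one input sample through the proposed layers can be traced with a small shape calculation. All hyperparameter values below are hypothetical placeholders, and the sketch assumes that pooling uses non-overlapping windows and that only the last LSTM state is passed to the fully connected layer; neither assumption is stated in the text.

```python
def trace_shapes(n_t, n_f, n_filters, pool_size, lstm_units, dense_units):
    """Return the output shape of each stage for one input sample."""
    shapes = [("input", (n_t, n_f))]
    # Zero padding keeps the number of time steps; filters set the channel count.
    shapes.append(("conv1d", (n_t, n_filters)))
    # Assumption: non-overlapping pooling windows (stride equal to pool size).
    shapes.append(("max_pool", (n_t // pool_size, n_filters)))
    # Assumption: only the last hidden state of the LSTM is used.
    shapes.append(("lstm", (lstm_units,)))
    shapes.append(("dense", (dense_units,)))
    shapes.append(("output", (1,)))     # single neuron for the RUL prediction
    return shapes


# Hypothetical configuration: 30 time steps, 2 features (fused signal + difference).
shapes = trace_shapes(n_t=30, n_f=2, n_filters=32, pool_size=2,
                      lstm_units=64, dense_units=16)
for name, shape in shapes:
    print(name, shape)
```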
  4. Conclusions
The effective prediction of the RUL value for turbofan engines is critical to ensure optimal performance and minimize maintenance costs. This paper proposes a prognostic procedure addressing the feature engineering process and a hybrid network based on 1D-CNN-LSTM.
Our approach utilized 1D-CNN architecture to extract spatial information from raw sensor data, while LSTM was used to reveal temporal information from the extracted features. We implemented a range of feature engineering and preprocessing methods. Feature selection, filtering, and normalization were used to improve the data quality and reliability. Also, these methods made it easier for deep learning models to use and interpret the data. In addition, feature engineering approaches, such as dimension reduction and feature creation, not only facilitated the interpretation of existing data but also increased the efficiency of the data and enabled the creation of new features that facilitated the learning of the prediction model. Our proposed prognostic approach demonstrated that the hybrid neural network produced more accurate RUL predictions when combined with practical preprocessing steps. To label the operational life of turbofan engines, we used the PwL target-labeling method, which divides the operational life of the engines into two stages, namely, the stable healthy stage and the linear degradation stage. We leveraged dimensionality reduction for a better representation of the main characteristics and the breakpoints of the sensor measurements. A change point detection method was used to determine the start point of degradation for engine-specific target RUL labeling. Using the maximum RUL specific to each engine presented a challenge due to insufficient degradation information. However, it also brought the solution closer to a realistic scenario. In addition, we constructed a new feature from the existing dataset using the initial sensor measurements to provide more degradation information in the early stage. The proposed methodology provides better prediction results compared with the studies using engine-specific target RUL labels and the actual test dataset. 
The prognostic procedure proposed for RUL prediction in turbofan engines essentially demonstrated the effectiveness of a feature engineering technique, specifically the feature construction step, and the 1D-CNN-LSTM network model. The proposed hybrid network was tested on multiple hyperparameter combinations using k-fold cross-validation to obtain the best results. The hybrid model was compared with the 1D-CNN and LSTM networks separately, and our findings demonstrate that the CNN feature extraction capability significantly improved the performance of the LSTM network. 
The results obtained provide evidence for the efficacy of the newly created “difference” feature through feature construction. Furthermore, the hyperparameter optimization study revealed that deeper and more complex network models are less necessary when feature engineering is applied. 
The prediction of RUL has been a longstanding and persistently relevant problem, which has been further invigorated by the advancement of artificial intelligence methods. Considering the current trend in various applications related to turbofan engines and other domains associated with the RUL problem, this study was aligned with contemporary requirements. The findings substantiate the impact of data preprocessing on prediction performance and reveal a reduced dependence on complex network structures. For future work, we aim to achieve higher accuracy in the early stage of the deterioration by combining engine-specific RUL labeling, similarity-based methods, and neural networks. We also aim to adopt more robust approaches to determine the change point for PwL labels. Overall, our proposed approach provides promising results and opens avenues for further research in turbofan engines’ prognostics and health management.