Deep-Learning-Based Prognosis Approach for Remaining Useful Life Prediction of Turbofan Engine

Abstract: The entire life cycle of a turbofan engine is a type of asymmetrical process in which each engine part has different characteristics. Extracting and modeling the engine symmetry characteristics is significant in improving remaining useful life (RUL) predictions for aircraft components, and it is critical for an effective and reliable maintenance strategy. Such predictions can maximize operating availability and reduce maintenance costs. Due to the high nonlinearity and complexity of mechanical systems, conventional methods are unable to satisfy the needs of medium- and long-term prediction problems and frequently overlook the effect of temporal information on prediction performance. To address this issue, this study presents a new attention-based deep convolutional neural network (DCNN) architecture to predict the RUL of turbofan engines. The prognosability metric was used for feature ranking and selection, whereas a time window method was employed for sample preparation to take advantage of multivariate temporal information for better feature extraction by means of an attention-based DCNN model. The proposed model was validated on a well-known benchmark dataset using evaluation measures such as the root mean square error (RMSE) and an asymmetric scoring function (score). The experimental results show the superiority of the proposed approach in predicting the RUL of a turbofan engine. The attention-based DCNN model achieved the best scores on the FD001 independent testing dataset, with an RMSE of 11.81 and a score of 223.


Introduction
In the aerospace industry, safety and reliability are key factors that require great attention due to the tough working conditions and long operating hours of aerospace systems [1]. One of the main safety issues is the failure of the turbofan engine, which is a critical component in airplanes. In addition, the turbofan engine is a very sophisticated and precise piece of thermal equipment [2] and is involved in 60% of airplane issues. Therefore, any failures should be detected as early as possible. Such early detection can help to avoid catastrophic damage or abrupt halting that may lead to economic and human losses [3]. Predictive maintenance and monitoring are thus necessary to build a cost-effective maintenance strategy. The maintenance approach should be stable and flexible in order to increase the system's reliability, efficiency, and availability, which in turn reduces downtime and operating costs [4,5].
Therefore, mechanical equipment prognostics and health management (PHM) have received considerable attention, and the prediction of RUL is at the center of PHM. Thus, approach to the raw sensor data. Then, a new attention-based DCNN model can successfully extract high-level abstract features through a deep learning network. The corresponding RUL value may be estimated using the learned representations. The suggested model, which employs a time window, attention mechanism, and a deep CNN structure, is intended to provide higher prognostic accuracy than shallow or typical machine learning methods presented in the literature.
The effectiveness of this approach was validated on the C-MAPSS turbofan engine benchmark datasets provided by the National Aeronautics and Space Administration (NASA). The main contribution of this paper is twofold. First, we propose a data-driven approach to predicting the RUL of a turbofan engine system using an attention-based DCNN model, together with a time window approach for sample preparation that results in better feature extraction.
Second, we have employed a measure of prognosability (feature ranking and selection) as indicators of the engine's condition at failure. The features with less variability were thereby eliminated to improve the prediction accuracy.
The rest of this paper is structured as follows. Section 2 highlights the background of the study and the related works. Then, Section 3 outlines the proposed research methodology, whereas Section 4 describes the experimental findings. Lastly, Section 5 concludes the paper and highlights the future work.

Related Works
Prognostics and health management (PHM) refers to a process that can forecast the future health status of mechanical systems in the engineering field. PHM is critical to ensuring the reliability of machinery systems, and it depends on sensor capabilities and on analysis that monitors the condition of mechanical components to assess their health state [20,25]. Therefore, accurate RUL predictions are essential in PHM for many fields, including manufacturing and various industrial cyber-physical systems [27]. Furthermore, if the RUL of mechanical equipment is known accurately, manufacturing industries can plan future maintenance ahead of time and guarantee a seamless repair and maintenance process. RUL prediction methods are mainly classified into physical model-based methods and data-driven methods.

RUL Prediction Based on Physical Models
Model-based approaches typically build a degradation model for rotary equipment such as a turbofan engine based on its physical structure, which is then used to predict RUL. For example, the authors in [28] present a fatigue-crack growth law, using fracture mechanics knowledge to demonstrate the fatigue-crack development model's application. Jiang [29] suggested a method for predicting RUL based on a model of convex optimization-life parameter deterioration. Gao et al. [30] proposed a physical model for RUL prediction based on Bayesian theory. The authors in [31] developed a Hertzian contact dynamic theory model of a ball bearing and raceway and showed that appropriate damping can extend the life of the bearing. Model-based solutions necessitate the establishment of an accurate degradation model; however, the complex structure of components and operating mechanisms and the inherent uncertainties in engineering practices make this problematic.
Additionally, as the structure of the mechanical system becomes more complex, those physical models that are primarily concerned with exploiting the system fault mechanism may not be the most feasible for practical prognostications of complex mechanical systems, such as turbofan engines or solar applications [32], because the uncertainty in the machining process and measurement noise are not incorporated into the physical models.

RUL Prediction Based on Data-Driven Models
Over the last few years, data-driven prognostics has attracted considerable interest as a way of establishing a link between the data collected from monitoring rotary equipment and the relevant RUL. As a result, numerous machine learning algorithms, most notably those based on neural networks, have been utilized to map the gathered feature data to the related RUL. The benefit of using deep learning (DL)-based methods in prognosis models is that DL can accurately model extremely non-linear, complex, multi-dimensional systems without prior information on the system's physical behavior. In addition, numerous forms of engineering system data, including raw sensor measurements, can be utilized directly as model inputs to estimate the RUL based on historical trajectory data, which is significant for enhancing the reliability and safety of turbofan engine systems. Data-driven approaches generate estimation models based on historical run-to-failure data, avoiding the limitations of physical failure models and expert knowledge [9,10]. For instance, the authors in [33] suggested a long short-term memory (LSTM)-based scheme for the RUL estimation of aero-engines in the case of high noise levels, hybrid faults, and complex operations, as an enhancement of the standard recurrent neural network (RNN) approach. The authors in [34] used LSTM for a tool wear health monitoring task. The authors in [23] suggested an optimized DL-based method for multi-bearing RUL collaborative predictions by integrating both time and frequency domain features. Numerical tests validated the developed method's feasibility and its superiority on a real dataset. A restricted Boltzmann machine method was proposed by the authors in [35] to learn feature representations for calculating the RUL of machines, using a new regularization concept and an unsupervised self-organizing map algorithm. The authors in [36] suggested a multi-objective deep belief network (MODBN) ensemble approach, in which an evolutionary method was combined with a standard DBN training method to develop several DBNs simultaneously, while keeping accuracy and diversity in mind.
In [37], the LSTM was suggested as a version of the RNN for turbofan engine RUL estimations. In the field of sequence prediction, the use of LSTM is popular, but it is time-consuming. The use of CNN for RUL estimations for the same aero-engine was proposed in [14]. The authors used a time window (TW) method to prepare the raw data samples, which helps to collect more degradation data. As a result, the dimension of the model inputs grows, making the development of a neural network model a difficult task. This raises the question of how to choose the network layers and network nodes so as to prevent overfitting and to minimize time and computational costs, while avoiding local minima.
Another recent study was conducted by Peng et al. [2]. The integration of a 1-D convolutional neural network with a full convolutional layer (1-FCLCNN) and LSTM was suggested as a tool for predicting RUL. This method uses LSTM and 1-FCLCNN to extract the spatial and temporal features of the FD001 and FD003 turbofan engine datasets. CNN applications in RUL-related fields have also attracted much attention from various researchers [12]. The authors in [13] were the first to use the deep CNN approach for RUL predictions. The results showed that CNN outperformed the support vector regression, multi-layer perceptron, and relevance vector regression approaches. The CNN approach proposed in [13] was tested and evaluated on the C-MAPSS dataset, for which the reported RMSE was 18.45.
Another study [38] presented a novel method for deep feature learning for RUL prediction using time-frequency representation (TFR) and multi-scale CNN networks (MSCNN). TFR can effectively disclose the non-stationary nature of the bearing degradation signal. Using the wavelet transform, the authors obtained TFRs that contain a wealth of important information after accumulating time series degradation signals. Due to their high dimensionality, bilinear interpolation was used to reduce the size of these TFRs, which were then employed as inputs for the deep learning model. However, the approach proposed in [38] still has some drawbacks. First, the algorithm is slow to train, and its computation speed needs to be increased. Furthermore, a graphics processing unit (GPU) is needed to help process the original TFR. A different approach, known as CNN-XGB (eXtreme gradient boosting) with an extended time window, was proposed by Zhang et al. [39] to tackle an issue affecting aero-engine systems, namely, that these systems typically operate under a variety of operating conditions, which may affect the deterioration trajectory of the system differently and hence impair the accuracy of the RUL prediction. The proposed approach was validated using the NASA C-MAPSS turbofan aero-engine datasets. The RMSE obtained was 20.3, and the training time was reported to be 621.7 s. A summary of the literature is provided in Table 1.
When the operating conditions are more complex, the RUL prediction is more challenging, and this kind of problem deserves further study. The training time of the proposed model is 142 s, which shows its advantage in reducing the training time and model complexity compared to several popular methods in the literature.

The Proposed Approach
This study provides an optimized DCNN-based method architecture. Figure 1 indicates the approach to RUL prediction proposed in this study. Essentially, it consists of four distinctive parts: data pre-processing, feature extraction, model training, and estimating remaining useful life.
The proposed DCNN-based model is trained and evaluated, applying standard performance evaluators for prediction models to achieve the optimum model for predicting the RUL of different engine units. In the first section, the C-MAPSS benchmark dataset is introduced. The second section focuses on the proposed deep candidate model, whereas the final two steps of the suggested approach are discussed in subsequent sections.

C-MAPSS Benchmark Dataset
In this study, we selected the C-MAPSS aero-engine degradation dataset provided by NASA to verify the effectiveness of the proposed DNN candidate models. The primary control system comprises three components: a fan controller, a regulator, and a limiter. The fan maintains normal flight conditions by directing air into the inner and outer culverts (Figure 2). The combustor is supplied with compressed high-temperature, high-pressure gases via a low-pressure compressor (LPC) and a high-pressure compressor (HPC). Low-pressure turbines (LPTs) can decelerate and pressurize air, increasing the chemical energy conversion efficiency of aviation kerosene. High-pressure turbines (HPTs) generate mechanical energy by striking turbine blades with high-temperature, high-pressure gas. The low-pressure rotor (N1), high-pressure rotor (N2), and nozzle all contribute to the engine's combustion efficiency.

The C-MAPSS dataset is widely used in prognostic studies and contains four sub-datasets under different operating conditions (OCs) and failure modes (FMs). Every sub-dataset includes a training set, a testing set, and testing RUL values, and consists of twenty-one sensors and three operation settings [41]. Each engine unit has a varying degree of initial wear. Over time, the engine units start to degrade until they reach system failure, which is described as an unhealthy time cycle. In the testing set, the sensor records are terminated before the occurrence of a system fault. Table 2 provides information on the turbofan engine degradation dataset. In this experiment, we aimed to predict the RUL of a single engine unit randomly selected from the testing set. The four subsets of data, FD001-FD004, are used for the verification of the attention-based DCNN model. FD001 is the simplest subset of data and FD004 is the most complex subset of data, in which the engines have six OCs and two FMs.
When the operating conditions are more complex, the RUL prediction is more challenging, and herein the proposed method's effectiveness is verified and validated with four different subsets of data.

Feature Selection Using the Prognosability Algorithm
To improve the RUL prediction accuracy, we employed a feature selection metric called prognosability. This metric assigns a numerical value between zero and one to each candidate condition indicator. A higher-ranked feature tracks the degradation process more accurately and is thus more suitable for training the RUL prediction model. Prognosability measures the variability of a feature at failure relative to the range between its initial and final values: a more prognosable feature exhibits less variation at failure relative to that range. The values vary from zero to one, with one indicating that the feature is perfectly prognosable and zero indicating that it is not prognosable. Prognosability is computed as

$$\text{prognosability} = \exp\left(-\frac{\operatorname{std}_j\left(x_j(N_j)\right)}{\operatorname{mean}_j\left|x_j(1) - x_j(N_j)\right|}\right), \quad j = 1, \ldots, M \qquad (1)$$

where $x_j$ denotes the vector of measurements made on the j-th system, M is the number of monitored systems, and $N_j$ denotes the number of measurements made on the j-th system. We ranked the 21 features using prognosability, as presented in Figure 3. The selected features are (s2, s3, s4, s7, s8, s9, s11, s12, s13, s14, s15, s17, s20, s21), and the irregular or unchanged sensor data were eliminated (s1, s5, s6, s10, s16, s18, s19). The same features were also used by Zhang et al. [36] and Duan et al. [42]. They serve as the feature inputs of the DCNN-based model after the inputs are normalized and the raw samples are prepared using the time window approach.
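As a rough illustration of the ranking step, the following minimal sketch computes prognosability for one feature from a set of run-to-failure trajectories. It assumes the commonly used definition (the exponential of the negative ratio between the spread of the final values and the mean absolute start-to-end range); the function name, the use of the sample standard deviation, and the handling of a completely flat feature are assumptions made for this example rather than details taken from the study.

```python
import math
from statistics import mean, stdev


def prognosability(trajectories):
    """Prognosability of one feature over M run-to-failure trajectories.

    trajectories[j] is the time-ordered list of the N_j measurements of the
    feature on the j-th system. At least two trajectories are required
    because the sample standard deviation is used (an assumption).
    """
    finals = [x[-1] for x in trajectories]              # x_j(N_j)
    ranges = [abs(x[0] - x[-1]) for x in trajectories]  # |x_j(1) - x_j(N_j)|
    mean_range = mean(ranges)
    if mean_range == 0:
        # Assumption: a feature that never changes is treated as not prognosable.
        return 0.0
    return math.exp(-stdev(finals) / mean_range)


# Hypothetical toy data: two engines, one sensor.
consistent = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]  # same end value on both engines
scattered = [[0.0, 1.0], [0.0, 3.0]]             # end values differ
print(prognosability(consistent), prognosability(scattered))
```

Features whose failure values agree across engines score close to one, whereas constant or widely scattered features score low, which matches the reason given above for discarding the unchanged sensors.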

Data Normalization
In real-world applications, data are available from several raw sensors, operating parameters, and run-to-failure records. Because the value scales of different sensors can vary, the data of each sensor must be normalized before training and testing.
Therefore, sensor data can be normalized by letting $\mu_i$ denote the mean of the i-th sensor data from the engine and $\sigma_i$ denote the standard deviation. In the Z-score normalization, $\tilde{x}_{i,j} = (x_{i,j} - \mu_i)/\sigma_i$ is the normalized sensor output. The raw sensor data are scaled within the range of [0, 1] using Min-Max normalization as follows:

$$x'_{i,j} = \frac{x_{i,j} - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \qquad (2)$$

where $x_{i,j}$ is the j-th measuring point of the i-th sensor, $x'_{i,j}$ is the normalized result, and $x_i^{\max}$ and $x_i^{\min}$ denote the maximum and minimum values of the i-th sensor.
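The two normalization schemes mentioned above can be sketched for a single sensor channel as follows. This is a minimal illustration, not the study's pre-processing code: the function names are hypothetical, the population standard deviation is assumed for the Z-score, and returning zeros for a constant channel is a choice made here to avoid division by zero.

```python
from statistics import mean, pstdev


def z_score(values):
    """Z-score normalization of one sensor channel: (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # constant channel: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-Max normalization of one sensor channel to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant channel: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


readings = [10.0, 20.0, 30.0]  # hypothetical sensor readings
print(min_max(readings))
print(z_score(readings))
```

In practice the statistics would be computed on the training set and reused for the test set, so that both are scaled consistently.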
Because a time series contains more information than a single point, in this study we implemented a TW to make use of multivariable temporal information, as in [14,36,42]. For the training data set FD001, the time window length was selected as 35. All the historical data in the TW were gathered at each step to form a high-dimensional vector of length 14 × 35 as the input data. Thus, the measurements of 14 out of the 21 sensors were employed as the raw input features, as in [14,43]. The dynamic characteristics of turbofan engine operating data differ significantly between operating conditions, which leads to different network structures for the extraction of features. The proposed attention-based DCNN model structure was designed to predict the RUL of turbofan engines under both single and multiple OCs. Therefore, this paper utilizes the four subsets of the dataset shown in Table 2, and the FD001 subset of data was used for experimental analysis, as in most of the literature [14,36,42,43]. Figure 4 shows the normalized sensor measurements.

Deep Convolutional Neural Networks
CNNs are designed to handle learning problems involving high-dimensional input data with complex spatial structures, such as image classification [44,45], text classification [46], video processing [47,48], amino acid sequence prediction [49,50], and time series failure signals. The three main layers of a CNN are as follows.

Convolutional Layer
The convolutional layer is the most significant component of convolutional networks. Feature maps are generated by sliding the convolution kernel over the data and convolving it with the covered data. Furthermore, the weight-sharing property reduces the number of model parameters and the possibility of overfitting. The i-th feature map of the l-th convolutional layer, $x_i^l$, is calculated as follows:

$$z_i^l = \sum_{c=1}^{C} w_{i,c}^l * x_c^{l-1} + b_i^l, \qquad x_i^l = \varphi\left(z_i^l\right)$$

where $z_i^l$ represents the convolution operation's output, * denotes the convolution operator, $w_i^l$ is the i-th convolution kernel (with $w_{i,c}^l$ its part acting on input channel c), $x^{l-1}$ is the input volume, and $b_i^l$ and $\varphi(z)$ represent the bias term and the non-linear activation function. Finally, C denotes the number of input channels.
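The computation of a single feature map can be illustrated with a small, dependency-free sketch. It slides one kernel along the time axis, sums the contributions of all input channels, adds a bias, and applies ReLU. The names and the toy data are hypothetical, and the kernel is applied without flipping (cross-correlation), as is customary in deep learning libraries; this is a simplification for illustration, not the layer configuration used in the study.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def conv_feature_map(x, w, b):
    """One feature map of a 1-D convolution over time ("valid" padding).

    x: list of C input channels, each a list of T values
    w: list of C kernel slices, each a list of K weights
    b: scalar bias
    """
    n_channels, n_steps, k_size = len(x), len(x[0]), len(w[0])
    feature_map = []
    for t in range(n_steps - k_size + 1):
        z = b
        for c in range(n_channels):      # sum over input channels
            for k in range(k_size):      # sum over kernel taps
                z += w[c][k] * x[c][t + k]
        feature_map.append(relu(z))
    return feature_map


# Hypothetical example: 2 channels, 3 time steps, kernel length 2.
x = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 2.0]]
print(conv_feature_map(x, w, 0.5))  # [3.5, 4.5]
```

Because the same kernel weights are reused at every time step, the number of parameters depends on the kernel size and channel count rather than on the sequence length, which is the weight-sharing property mentioned above.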

Pooling Layer
The pooling layer aims to combine similar features into one and to speed up the calculation using a non-linear down-sampling function [51]. The most popular pooling layer is the max-pooling layer. The pooling layer's inputs are the feature maps of the previous layer, and its outputs are the maxima of local patches of the inputs:

$$x_i^l = \max\left(x_i^{l-1}, p, s\right)$$

where $x_i^l$ is the i-th feature map of the l-th pooling layer, $x_i^{l-1}$ is the i-th feature map in the previous layer l-1, max(·) denotes the max-pooling operation, and p and s represent the pooling size and the stride size.
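A minimal sketch of one-dimensional max-pooling with pooling size p and stride s is given below. The function name is hypothetical, and incomplete windows at the end of the sequence are simply dropped, which is one common convention among several.

```python
def max_pool_1d(feature_map, p, s):
    """Max-pooling over a 1-D feature map with pooling size p and stride s.

    Only complete windows are used; a trailing partial window is dropped.
    """
    return [
        max(feature_map[i:i + p])
        for i in range(0, len(feature_map) - p + 1, s)
    ]


print(max_pool_1d([1, 3, 2, 5, 4, 0], p=2, s=2))  # [3, 5, 4]
```

With p = s = 2 the output has half the length of the input, which shows both the speed-up and the information loss that pooling introduces.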

Fully Connected Layer
The fully connected layer summarizes the features and outputs the prediction results as the final layer of the convolutional neural network [51]. The output of the l-th fully connected layer is as follows:

$$x^l = \varphi\left(\omega^l x^{l-1} + b^l\right) \qquad (7)$$

where $x^{l-1}$ is the output of the previous layer l-1, and $\omega^l$ and $b^l$ represent the weight matrix and the bias vector.
CNNs attempt to learn hierarchical filters which can transform large input data to accurate class labels using minimal trainable parameters. This is accomplished by enabling sparse interactions between input data and trainable parameters through parameter sharing to learn equivariant representations (also called feature maps) of the complex and spatially structured input information [52]. In a deep CNN, units in the deeper layers may indirectly interact with a large portion of the inputs due to the usage of pooling operations, which replace the output of the network at a certain location with a summary statistic and allow the network to learn complex features from this compressed representation [14]. The so-called "top" of the CNN is usually composed of several fully connected layers, including the output layer, which use the complex features learned by the previous layers to make predictions.
The attention-based DCNN has excellent learning ability, which is mainly achieved by stacking multiple non-linear feature extraction layers. Furthermore, it can learn hierarchical representations from data on its own. As a result, the size of the convolution kernel and the number of convolution layers significantly affect the prediction performance. The proposed network architecture for the RUL estimation undertaken in this study is shown in Figure 5. The input data are two-dimensional (2D): one dimension is the feature number, and the other is the sensor's time sequence. The feature maps are then combined using a convolutional layer with one filter of size 3 × 1. The attention mechanism is used to extract highly abstract degradation and trend features. After the attention layer, the extracted features are connected to a fully connected layer. In addition, the dropout method is used to relieve overfitting, and ReLU is the activation function of each layer. In this study, the Adam optimization algorithm serves as the optimizer. Adam is an adaptive learning-rate optimization method used to train deep neural networks [53]. Given the current state of the turbofan engine datasets, we increased the penalty for late (lagging) predictions; the loss is denoted as follows.
where $y$ is the actual value, $\hat{y}$ is the predicted value, and N is the number of samples in the validation set. When the true value is greater than the predicted value $\hat{y}$, the penalty coefficient is 1; otherwise, it is 2. The following section discusses the proposed attention mechanism for better degradation feature extraction for the engine system.
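The effect of the penalty coefficient can be illustrated with the following sketch. The exact form of the loss is not reproduced here, so the code assumes a penalty-weighted mean squared error; the function name and the default arguments are hypothetical. Only the rule for choosing the coefficient (1 when the true value exceeds the prediction, 2 otherwise) is taken from the text.

```python
def asymmetric_mse(y_true, y_pred, early_penalty=1.0, late_penalty=2.0):
    """Penalty-weighted mean squared error (assumed form of the loss).

    A prediction below the true RUL (early) is weighted by early_penalty;
    any other prediction (late or exact) is weighted by late_penalty.
    """
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        alpha = early_penalty if y > y_hat else late_penalty
        total += alpha * (y - y_hat) ** 2
    return total / len(y_true)


# The same 10-cycle error costs twice as much when the prediction is late.
print(asymmetric_mse([100.0], [90.0]))   # early prediction: 100.0
print(asymmetric_mse([100.0], [110.0]))  # late prediction: 200.0
```

Weighting late predictions more heavily pushes the trained model towards conservative estimates, in line with the asymmetric scoring function used for evaluation.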

Proposed Attention Mechanism
The attention mechanism enables a model to narrow its focus to critical regions of the selected feature space. It operates by paying more attention to subsets of the data in order to obtain better scores. The attention mechanism is summarized in three parts, as presented in Figure 6. The proposed attention scheme modifies the way in which the attention weights are calculated. Unlike conventional attention processes, it employs the sigmoid activation function (10) rather than the softmax function. Because the softmax function normalizes the weights so that they sum to one, it reduces the likelihood that more than one variable is identified as relevant for prediction, although several variables are frequently relevant in a multivariate time series. This stage enables the model's attention mechanism to pick out the degradation characteristics more effectively.
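The difference between the two weighting schemes can be seen in a small numerical sketch. The attention scores below are hypothetical values standing in for whatever the scoring layers of the network produce; the sketch only contrasts how softmax and sigmoid turn the same scores into weights and is not the attention layer used in the study.

```python
import math


def softmax_weights(scores):
    """Softmax attention weights: positive and constrained to sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_weights(scores):
    """Sigmoid attention weights: each in (0, 1), independent of the others."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def apply_attention(features, weights):
    """Re-weight each feature by its attention weight."""
    return [f * w for f, w in zip(features, weights)]


# Hypothetical scores: two sensors are equally relevant, one is not.
scores = [4.0, 4.0, -4.0]
print(softmax_weights(scores))  # the two relevant sensors must share the mass
print(sigmoid_weights(scores))  # both relevant sensors receive a weight near 1
```

With softmax, the two equally relevant sensors each receive a weight just under 0.5 because the weights compete with one another; with sigmoid, both can be weighted close to one, which is the behavior motivated above for multivariate time series.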

Time Window Technique
In multivariate time series-based problems such as RUL prediction, temporal sequence data usually include more information than multivariate data points taken at a single time step. Thus, the processing of time sequences has a great deal of promise in terms of improving prediction performance. In this study, we used a time window to prepare the data in order to take advantage of multivariate temporal information.
Here, Ntw represents the time window's size. All sensor data within the TW are compiled into a high-dimensional feature vector and used as the network's inputs at each time step. Figure 7 illustrates a normalized data sample from the 14 chosen sensors with a time window size of 35, for a single engine unit in the training sub-dataset FD001. For the RUL target label, a piecewise linear function is used, as in [42]: the label is held at a preset value Rulmax during the early part of an engine's life and then decreases linearly until failure. Rulmax was set to 150 cycles for each subset of data, as in [39,43]. According to the experimental analysis, m was 35, and l was 1. FD001 had 17,731 training samples and 100 testing samples, because only the most recent measurements of the test sets were used. The effectiveness of the piecewise linear function on this prediction problem has been confirmed in the literature [13,14,36]. Moreover, the processed label values were smoothed. Figure 8 shows the piecewise RUL target function of an engine unit whose full life is 130 cycles.

Lastly, Figure 1 depicts the proposed prognostic experimental approach. First, the FD001 subset of the data was pre-processed by selecting 14 raw sensor measurements and normalizing the accompanying data to fall within the range of [−1,1]. Next, the training and testing datasets were created, with each sample providing information about the time sequence within the time window of length Ntw. The normalized data, prepared in two-dimensional format, were then fed directly into the attention-based DCNN model as inputs. As a result, hand-crafted signal processing features, such as skewness and kurtosis, were unnecessary. Thus, the suggested method requires no prior knowledge of prognostics or signal processing. Moreover, we used a randomized search method to find the best hyperparameters over a vast hyperparameter space.
Randomized hyperparameter search provides improved hyperparameters for the proposed DCNN model structure with a limited computing budget and faster convergence speed. The attention-based DCNN model's effectiveness is demonstrated in the following section.
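The sample preparation described in this section can be sketched as follows. The sketch builds capped (piecewise linear) RUL labels for one training engine and cuts its sensor history into overlapping windows, pairing each window with the label of its last cycle. The function names, the toy data, and the convention that the RUL is zero at the final recorded cycle are assumptions made for illustration.

```python
def piecewise_rul(total_cycles, rul_max=150):
    """Piecewise linear RUL labels for cycles 1..total_cycles.

    The label is capped at rul_max early in life and then decreases
    linearly, reaching 0 at the last recorded cycle (an assumed convention).
    """
    return [min(rul_max, total_cycles - t) for t in range(1, total_cycles + 1)]


def sliding_windows(series, n_tw, stride=1):
    """Cut a time-ordered list of sensor vectors into windows of n_tw steps."""
    return [
        series[i:i + n_tw]
        for i in range(0, len(series) - n_tw + 1, stride)
    ]


def make_samples(series, n_tw, rul_max=150):
    """Pair every window with the RUL label of its last time step."""
    labels = piecewise_rul(len(series), rul_max)
    windows = sliding_windows(series, n_tw)
    return list(zip(windows, labels[n_tw - 1:]))


# Hypothetical engine: 6 cycles, 2 sensors per cycle, window of 4 cycles.
engine = [[float(t), float(t) * 0.1] for t in range(6)]
samples = make_samples(engine, n_tw=4, rul_max=3)
print(len(samples))               # 3 windows
print([rul for _, rul in samples])  # [2, 1, 0]
```

An engine with T recorded cycles yields T - Ntw + 1 training samples when the stride is 1, so a longer window provides more context per sample but fewer samples per engine.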

Experimental Results and Discussion
This section summarizes the experimental findings and discusses their significance. In the first section, the experimental results are discussed. In the second section, the TW effects and the proposed model's training time are examined. Finally, a comparative analysis with literature is provided in the last section.

Experimental Results
After repeating the experiment ten times, the proposed algorithm's training parameters were tuned to obtain the best score value, shown in Equation (10), and the test set's lowest root mean square error, shown in Equation (11).
$$s = \sum_{i=1}^{N} s_i, \qquad s_i = \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \geq 0 \end{cases} \qquad (10)$$

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2} \qquad (11)$$

where $d_i = \hat{y}_i - y_i$ is the difference between the predicted and the true RUL of the i-th test sample, and N is the number of test samples.

According to the PHM Data Challenge in 2008 [54], the asymmetric scoring function penalizes late predictions ($d_i > 0$) more severely than early predictions ($d_i < 0$). This is for maintenance reasons. Predictions made too late may cause maintenance activities to be delayed, whereas predictions made too early may not be hazardous but consume more maintenance resources. Figure 9 demonstrates that the RMSE and score functions are sparser towards higher values than they are towards zero, confirming the results' validity. The proposed attention-based DCNN candidate model for RUL prediction was constructed, and the configuration was specified, including the number of hidden layers, as well as the number and the length of the convolution filters. The parameters used in this experiment are presented in Table 3. The attention-based DCNN model received the normalized training data as inputs and the labeled RUL values of the training samples as target outputs. The back-propagation learning method was employed to update the network's weights, and the Adam technique was employed together with mini-batches. For each training epoch, the samples were randomly separated into mini-batches of 512 samples and loaded into the training system. Following that, the network information, i.e., the weights in each layer, was optimized using the mean loss over each mini-batch. It should be mentioned that the batch size selection affects the performance of network training [55]. Based on the trial results, a batch size of 512 samples was determined to be appropriate and was utilized in all of the case studies in this study.
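Both evaluation measures are short enough to state directly in code. The sketch below assumes the usual form of the PHM 2008 scoring function, with time constants of 13 cycles for early and 10 cycles for late predictions, and the error defined as predicted minus true RUL; the function names are hypothetical.

```python
import math


def phm08_score(y_true, y_pred):
    """Asymmetric score: late predictions (d > 0) are penalized more."""
    score = 0.0
    for y, y_hat in zip(y_true, y_pred):
        d = y_hat - y  # predicted minus true RUL
        if d < 0:
            score += math.exp(-d / 13.0) - 1.0  # early prediction
        else:
            score += math.exp(d / 10.0) - 1.0   # late (or exact) prediction
    return score


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted RUL."""
    squared = [(y_hat - y) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# A 10-cycle error is penalized more when it is late than when it is early.
late = phm08_score([100.0], [110.0])
early = phm08_score([100.0], [90.0])
print(late, early, rmse([100.0, 100.0], [110.0, 90.0]))
```

Unlike the RMSE, the score is a sum rather than a mean and grows exponentially with the error, so a single badly late prediction can dominate the total.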
Moreover, a variable learning rate was used. The initial learning rate was 0.003 for the first 20 epochs of optimization. Following that, a learning rate of 0.0009 was utilized to ensure consistent convergence for the remaining 12 epochs. By default, the maximum number of training epochs is 640 for the attention-based DCNN candidate model. The time window size is an essential factor affecting the prediction accuracy of the proposed model, because the RUL prediction results depend on the amount of historical information. Figure 10 shows the effect of the time window size on the model performance: increasing the time window size can improve the prediction accuracy of the RUL of the engine. Note that the time window size is limited by the length of the shortest engine trajectory in the test set. Therefore, the time window sizes (Ls) of the FD001 and FD002 data sets were 30, and those of the FD003 and FD004 data sets were 35. Furthermore, we trained the proposed attention-based DCNN model 10 separate times and averaged the results to exclude the effects of random disturbances.
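The two-stage learning rate described above amounts to a simple step schedule, sketched below. The function name and the zero-based epoch index are assumptions; the two rates and the switch after 20 epochs are the values stated in the text.

```python
def learning_rate(epoch, switch_epoch=20, initial=0.003, final=0.0009):
    """Step schedule: `initial` for the first `switch_epoch` epochs, then `final`.

    `epoch` is zero-based, so epochs 0..19 use the initial rate.
    """
    return initial if epoch < switch_epoch else final


# 20 epochs at the higher rate followed by 12 at the lower rate.
schedule = [learning_rate(e) for e in range(32)]
print(schedule[0], schedule[19], schedule[20], schedule[31])
```

A larger rate early on speeds up the initial descent, and the smaller rate afterwards reduces oscillation around the minimum.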
The key parameters of the proposed model are summarized in Table 3. In the attention-based DCNN model, the purpose of the pooling layer is to merge similar features into one using non-linear down-sampling functions and to speed up the calculation. The max-pooling layer is the most commonly used pooling layer. The inputs of the pooling layer are the feature maps from the previous layers, and the outputs are the maxima of local patches of the inputs. An experiment was conducted to compare the proposed DCNN model with and without the pooling layer, as shown in Table 4. The experimental findings confirm that the attention-based DCNN architecture without the pooling layer achieved a better result than the architecture with the pooling layer: the model with the pooling layer obtained an RMSE of 21.34 and the model without the pooling layer obtained an RMSE of 14.92, a difference of 6.42. The turbofan engine degradation simulation data used in this study were numerical data, and the dimension of the raw features was relatively low. Although the pooling operation improves the computing efficiency, it filters out some useful information in this prognostic approach. Figure 11 shows the RMSE of the attention-based DCNN model during network training; the accuracy of the model improved as the number of iterations and epochs increased. The number of iterations was therefore set to 32 per epoch, and a maximum of 640 iterations was reached during training, with a learning rate of 0.0009. The effect of increasing the number of convolutional layers is presented in Figure 12, which shows that the attention-based DCNN model achieved the lowest RMSE with five convolutional layers.

Case Study of Turbofan Engine System
Four verification tests were conducted on the four subsets of the C-MAPSS dataset to verify the effectiveness of the attention-based DCNN model. Each subset of the data had different operating conditions. When the operating conditions are more complex, the RUL prediction is more challenging. Figure 13 shows the RUL prediction results of two verification tests performed on FD001 and FD002 (engine 73 and engine 39, randomly selected cases). Figure 14 shows the other two evaluation tests, conducted on the FD003 and FD004 subsets of data. The predictions of the proposed model became more accurate as the engine degradation progressed. This is because the model can extract more failure features from the sensor data with increasing degradation. The safety of the system can be improved by accurately predicting the RUL near the stage of engine failure.

In the context of aerospace industry risk management, deep-learning-based RUL prediction can assist managers in assessing the likelihood of a system failure prior to a maintenance window. In large-scale manufacturing, maintenance time is fixed, and several forms of equipment are maintained in a single maintenance session. It is not feasible or cost-effective to maintain all machines in a single window. As a result, the manager must pick which equipment will be serviced during the scheduled maintenance window. A density chart of the RUL prediction error can be generated using the deep-learning-based RUL model. As illustrated in Figure 15, the manager can estimate the likelihood of a machine failing before the next maintenance window. The management team then has the option of adding a machine to the present maintenance list or deferring it until the next maintenance window. Additionally, the proposed approach shows high robustness and generalization ability, and it can be used in practice as an industrial condition-based maintenance strategy for several manufacturing industries.
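One simple way to turn an error density into such a likelihood is to reuse the prediction errors observed on held-out engines as an empirical distribution. The sketch below is only an illustration under that assumption; the function name, the error sample, and the decision rule are hypothetical and are not taken from the study.

```python
def failure_probability(predicted_rul, cycles_to_window, errors):
    """Empirical probability that the true RUL is shorter than the time
    remaining until the next maintenance window.

    errors: past prediction errors (predicted minus true RUL) on held-out
    engines, treated as equally likely for the current prediction.
    """
    plausible_true_rul = [predicted_rul - e for e in errors]
    failures = sum(1 for rul in plausible_true_rul if rul < cycles_to_window)
    return failures / len(errors)


# Hypothetical case: the model predicts 50 cycles, the next window is in 40.
past_errors = [-5.0, 0.0, 5.0, 15.0, 20.0]
print(failure_probability(50.0, 40.0, past_errors))  # 0.4
```

A manager could then add the machine to the current maintenance list whenever this probability exceeds a risk threshold chosen for the fleet.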

Comparison with Literature
This section compares the proposed attention-based DCNN predictor with state-of-the-art methods. In the literature, different DL methods have been used to predict RUL using the C-MAPSS benchmark dataset. Table 5 shows a comparison of the proposed model with related contributions in the literature. The comparison is limited to the metrics reported in each study, but it conveys the promising results of the proposed attention-based DCNN predictor. The results in Table 5 confirm that the attention-based DCNN model surpasses the other methods for predicting RUL on the benchmark independent testing data set, with one exception: a single study obtained a better result on the FD003 and FD004 subsets. Based on the experimental findings, it was observed that increasing the time window (TW) size improves the RUL prediction accuracy. The proposed attention-based DCNN model predicts the RUL of turbofan engines with high accuracy, without requiring an understanding of the engine construction or failure mechanism and without the need for professional knowledge and experience. It simplifies the modeling process and can serve as a decision-making tool for aircraft engine maintenance and health management.
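For readers who wish to reproduce the type of comparison reported in Table 5, the two evaluation measures can be sketched as follows. The sketch assumes the asymmetric scoring function in its commonly used C-MAPSS form, with constants 13 for early predictions and 10 for late predictions; the definition given earlier in this paper takes precedence if it differs. The function names `rmse` and `cmapss_score` are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted RUL."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def cmapss_score(y_true, y_pred):
    """Asymmetric score: late predictions (d > 0) are penalized more heavily.

    Assumes the commonly used form with d = predicted RUL - true RUL:
    exp(-d / 13) - 1 for d < 0, and exp(d / 10) - 1 for d >= 0.
    """
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(penalties))

# Hypothetical values: one early and one late prediction of equal magnitude.
y_true = [100.0, 50.0]
y_pred = [90.0, 60.0]
print("RMSE: ", rmse(y_true, y_pred))
print("Score:", cmapss_score(y_true, y_pred))
```

The asymmetry reflects the fact that overestimating the RUL (a late prediction) is more hazardous in practice than underestimating it.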
Additionally, the visual processing of sensor signals in the time and frequency domains has achieved excellent results, showing its value in the diagnosis and examination of rotary machines and narrowing the gap between experts at different levels. Further research and wider adoption will be of great significance for diagnosing a system as complex as a turbofan engine without prior knowledge of system degradation. It could significantly reduce the incidence of system failure and improve RUL estimations and industry maintenance strategies.

Conclusions
In this paper, a data-driven deep learning approach was proposed to predict the remaining useful life (RUL) of a turbofan engine. Deep learning can give decision-makers new insights into their operations, real-time performance indicators, and costs. In this study we aimed to accurately predict the remaining useful life of the turbofan engine, which is significant for improving the reliability and safety of turbofan engine systems. To this end, a time window technique was adopted to prepare samples from the raw data so that they fit directly into the proposed model. The dropout method was used to relieve overfitting during model training. The attention mechanism was integrated with the DCNN structure to mine useful degradation features from complex historical data. The superiority and effectiveness of the proposed model were verified using the C-MAPSS benchmark dataset. The experimental results showed a minimal error between the estimated and the true RUL values for the engine units in the testing subsets. In addition, the selected time window size significantly improved the prediction performance of the model.
Additionally, during the experiments, it was observed that the prediction results become more accurate as the degree of degradation increases. Although the proposed approach obtains good experimental results, further architecture optimization is necessary. As with all empirical research, this study has limitations. For instance, the model can be further enhanced in future work by increasing the number of convolutional kernels and the number of hidden neurons in the fully connected layer. Furthermore, it is well known that measurements contain anomalies, that is, uncommon, inconsistent observations that are outnumbered by the other observations. Lastly, since the raw vibration signals are used directly as the input, this diagnostic model requires a more complicated network structure to verify the correctness of the results, resulting in a high calculation load. Thus, a deep hybrid learning model and further signal pre-processing will be investigated to eliminate duplicate information and obtain fault characteristics.
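The time window sample preparation summarized in these conclusions can be sketched as follows. This is a minimal illustration and not the exact procedure used in the experiments: it assumes a run-to-failure record for a single engine, a plain linear RUL label taken at the last cycle of each window, and a hypothetical function name `make_windows` with a hypothetical `stride` parameter.

```python
import numpy as np

def make_windows(series, window_size, stride=1):
    """Slice one engine's multivariate record into fixed-length samples.

    series: array of shape (n_cycles, n_features), assumed run to failure.
    Returns (windows, rul), where windows has shape
    (n_samples, window_size, n_features) and rul holds the number of cycles
    remaining after the last cycle of each window (linear label, assumed).
    """
    series = np.asarray(series, dtype=float)
    n_cycles, n_features = series.shape
    if n_cycles < window_size:
        # Record is too short to yield even one window.
        return np.empty((0, window_size, n_features)), np.empty((0,))
    starts = range(0, n_cycles - window_size + 1, stride)
    windows = np.stack([series[s:s + window_size] for s in starts])
    rul = np.array([n_cycles - (s + window_size) for s in starts], dtype=float)
    return windows, rul

# Hypothetical record: 10 cycles, 2 selected sensor channels.
record = np.arange(20, dtype=float).reshape(10, 2)
windows, rul = make_windows(record, window_size=4)
print(windows.shape, rul)
```

Each window preserves the temporal ordering of all selected features, which is what allows a convolutional model to exploit multivariate temporal information.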

Conflicts of Interest:
The authors declare no conflict of interest.