Article

Aero-Engine Remaining Useful Life Prediction Based on Bi-Discrepancy Network

College of Equipment Management and UAV Engineering, Air Force Engineering University, Xi’an 710051, China
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(23), 9494; https://doi.org/10.3390/s23239494
Submission received: 18 July 2023 / Revised: 31 August 2023 / Accepted: 20 September 2023 / Published: 29 November 2023
(This article belongs to the Section Vehicular Sensing)

Abstract

Most unsupervised domain adaptation (UDA) methods align feature distributions across different domains through adversarial learning. However, many of them require an auxiliary domain alignment model, which incurs additional computational costs. In addition, they generally focus on global distribution alignment and ignore the fine-grained domain discrepancy, so target samples with significant domain shifts cannot be detected or processed for specific tasks. To solve these problems, a bi-discrepancy network is proposed for the cross-domain prediction task. Firstly, target samples with significant domain shifts are detected by maximizing the discrepancy between the outputs of the dual regressor. Secondly, an adversarial training mechanism is adopted between the feature generator and the dual regressor for global domain adaptation. Finally, the local maximum mean discrepancy is used to locally align the fine-grained features of different degradation stages. In 12 cross-domain prediction tasks generated on the C-MAPSS dataset, the root-mean-square error (RMSE) was reduced by 77.24%, 61.72%, 38.97%, and 3.35% on average compared with four mainstream UDA methods, demonstrating the effectiveness of the proposed method.

1. Introduction

Aero-engine prognostics and health management (PHM) is an indispensable technology for enhancing production dependability, operational safety, equipment maintenance effectiveness, and cost efficiency [1,2,3]. Remaining useful life (RUL) prediction is a crucial issue in the PHM field and has great research value [4,5]. The system maintenance process can be considerably optimized by an accurate RUL prediction, which can assess the system's health and assist users in developing reasonable maintenance plans [6,7].
The existing RUL prediction approaches are roughly divided into physical-model-based and data-driven approaches. Standard physical-model-based prediction methods struggle to provide an accurate deterioration model because of aero engines' complicated structure, the severe nonlinearity of the mathematical model, and the close coupling between sensor data [8]. In contrast, data-driven prediction methods based on a large amount of sensor data primarily rely on intelligent algorithms to learn and characterize the degradation process of the system. These practical methods do not necessitate a deep understanding of the system's internal workings or the intricate degradation mechanism [9]. Notably, numerous data-driven RUL prediction techniques have emerged in response to the rapid development of sensor technologies.
Nevertheless, data-driven methods have some limitations for RUL prediction in industrial applications for the following reasons: (1) the modeling process requires a large number of labeled datasets, and collecting enough labeled data is often complicated in many practical applications [10,11]; (2) it is assumed that offline training data and online test data come from the same feature space and follow the same distribution, whereas in actual prediction tasks there are invariably domain disparities between training and test data due to differences in operating conditions and failure modes [12,13,14].
In order to improve the generalization ability of the model under different operating conditions and failure modes, domain adaptation (DA)-based transfer learning (TL) models have been widely developed and applied to cross-domain RUL prediction tasks [15]. These models aim at extracting domain-invariant features between the source and target domains by using labeled samples from the source domain and a small number of unlabeled samples from the target domain to achieve online RUL prediction in the target domain. Transfer learning is generally split into feature-based transfer learning and adversarial transfer learning. The core of feature-based approaches is to map cross-domain data into a common feature space so as to minimize the distributional differences among features [16]. For example, Sun et al. proposed a sparse-autoencoder-based deep transfer learning network and investigated three transfer strategies, namely, weight transfer, feature transfer, and weight updating, to achieve RUL prediction for machine tools [17]. Mao et al. used transfer component analysis (TCA) to sequentially adjust the features of the target bearing toward those of the auxiliary bearings and then used the modified features to predict the RUL [18]. Adversarial approaches draw on generative adversarial networks and consist of a generative model and a discriminative model. The generative model extracts features from the source and target domains so that the discriminative model cannot distinguish which domain the extracted features come from, while the discriminative model maximally distinguishes the domain of the extracted features; the two play against each other to realize cross-domain feature transfer. For example, Da Costa et al. combined a domain-adversarial neural network (DANN) with LSTM to reduce the distributional differences in cross-domain RUL prediction [19]. Ragab et al. proposed a contrastive adversarial domain adaptation (CADA) method for the cross-domain RUL prediction of aero engines, which develops an adversarial domain adaptation architecture with a contrastive loss while learning domain-invariant features [20].
Despite some success in extracting domain-invariant features using the transfer learning techniques mentioned above, the following issues still exist: (1) The sensor data of complex systems collected under different operating conditions and failure modes have significant distributional differences, and sensor data at different degradation stages also follow different distributions. Solely adopting the global domain adaptation (GDA) technique could therefore confuse the fine-grained features between the sub-domains denoted by the various degradation stages [7], lowering RUL prediction performance. Furthermore, if sub-domain adaptation (SDA) is carried out directly, the sub-domains might not be aligned because of the significant distributional variations between the source and target domains, as shown in Figure 1. (2) Most current adversarial transfer learning techniques require introducing an auxiliary model (a discriminator) for domain alignment, which isolates the domain adaptation task from the prediction goal and adds to the computational burden. (3) Because domain discriminators can only classify features, they cannot detect or process target samples with significant domain shifts for specific tasks.
This paper proposes an aero-engine RUL prediction model based on a bi-discrepancy network (BDnet) to solve the above issues. First, target samples with significant domain shifts are identified by maximizing the regressor discrepancy without introducing additional models, and domain-invariant features are then sought through the adversarial relationship between the feature generator and the dual regressor to achieve the GDA. Second, the fine-grained features of multiple degradation stages are extracted via the local maximum mean discrepancy (LMMD) to achieve the SDA. In particular, the SDA is performed on top of the GDA, which helps resolve the issue that direct sub-domain adaptation is ineffective due to the significant disparity in distribution between the source and target domains. Finally, 12 cross-domain prediction tasks from the C-MAPSS aviation turbofan engine dataset are used for evaluation. The outcomes demonstrate that the RUL prediction model based on the BDnet outperforms several popular feature-based and adversarial-based transfer learning techniques.
The main contributions of this article are summarized as follows.
  • First, the maximum regressor discrepancy (MRD) is proposed to detect and handle target samples with significant domain shifts without introducing additional models to achieve the GDA.
  • Second, the local maximum mean discrepancy (LMMD) is designed to capture fine-grained information from multiple degradation stages, from which the domain-invariant features can be better learned to promote RUL prediction performance.
  • Finally, the proposed BDnet-based RUL prediction approach is evaluated on an aircraft turbofan engine dataset under cross-domain conditions. The results demonstrate that the BDnet-based approach is better than several deep learning approaches without transfer learning and conventional transfer learning approaches.
The remainder of this paper is organized as follows. Section 2 discusses the theoretical background of MRD and the local maximum mean discrepancy in detail. Then, Section 3 discusses the RUL prediction method based on the BDnet in detail. Section 4 shows the results of the case study. Finally, Section 5 concludes this paper.

2. Theoretical Background

2.1. Maximum Regressor Discrepancy

In this paper, we draw on the concept of the maximum classifier discrepancy (MCD) [21] and propose the maximum regressor discrepancy (MRD) to detect target domain samples that are significantly offset from the distribution of the source domain samples, and we utilize an adversarial training mechanism to gradually achieve global domain alignment.
According to MCD, the distributional discrepancies between the source and target domains may result in either excellent or unsatisfactory classification results when a network trained on the source domain is applied directly to the target domain. The target domain samples with poor classification outcomes more accurately reflect the domain variations. Two separate classifiers, $F_1$ and $F_2$, are introduced to detect and process these samples: if there is a significant variation in the classification outcomes of the two classifiers for the same sample, the sample is deemed to have a low confidence level and requires retraining.
The specific training process is as follows:
First, the cross-entropy loss function is used to train the feature generator, G, and classifiers, F1 and F2, to correctly classify the source samples, which can be expressed as
$$\min_{G, F_1, F_2} L(X_S, Y_S)$$
$$L(X_S, Y_S) = -\mathbb{E}_{(x_s, y_s) \sim (X_S, Y_S)} \sum_{k=1}^{K} \mathbb{1}\left[ k = y_s \right] \log p\left( y \mid x_s \right) \qquad (1)$$
where $x_s$ denotes a labeled source sample and $y_s$ its corresponding label, drawn from the set of labeled source data $\{X_S, Y_S\}$; $K$ denotes the number of categories; $\mathbb{1}[k = y_s]$ denotes the one-hot label indicator; and $p(y \mid x_s)$ denotes the $K$-dimensional probabilistic output for the input $x_s$.
Second, on the basis of guaranteeing the classification ability of the model, the feature generator, G, is fixed, and the difference loss function is used to train two different classifiers, F1 and F2, to detect the samples with large domain shifts, which can be expressed as
$$\min_{F_1, F_2} L(X_S, Y_S) - L_{adv}(X_t)$$
$$L_{adv}(X_t) = \mathbb{E}_{x_t \sim X_t} \left[ \frac{1}{K} \sum_{k=1}^{K} \left| p_1(y \mid x_t) - p_2(y \mid x_t) \right| \right] \qquad (2)$$
where $x_t$ denotes an unlabeled target sample drawn from the unlabeled target data, $X_t$, and $p_1(y \mid x_t)$ and $p_2(y \mid x_t)$ denote the $K$-dimensional probabilistic outputs for the input $x_t$ obtained by $F_1$ and $F_2$, respectively.
Finally, the feature generator, G, is trained to minimize the classifier discrepancies, as shown in Equation (3).
$$\min_{G} \; L_{adv}(X_t) \qquad (3)$$
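For concreteness, the discrepancy term $L_{adv}$ of Equation (2) reduces to a few lines of PyTorch; the sketch below is illustrative, assuming `p1` and `p2` are the softmax outputs of the two classifiers for a batch of target samples.

```python
import torch

def classifier_discrepancy(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """L_adv of Equation (2): mean absolute difference between the two
    classifiers' K-dimensional probabilistic outputs, shape (batch, K)."""
    return torch.mean(torch.abs(p1 - p2))
```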
Unlike MCD, MRD consists of a feature extractor, $F$, and two regressors, $\hat{R}$ and $\tilde{R}$, with identical structures. The feature extractor takes the labeled source domain data, $D_S = \{(X_i^S, y_i^S)\}_{i=1}^{N_S}$, and the unlabeled target domain data, $D_T = \{X_j^T\}_{j=1}^{N_T}$, as inputs to generate high-level feature representations. However, the significant cross-domain difference results in a feature distribution mismatch. Figure 2 shows the data distributions of four randomly selected sensors in the four subsets of the C-MAPSS dataset. The distributions of FD001 and FD003 are relatively similar, as are those of FD002 and FD004, which is mainly related to the failure modes and operating conditions of each subset; however, the distribution of the same sensor differs significantly across the four subsets. Therefore, network parameters learned from single-source domain data cannot directly and accurately predict the target domain data. To align the source and target domain features, this paper uses the dual regressor as a discriminator to detect target domain samples with significant domain shifts: if the dual regressor gives widely differing predictions for the same target domain sample, it is considered a domain-inconsistent sample with high confidence, and training the feature generator can then reduce this inconsistency. Unlike the classification task, the label space of the regression task is continuous and infinite, so the discrepancy on target domain samples cannot be measured as the error between the conditional probability outputs of the two networks.
Considering the data imbalance problem, this paper utilizes the Tversky coefficient to quantify the difference between the dual regressors. However, the Tversky coefficient is non-differentiable, which prevents the network parameters from being updated by back-propagation, so a differentiable form of the Tversky coefficient is proposed, given by
$$L_{tv} = \sum_{i=1}^{N_T} \frac{d\left( \hat{h}_i^T \odot \tilde{h}_i^T \right)}{d\left[ \hat{h}_i^T \odot \tilde{h}_i^T + \alpha \left( \hat{h}_i^T - \hat{h}_i^T \odot \tilde{h}_i^T \right) + \beta \left( \tilde{h}_i^T - \hat{h}_i^T \odot \tilde{h}_i^T \right) \right]} = \sum_{i=1}^{N_T} \frac{d\left( \hat{h}_i^T \odot \tilde{h}_i^T \right)}{d\left[ (1 - \alpha - \beta)\, \hat{h}_i^T \odot \tilde{h}_i^T + \alpha \hat{h}_i^T + \beta \tilde{h}_i^T \right]} \qquad (4)$$
where $\alpha = \beta = 0.5$ denotes that the same weight is given to both regressors; $\hat{h}_i^T = \mathrm{concatenate}(\hat{h}_i^{1,T}, \hat{h}_i^{2,T}, \hat{y}_i^T)$ and $\tilde{h}_i^T = \mathrm{concatenate}(\tilde{h}_i^{1,T}, \tilde{h}_i^{2,T}, \tilde{y}_i^T)$ denote, for each regressor, the combination of its features at different stages with its prediction; and $\odot$ denotes the Hadamard product.
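A minimal PyTorch sketch of Equation (4) is given below, assuming that $d(\cdot)$ sums the per-sample vectors to scalars and that the small `eps` guard (our addition) prevents division by zero; the inputs are the concatenated outputs $\hat{h}^T$ and $\tilde{h}^T$ of the two regressors.

```python
import torch

def tversky_discrepancy(h1: torch.Tensor, h2: torch.Tensor,
                        alpha: float = 0.5, beta: float = 0.5,
                        eps: float = 1e-8) -> torch.Tensor:
    """Differentiable Tversky coefficient (Equation (4)) between the
    concatenated outputs h1, h2 of the dual regressors, shape (batch, dim).

    The result is a similarity: minimizing it enlarges the dual-regressor
    discrepancy (Step B), while minimizing its negative shrinks it (Step C).
    """
    inter = (h1 * h2).sum(dim=1)  # Hadamard product, reduced per sample
    denom = inter + alpha * (h1.sum(dim=1) - inter) + beta * (h2.sum(dim=1) - inter)
    return ((inter + eps) / (denom + eps)).sum()
```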
To facilitate the understanding of the process of implementing the GDA using a dual regression network, Figure 3 further demonstrates the three main steps.
  • Step A: In order to ensure the prediction ability of the model for the source domain samples, the feature extractor and the dual regressors are pre-trained using the source domain data to minimize the mean squared error (MSE), as shown in Equation (5):
$$\min_{F, \hat{R}, \tilde{R}} L_R = \frac{1}{N_S} \sum_{i=1}^{N_S} \left( \hat{y}_i^S - y_i^S \right)^2 + \frac{1}{N_S} \sum_{i=1}^{N_S} \left( \tilde{y}_i^S - y_i^S \right)^2 \qquad (5)$$
  • Step B: In this step, we train the dual regressors as a discriminator for a fixed feature extractor. Since the two regressors can easily produce inconsistent predictions for target instances that deviate from the source, the prediction discrepancy indicates that the two regressors have different predictive abilities for those target samples. Thus, we utilize the prediction inconsistency to detect them: by training the dual regressors to increase the discrepancy, target samples for which the two regressors provide different predictions are identified as samples with significant domain shifts. This step corresponds to Step B in Figure 3. We add a regression loss on the source samples to guarantee the model's predictive ability for the source domain; we found experimentally that the algorithm's performance drops significantly without this loss. The objective is as follows:
$$\min_{\hat{R}, \tilde{R}} \; L_{tv} + L_R \qquad (6)$$
  • Step C: For fixed dual regressors, we train the feature extractor to minimize the discrepancy, moving these target samples into the source support set. This step corresponds to Step C in Figure 3 and represents the adversarial trade-off between the extractor and the dual regressors. The objective is as follows:
$$\min_{F} \; \left( -L_{tv} \right) \qquad (7)$$
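Putting Steps A–C together, one training iteration might look like the following condensed PyTorch sketch, reusing the `tversky_discrepancy` helper from the sketch above. The module and optimizer names (`F`, `R1`, `R2`, `opt_F`, `opt_R`) are illustrative, and for brevity the discrepancy is computed on the predictions alone, whereas Equation (4) also concatenates intermediate regressor features.

```python
import torch

mse = torch.nn.MSELoss()

def train_iteration(F, R1, R2, opt_F, opt_R, xs, ys, xt):
    # xs: (batch, window, n_sensors) source inputs; ys: (batch, 1) RUL labels;
    # xt: target inputs of the same shape as xs.

    # Step A: supervised training on the source domain (Equation (5)).
    feat_s = F(xs)
    loss_a = mse(R1(feat_s), ys) + mse(R2(feat_s), ys)
    opt_F.zero_grad(); opt_R.zero_grad()
    loss_a.backward()
    opt_F.step(); opt_R.step()

    # Step B: with F fixed, train the dual regressors to enlarge their
    # discrepancy on target samples while keeping source accuracy (Equation (6)).
    with torch.no_grad():          # freeze the feature extractor
        feat_s, feat_t = F(xs), F(xt)
    loss_b = tversky_discrepancy(R1(feat_t), R2(feat_t)) \
             + mse(R1(feat_s), ys) + mse(R2(feat_s), ys)
    opt_R.zero_grad(); loss_b.backward(); opt_R.step()

    # Step C: with the regressors fixed, train F to shrink the discrepancy
    # (Equation (7)); only opt_F steps, so the regressors stay unchanged.
    feat_t = F(xt)
    loss_c = -tversky_discrepancy(R1(feat_t), R2(feat_t))
    opt_F.zero_grad(); loss_c.backward(); opt_F.step()
```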

2.2. Local Maximum Mean Discrepancy

Traditional DA methods usually use the maximum mean discrepancy (MMD) [22] to measure the discrepancy between the data distribution in the source domain and the data distribution in the target domain, and the MMD can be expressed as
$$d_{\mathcal{H}}(p, q) = \left\| \mathbb{E}_{x^S \sim p}\left[ \phi(x^S) \right] - \mathbb{E}_{x^T \sim q}\left[ \phi(x^T) \right] \right\|_{\mathcal{H}}^2 \qquad (8)$$
where $\mathcal{H}$ denotes the reproducing kernel Hilbert space (RKHS) defined by a characteristic kernel $k$, with $k(x^S, x^T) = \langle \phi(x^S), \phi(x^T) \rangle$; $\phi$ defines the transformation from the initial data to the RKHS; and $\langle \cdot, \cdot \rangle$ denotes the inner product operation. In this paper, based on the MMD and considering the intrinsic relationship of each sub-domain at the different degradation stages of the aero engine, the local maximum mean discrepancy (LMMD) is utilized to measure the discrepancy in the data distribution between the source domain and the target domain, as shown in Equation (9):
$$d_{\mathcal{H}}(p, q) = \mathbb{E}_c \left\| \mathbb{E}_{x^S \sim p^c}\left[ \phi(x^S) \right] - \mathbb{E}_{x^T \sim q^c}\left[ \phi(x^T) \right] \right\|_{\mathcal{H}}^2 = \frac{1}{N_c} \sum_{c=1}^{N_c} \left\| \sum_{x_i^S \in D_S} w_i^{Sc} \phi(x_i^S) - \sum_{x_j^T \in D_T} w_j^{Tc} \phi(x_j^T) \right\|_{\mathcal{H}}^2 \qquad (9)$$
where $p^c$ and $q^c$ denote the distributions of the sub-domains belonging to category $c$ in the source and target domains, respectively, and $w_i^{Sc}$ and $w_j^{Tc}$ denote the weights of the $i$-th source domain sample and the $j$-th target domain sample belonging to category $c$, with $\sum_{i=1}^{N_S} w_i^{Sc} = 1$ and $\sum_{j=1}^{N_T} w_j^{Tc} = 1$. Unlike classification problems, which have a finite label set, regression problems have a continuous label space, so the LMMD cannot be used directly. In this paper, we utilize the discretization strategy of the degradation process proposed by Zhang et al. [7] to discretize the whole degradation process into multiple degradation stages; the degradation stage of the $i$-th sample can be expressed as
$$l_i = \mathrm{round}\left( \frac{RUL_{\max} - RUL_{t_i}}{RUL_{\max}} N_c \right) \qquad (10)$$
where $RUL_{\max} = 125$ is the threshold of the piecewise linear degradation model; in detail, the early operating stage of an aircraft turbofan engine can be considered healthy, and when the RUL falls below this threshold, the turbofan engine begins to degrade [23]. $RUL_{t_i}$ is the actual remaining life of the $i$-th sample, and $N_c = 10$ is the number of divided degradation stages. Therefore, for sample $x_i$, $w_i^c$ can be expressed as
$$w_i^c = \frac{l_i^c}{\sum_{(x_j, l_j) \in D} l_j^c} \qquad (11)$$
where $l_i^c$ is the $c$-th element of the one-hot encoding of $l_i$. For the source domain samples, the actual RUL labels can be used directly to compute $l_i$ and, hence, $w_i^{Sc}$. However, the target domain samples have no RUL labels with which to compute $w_j^{Tc}$. Therefore, this paper uses the dual regression network for prediction and takes $y_j^T = (\hat{y}_j^T + \tilde{y}_j^T)/2$ as the RUL label of each target domain sample to obtain $l_j$ and, thus, $w_j^{Tc}$. Equation (9) can then be rewritten as
$$d_{\mathcal{H}}(p, q) = \frac{1}{N_c} \sum_{c=1}^{N_c} \left[ \sum_{i=1}^{N_S} \sum_{j=1}^{N_S} w_i^{Sc} w_j^{Sc} k(f_i^S, f_j^S) + \sum_{i=1}^{N_T} \sum_{j=1}^{N_T} w_i^{Tc} w_j^{Tc} k(f_i^T, f_j^T) - 2 \sum_{i=1}^{N_S} \sum_{j=1}^{N_T} w_i^{Sc} w_j^{Tc} k(f_i^S, f_j^T) \right] \qquad (12)$$
where $f^{S/T} = F(X^{S/T})$ denotes the features extracted by the feature extractor.
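The following PyTorch sketch mirrors Equations (10)–(12). It assumes a single-bandwidth Gaussian kernel (the kernel settings are not specified in this section), and the clamps guarding empty stages and the boundary stage index are our additions.

```python
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """RBF kernel matrix k(a_i, b_j); multi-kernel MMD variants would
    average several bandwidths instead of using a single sigma."""
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

def stage_weights(rul: torch.Tensor, rul_max: float = 125.0, n_c: int = 10) -> torch.Tensor:
    """Stage assignment of Equation (10), one-hot encoded and normalized
    per stage as in Equation (11); returns an (N, n_c) weight matrix."""
    stages = torch.round((rul_max - rul.clamp(max=rul_max)) / rul_max * n_c)
    onehot = torch.nn.functional.one_hot(stages.long().clamp(0, n_c - 1), n_c).float()
    return onehot / onehot.sum(dim=0, keepdim=True).clamp(min=1.0)

def lmmd(f_s, f_t, rul_s, rul_t_pred, n_c: int = 10) -> torch.Tensor:
    """Local MMD of Equation (12); target stages come from predicted RUL."""
    ws, wt = stage_weights(rul_s, n_c=n_c), stage_weights(rul_t_pred, n_c=n_c)
    k_ss, k_tt, k_st = gaussian_kernel(f_s, f_s), gaussian_kernel(f_t, f_t), gaussian_kernel(f_s, f_t)
    loss = f_s.new_zeros(())
    for c in range(n_c):
        loss = loss + ws[:, c] @ k_ss @ ws[:, c] + wt[:, c] @ k_tt @ wt[:, c] \
               - 2.0 * (ws[:, c] @ k_st @ wt[:, c])
    return loss / n_c
```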

3. RUL Prediction Method Based on BDnet

Section 2 explains the details of implementing the BDnet to seek a domain-invariant representation for cross-domain regression. In this section, the prediction process of the BDnet-based RUL prediction method is further explained. The RUL prediction can generally be divided into an offline training phase and an online prediction phase. In the offline training phase, firstly, the source domain data, $D_S = \{(X_i^S, y_i^S)\}_{i=1}^{N_S}$, and the target domain data, $D_T = \{X_j^T\}_{j=1}^{N_T}$, are divided using the time window to obtain the training data. Secondly, the mapping relationship between the source domain data and the RUL is constructed from the source domain data, $D_S$, and the GDA of the source and target domains is realized according to Equations (4)–(7). Finally, the network parameters are tuned using the LMMD to realize the SDA. In the online prediction stage, the partitioned target domain data are fed into the BDnet to obtain the mean and variance of the RUL prediction. Since the sensor values at the current moment of the aero-engine degradation process are closely related to the degradation states at the preceding and succeeding time steps, a Bi-LSTM [24] is used as the feature extractor to extract the time dependence from the sensor data. Then, to improve the accuracy of the RUL and quantify the uncertainty of the prediction results, Bayesian neural networks (BNNs) [25] with Monte Carlo dropout inference are utilized as regressors for uncertainty prediction. The pseudo-code of the BDnet-based RUL prediction method is shown in Algorithm 1.
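The network components described above might be organized as in the sketch below; the exact layer sizes vary per task (Table 3), so the defaults here (12 selected sensors, hidden size 32, 5 layers) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Bi-LSTM feature extractor over (batch, window=30, n_sensors) inputs."""
    def __init__(self, n_sensors: int = 12, hidden: int = 32, layers: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)
        return out[:, -1, :]          # last time step, dim = 2 * hidden

class MCDropoutRegressor(nn.Module):
    """Regressor with Monte Carlo dropout as an approximate BNN layer."""
    def __init__(self, in_dim: int = 64, hidden: int = 32, p: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Dropout(p), nn.Linear(hidden, 1))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.net(f)

# At inference, keeping the regressors in train() mode leaves dropout
# stochastic, so repeated passes yield the prediction mean and variance:
#   preds = torch.stack([R1(F(x)) for _ in range(100)])
#   mean, var = preds.mean(0), preds.var(0)
```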
Algorithm 1 RUL prediction based on bi-discrepancy network
Input: Training data $D_S = \{(X_i^S, y_i^S)\}_{i=1}^{N_S}$ in the source domain, training data $D_T = \{X_j^T\}_{j=1}^{N_T}$ in the target domain, test data $D_T^{test} = \{X_j^T\}_{j=1}^{N_T^{test}}$ in the target domain, maximum number of training epochs $epoch_{\max}$.
1: Select variables and slide the time window to generate samples.
2: Randomly initialize the model parameters $\theta_F$, $\theta_{\hat{R}}$, $\theta_{\tilde{R}}$.
3: Initialize $epoch = 1$.
4: while $epoch \leq epoch_{\max}$ do
5:   Input $D_S$ into the bi-discrepancy network to calculate the regression loss $L_R$ through Equation (5).
6:   Update $\theta_F$, $\theta_{\hat{R}}$, $\theta_{\tilde{R}}$ using back-propagation to minimize $L_R$.
7:   Input $D_S$ into the bi-discrepancy network to calculate the regression loss $L_R$ through Equation (5), and input $D_T$ to calculate the dual-regressor discrepancy $L_{tv}$ through Equation (4).
8:   Update $\theta_{\hat{R}}$, $\theta_{\tilde{R}}$ using back-propagation to minimize $L_{tv} + L_R$.
9:   Calculate the source domain samples' degradation levels $l_i^S$ through Equation (10).
10:  Input $D_T$ into the bi-discrepancy network to calculate $\hat{y}_j^T$, $\tilde{y}_j^T$.
11:  Calculate the target domain samples' degradation levels $l_j^T$ with $RUL_{t_j} = (\hat{y}_j^T + \tilde{y}_j^T)/2$ through Equation (10).
12:  Calculate the source and target weights $w_i^{Sc}$, $w_j^{Tc}$ of the LMMD.
13:  Calculate the LMMD loss $L_{lmmd}$ through Equation (12).
14:  Update $\theta_F$ using back-propagation to minimize $(-L_{tv}) + \lambda L_{lmmd}$.
15:  $epoch \leftarrow epoch + 1$.
16: end while
17: Input $D_T^{test}$ into the bi-discrepancy network to calculate $\hat{y}_j^T$, $\tilde{y}_j^T$ with the optimal model parameters.
where $\lambda$ is a hyperparameter used to control the ratio between the GDA and the SDA.
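Line 14 of Algorithm 1, which couples the GDA and SDA objectives, could be sketched as follows, reusing the `tversky_discrepancy` and `lmmd` helpers defined earlier; `lambda_` corresponds to λ, and the function name is ours.

```python
def update_extractor(F, R1, R2, opt_F, xs, ys, xt, lambda_: float = 0.3):
    # ys: (batch, 1) source RUL labels, flattened for the stage weights.
    f_s, f_t = F(xs), F(xt)
    y1_t, y2_t = R1(f_t), R2(f_t)
    # GDA term: maximize the Tversky similarity, i.e., minimize -L_tv.
    l_tv = tversky_discrepancy(y1_t, y2_t)
    # SDA term: pseudo-label the target stages with the averaged prediction.
    rul_t = ((y1_t + y2_t) / 2).squeeze(1).detach()
    l_lmmd = lmmd(f_s, f_t, ys.squeeze(-1), rul_t)
    loss = -l_tv + lambda_ * l_lmmd
    opt_F.zero_grad(); loss.backward(); opt_F.step()
```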

4. Case Study

4.1. Dataset and Data Preprocessing

This paper uses the C-MAPSS dataset provided by the NASA Ames Research Center [26], which is divided into four subsets, FD001~FD004, according to different operating conditions and failure modes. Each subset consists of a training set and a test set. The training set contains whole-lifetime data, while the test set contains only the initial portion of each engine's degradation data and is used for model validation. The detailed setup is shown in Table 1. Since the four sub-datasets were obtained under different failure modes and operating conditions, their data distributions differ significantly. In this sense, the research objective of this paper is to train the BDnet using only the labeled source domain data under one operational condition and the unlabeled target domain data under another operational condition, and then to implement online RUL prediction for the target domain data.
The monitoring data for each flight cycle consist of 26 dimensions, where the first two dimensions denote the engine (unit) number and the cycle number, the following three dimensions are the flight conditions (flight altitude, Mach number, and throttle resolver angle), and the remaining 21 dimensions are sensor measurements. In addition, to prevent the redundancy and interaction of the multidimensional sensor data from adversely affecting the RUL prediction, this paper analyzes each sensor's correlation with operating time and its degradation trend over time to select the optimal sensor signals for the RUL prediction. The correlation and monotonicity measures can be expressed as follows [27].
$$c = \frac{\left| n \sum F_t t - \sum F_t \sum t \right|}{\sqrt{\left[ n \sum F_t^2 - \left( \sum F_t \right)^2 \right] \left[ n \sum t^2 - \left( \sum t \right)^2 \right]}}, \qquad m = \frac{\left| \sum_n \delta(F_{t+1} - F_t) - \sum_n \delta(F_t - F_{t+1}) \right|}{n - 1} \qquad (13)$$
where $c$ and $m$ represent the correlation and monotonicity, respectively; $F_t$ represents the feature value at cycle $t$; $\delta(\cdot)$ represents the unit step function; and $n$ indicates the number of signal points. Both $c$ and $m$ range over [0, 1]: $c = 1$ represents complete correlation, and $m = 1$ represents a strictly monotonic increase or decrease. Thus, sensors with a larger $(c + m)/2$ better represent the degradation trend and are selected. Taking FD001 as an example, sensors #2, #3, #4, #7, #8, #11, #12, #13, #15, #17, #20, and #21 are selected when the threshold is set to 0.75.
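As an illustration of this selection rule, the two measures of Equation (13) can be computed per sensor as below (a NumPy sketch; the function names and the strict-increment reading of δ are our assumptions).

```python
import numpy as np
from typing import Dict, List

def correlation(f: np.ndarray) -> float:
    """|Pearson correlation| between a feature series and the cycle index."""
    t = np.arange(len(f), dtype=float)
    return abs(float(np.corrcoef(f, t)[0, 1]))

def monotonicity(f: np.ndarray) -> float:
    """Imbalance between positive and negative increments, Equation (13)."""
    d = np.diff(f)
    return abs(int((d > 0).sum()) - int((d < 0).sum())) / (len(f) - 1)

def select_sensors(series: Dict[str, np.ndarray], threshold: float = 0.75) -> List[str]:
    """Keep sensors whose (c + m) / 2 exceeds the threshold (0.75 here)."""
    return [name for name, f in series.items()
            if (correlation(f) + monotonicity(f)) / 2 > threshold]
```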
Moreover, the same sensor may produce different measurements under different operating conditions. In order to reduce the effect of the operating conditions, the sensor data under each operating condition are min–max normalized so that the values are limited to the range [0, 1]. Finally, a time window with a window size of 30 and a step size of 1 is used to divide the raw data and generate the training and testing datasets.
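A sketch of this preprocessing step is shown below; applying `minmax_normalize` separately to the data of each operating condition reproduces the per-condition normalization described above, and the window and step defaults follow the paper.

```python
import numpy as np

def minmax_normalize(x: np.ndarray) -> np.ndarray:
    """Min-max scale each sensor channel of x (cycles, n_sensors) to [0, 1];
    call this once per operating condition in the multi-condition subsets."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.maximum(hi - lo, 1e-12)

def sliding_windows(x: np.ndarray, rul: np.ndarray, window: int = 30, step: int = 1):
    """Cut one engine's run into (window, n_sensors) samples; the label is
    the RUL at the last cycle of each window."""
    xs, ys = [], []
    for start in range(0, len(x) - window + 1, step):
        xs.append(x[start:start + window])
        ys.append(rul[start + window - 1])
    return np.stack(xs), np.asarray(ys)
```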

4.2. Simulation Conditions and Parameter Settings

The simulation environment consists of an NVIDIA GeForce RTX 3060 Laptop GPU, an Intel Core i7-11800H CPU, and 32 GB of RAM on a Lenovo laptop (Beijing, China), running Windows 11 and Python 3.7 with the PyTorch framework.
The BDnet consists of three main parts: a feature extractor, a dual regressor, and a subdomain adaptor. The main parameters are set as follows: the time window length is 30, the number of training epochs is 200 with an early stopping mechanism, the degradation inflection point is 125, the hidden layer dimension of the dual regressor is 32, and the Adam optimizer is adopted. In addition, the performance of the prediction model may be affected by the batch size, the learning rate of the feature extractor, the learning rate of the dual regressor, the hidden layer dimension of the Bi-LSTM, the number of Bi-LSTM layers, and the scaling factor. Therefore, it is essential to investigate the adaptability and robustness of the model for different tasks by adjusting these hyperparameters. The hyperparameter search space is shown in Table 2.
Taking FD001–FD002 as an example, the influence of the hyperparameters on prediction performance is shown in Figure 4. The cross-domain prediction performance is best when the batch size is 256, the feature extractor learning rate is 0.005, the dual regressor learning rate is 0.01, the number of Bi-LSTM layers is five, the dimension of the Bi-LSTM hidden layer is 32, and λ is 0.3. The hyperparameter settings for all the cross-domain prediction tasks are shown in Table 3.

4.3. Evaluation Indicators

In order to measure the predictive performance of the proposed model, the scoring function (SF) and root-mean-square error (RMSE) are used for quantitative assessment.
The SF is an asymmetric function often used to evaluate the performance of remaining life prediction. For the remaining life prediction of an aero engine, an early prediction is preferable to a late one, so for the same error magnitude, a late prediction is penalized more severely than an early one. The SF is defined as
$$SF = \begin{cases} \sum_{i=1}^{N} \left( e^{-\frac{\hat{y}_i - y_i}{13}} - 1 \right), & \hat{y}_i - y_i < 0 \\ \sum_{i=1}^{N} \left( e^{\frac{\hat{y}_i - y_i}{10}} - 1 \right), & \hat{y}_i - y_i \geq 0 \end{cases} \qquad (14)$$
Compared with the SF, the RMSE penalizes early and late prediction errors equally. The RMSE is defined as
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2} \qquad (15)$$
where $\hat{y}_i$ and $y_i$ are the predicted and actual values of the $i$-th sample of the testing set, respectively, and $N$ is the number of samples.
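Both indicators are straightforward to implement; the NumPy sketch below follows Equations (14) and (15) directly (function names are ours).

```python
import numpy as np

def score_function(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Asymmetric scoring function of Equation (14): late predictions
    (positive error) are penalized more heavily than early ones."""
    d = y_pred - y_true
    return float(np.sum(np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1)))

def rmse(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Root-mean-square error of Equation (15)."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```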

4.4. Experimental Results Analysis and Discussions

In order to validate the performance of the BDnet on the cross-domain prediction tasks, ablation experiments are first performed. The model that uses only the feature extractor and dual regressor, trained on the source domain data and applied directly to the target domain data, is defined as model 1 and can be considered the benchmark model for the RUL prediction problem. The model that adds the MRD is defined as model 2, and the model that uses both the MRD and the LMMD is defined as model 3. The predictions of the three models are compared with the real RUL: the models are trained using the source domain data and evaluated on the target domain. Since the C-MAPSS dataset has four subsets, 12 cross-domain prediction tasks are generated, and the results are shown in Figure 5.
First, model 3 gives the highest accuracy on all 12 cross-domain prediction tasks. Second, model 2 and model 3 are more stable and fluctuate less than model 1.
Taking FD001 as the source domain, the cross-domain prediction results for the target domains FD002, FD003, and FD004 are shown in Figure 5a–c, respectively. Since FD001 has a significant distributional discrepancy from FD002 and FD004, an apparent two-stage optimization structure can be seen: the distributions of the source and target domains are first brought closer by the GDA to learn the degradation trend of the target domain (red dashed line), and the prediction results are then fine-tuned by the SDA to better fit the real RUL (blue solid line). Although the data distributions of FD001 and FD003 are more similar, model 2 in Figure 5b gives the largest prediction error compared with model 2 in Figure 5a,c. This is mainly because the maximum regressor discrepancy is optimized by finding the differing features between the source and target domains; if the difference is slight, repeated training may lead to overfitting. Therefore, we increase the degree of subdomain adaptation by increasing the value of λ to achieve higher prediction accuracy.
Figure 5d–f demonstrate the cross-domain prediction results with FD002 as the source domain and FD001, FD003, and FD004 as the target domains. Similar to the results with FD001 as the source domain, the prediction accuracy is higher when the target domains are FD001 and FD003; moreover, FD001 needs little subdomain adaptation because, unlike FD003, it has only one failure mode.
Figure 5g–i demonstrate the cross-domain prediction results with FD003 as the source domain. The prediction results of model 2 for all three tasks are significantly degraded, which is mainly due to the multiple fault modes. In addition, the predicted degradation trends for the target domains FD002 and FD004 are consistent with the real RUL, whereas the degradation trend for the target domain FD001 shows a significant deviation from the real RUL. Thus, the λ of this group of experiments is increased compared with that of the previous two groups.
As in the last set of experiments, the source domain FD004 also has two fault modes. However, the prediction results are better because FD004 has more training data, allowing adequate feature extraction.
As shown in Table 4, model 1 has the worst prediction performance, and its prediction results rely heavily on the similarity between the source and target domain data distributions. For model 2, except for four RMSE results and two SF results that are higher than their counterparts in model 1, all results are better than those of model 1, and the RMSE of model 2 for FD002–FD001 is the best among the three models. For model 3, all results are the best among the three models, except for one RMSE result that is slightly higher than the corresponding result of model 2. Compared with model 1, the prediction model without domain adaptation, model 3 reduces the RMSE and SF by 50.62% and 76.99% on average; compared with model 2, which applies only global domain adaptation, model 3 reduces the RMSE and SF by 35.16% and 47.66% on average. Therefore, in most cases, model 3 predicts better than models 1 and 2, proving the effectiveness of the BDnet-based prediction model.
Figure 6 depicts the distribution of the feature vectors extracted by the three models (visualized with t-SNE) to further validate the transfer learning effect of the BDnet-based prediction model. For model 1, the features extracted from the source and target domains are dispersed in the two-dimensional space, and the degradation trend is not apparent, which further illustrates the discrepancy in data distribution under different operating conditions and fault modes. With global domain adaptation, the degradation trend shown in Figure 6b is pronounced, and there is only a slight deviation between the distributions of features extracted from the two domains. After fine-tuning by subdomain adaptation, the health states of the two domains in model 3 are clustered in the same regions, and the degradation processes are well aligned. Thus, the visualization results intuitively validate the effectiveness of the BDnet-based prediction model.
To assess the quality of the model in transferring degradation patterns from a source to a target domain, the BDnet-based predictive model is compared with four mainstream UDA methods on the C-MAPSS dataset. TCA-DNN and CORAL-DNN are UDA methods that combine traditional machine learning algorithms with deep neural architectures: TCA [28] uses the MMD to learn cross-domain transfer components in the RKHS to construct a feature space that minimizes the domain differences, while CORAL [29] aligns the second-order statistics of the source and target distributions using linear transformations. LSTM-DANN [19] and CADA [20] are typical adversarial UDA methods: LSTM-DANN consists of a feature extractor, an RUL predictor, and a domain discriminator, whereas CADA additionally introduces a contrastive loss module alongside the adversarial loss.
Twelve engines were randomly selected for comparison experiments on the 12 cross-domain prediction tasks, and Figure 7 shows the combined comparison results of the five methods on these engines. The proposed method achieves good prediction performance compared with the four mainstream methods. TCA-DNN and CORAL-DNN are basically unable to learn the trend information of the degradation process. Although LSTM-DANN can learn the trend of the degradation process, its prediction results differ significantly from the actual values due to the lack of local subdomain adaptation. CADA obtains better prediction results through the fine-tuning of its contrastive loss module, but, compared with the BDnet, CADA's predictions are more volatile.
The comparative RMSE results are shown in Table 5. The proposed method is better than TCA-DNN, CORAL-DNN, and LSTM-DANN on all 12 cross-domain prediction tasks and is only slightly worse than CADA on four tasks, with average reductions of 77.24%, 61.72%, 38.97%, and 3.35% in the RMSE values compared with the four comparison methods.
The comparative SF results are shown in Table 6. The proposed method is better than TCA-DNN [19], CORAL-DNN [19], and LSTM-DANN [19] on all 12 cross-domain prediction tasks and is slightly worse than CADA [20] on six tasks, with an average reduction of 42.12% in the SF values compared with CADA.
In addition, the BDnet significantly improves the prediction performance on tasks with larger domain shifts while maintaining performance on tasks with smaller domain shifts, proving the superiority of the BDnet without introducing auxiliary models.

5. Conclusions

In this paper, an unsupervised domain adaptive regression method based on the BDnet was proposed to address the realistic problem of unlabeled samples in the aero-engine RUL prediction task. Unlike previous adversarial DA methods that introduce discriminators for domain feature alignment, the BDnet achieves the GDA by maximizing the regressor discrepancy to gradually detect and adjust the target domain samples with significant domain shifts, without introducing an additional model. In addition, to extract the fine-grained features of the aero engine at different degradation stages, the LMMD is utilized to achieve the SDA on top of the GDA. Finally, experiments were conducted on 12 cross-domain prediction tasks to verify the effectiveness of the BDnet-based RUL prediction model in solving the unsupervised domain adaptive regression problem.

Author Contributions

Methodology, N.L., X.Z. and J.G.; software, N.L.; validation, N.L.; formal analysis, N.L. and S.C.; investigation, N.L.; resources, X.Z.; data curation, N.L.; writing—original draft preparation, N.L.; writing—review and editing, X.Z. and J.G.; visualization, N.L.; supervision, N.L. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhou, J. Research on Data-Driven Prediction Methods for Remaining Useful Life of Aero-Engine; Nanjing University of Aeronautics and Astronautics Press: Nanjing, China, 2017.
  2. Hu, Y.; Miao, X.W.; Si, Y.; Pan, E.S.; Zio, E. Prognostics and health management: A review from the perspectives of design, development and decision. Reliab. Eng. Syst. Saf. 2022, 217, 108063.
  3. Siahpour, S.; Li, X.; Lee, J. A Novel Transfer Learning Approach in Remaining Useful Life Prediction for Incomplete Dataset. IEEE Trans. Instrum. Meas. 2022, 71, 3509411.
  4. Xia, J.; Feng, Y.W.; Teng, D.; Chen, J.Y.; Song, Z.C. Distance self-attention network method for remaining useful life estimation of aeroengine with parallel computing. Reliab. Eng. Syst. Saf. 2022, 225, 108636.
  5. Zhang, W.; Li, X.; Ma, H.; Luo, Z.; Li, X. Transfer learning using deep representation regularization in remaining useful life prediction across operating conditions. Reliab. Eng. Syst. Saf. 2021, 211, 107556.
  6. Zhao, H.L.; Chen, T.M.; Zheng, N. Engine life prediction based on multi-stage similarity of comprehensive index. Syst. Eng. Electron. 2021, 43, 1430–1436.
  7. Zhang, J.S.; Li, X.; Tian, J.L.; Jiang, Y.C.; Luo, H.; Yin, S. A variational local weighted deep sub-domain adaptation network for remaining useful life prediction facing cross-domain condition. Reliab. Eng. Syst. Saf. 2023, 231, 108986.
  8. Ma, Z.; Guo, J.S.; Gu, T.Y.; Mao, S. A remaining useful life prediction for aero-engine based on improved convolution neural networks. J. Air Force Eng. Univ. 2020, 21, 19–25.
  9. Pan, Y.J.; Sun, Y.; Li, Z.X.; Gardoni, P. Machine learning approaches to estimate suspension parameters for performance degradation assessment using accurate dynamic simulations. Reliab. Eng. Syst. Saf. 2023, 230, 108950.
  10. Chen, J.X.; Li, D.P.; Huang, R.Y.; Chen, Z.Y.; Li, W.H. Aero-engine remaining useful life prediction method with self-adaptive multimodal data fusion and cluster-ensemble transfer regression. Reliab. Eng. Syst. Saf. 2023, 234, 109151.
  11. Zhu, Y.C.; Zhuang, F.Z.; Wang, J.D.; Ke, G.L.; Chen, J.W.; Bian, J.; Xiong, H.; He, Q. Deep Subdomain Adaptation Network for Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1713–1722.
  12. Pan, T.Y.; Chen, J.L.; Ye, Z.S.; Li, A.M. A multi-head attention network with adaptive meta-transfer learning for RUL prediction of rocket engines. Reliab. Eng. Syst. Saf. 2022, 225, 108610.
  13. Zhu, J.; Chen, N.; Shen, C.Q. A new data-driven transferable remaining useful life prediction approach for bearing under different working conditions. Mech. Syst. Signal Proc. 2020, 139, 106602.
  14. Cheng, H.; Kong, X.G.; Chen, G.G.; Wang, Q.B.; Wang, R.B. Transferable convolutional neural network based remaining useful life prediction of bearing under multiple failure behaviors. Measurement 2021, 168, 108286.
  15. Jiao, J.Y.; Zhao, M.; Lin, J.; Ding, C.C. Classifier Inconsistency-Based Domain Adaptation Network for Partial Transfer Intelligent Diagnosis. IEEE Trans. Ind. Inform. 2020, 16, 5965–5974.
  16. Yang, B.; Lei, Y.G.; Jia, F.; Xing, S.B. An intelligent fault diagnosis approach based on transfer learning from laboratory bearings to locomotive bearings. Mech. Syst. Signal Proc. 2019, 122, 692–706.
  17. Sun, C.; Ma, M.; Zhao, Z.B.; Tian, S.H.; Yan, R.Q.; Chen, X.F. Deep Transfer Learning Based on Sparse Autoencoder for Remaining Useful Life Prediction of Tool in Manufacturing. IEEE Trans. Ind. Inform. 2019, 15, 2416–2425.
  18. Mao, W.T.; He, J.L.; Zuo, M.J. Predicting Remaining Useful Life of Rolling Bearings Based on Deep Feature Representation and Transfer Learning. IEEE Trans. Instrum. Meas. 2020, 69, 1594–1608.
  19. Da Costa, P.R.D.; Akcay, A.; Zhang, Y.Q.; Kaymak, U. Remaining useful lifetime prediction via deep domain adaptation. Reliab. Eng. Syst. Saf. 2020, 195, 106682.
  20. Ragab, M.; Chen, Z.H.; Wu, M.; Foo, C.S.; Kwoh, C.K.; Yan, R.Q.; Li, X.L. Contrastive Adversarial Domain Adaptation for Machine Remaining Useful Life Prediction. IEEE Trans. Ind. Inform. 2021, 17, 5239–5249.
  21. Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum Classifier Discrepancy for Unsupervised Domain Adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3723–3732.
  22. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Scholkopf, B.; Smola, A. A Kernel Two-Sample Test. J. Mach. Learn. Res. 2012, 13, 723–773.
  23. Ramasso, E. Investigating Computational Geometry for Failure Prognostics in Presence of Imprecise Health Indicator: Results and Comparisons on C-MAPSS Datasets. In Proceedings of the 6th European Conference of the PHM Society, Virtual, 28 June–2 July 2021.
  24. Huang, Z.H.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991.
  25. Kononenko, I. Bayesian neural networks. Biol. Cybern. 1989, 61, 361–370.
  26. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation. In Proceedings of the International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008.
  27. Cheng, Y.W.; Wu, J.; Zhu, H.P.; Or, S.W.; Shao, X.Y. Remaining Useful Life Prognosis Based on Ensemble Long Short-Term Memory Neural Network. IEEE Trans. Instrum. Meas. 2021, 70, 3503912.
  28. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q.A. Domain Adaptation via Transfer Component Analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210.
  29. Sun, B.C.; Feng, J.S.; Saenko, K. Return of Frustratingly Easy Domain Adaptation. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2058–2065.
Figure 1. The schematic of global alignment and sub-domain alignment.
Figure 2. The difference in distribution for each C-MAPSS subset.
Figure 3. Training steps of the bi-discrepancy network.
Figure 4. Experimental results with different hyperparameters for the FD001–FD002 cross-domain prediction task.
Figure 5. RUL predictions of the three models for 12 randomly selected engines from the target domains.
Figure 6. Visualization of the features learned by the three methods for the No. 4 engine from the source domain and the No. 5 engine from the target domain in the FD001–FD002 cross-domain prediction task.
Figure 7. RUL predictions of the five methods for 12 randomly selected engines from the target domains.
Table 1. The description of the aircraft turbofan engine dataset.

Dataset   Training Engines   Testing Engines   Fault Modes   Operational Conditions
FD001     100                100               1             1
FD002     260                259               1             6
FD003     100                100               2             1
FD004     249                248               2             6
Table 2. Hyperparameter configuration of the BDnet.

Hyperparameter                                       Configuration
Batch size (bs)                                      32, 64, 128, 256, 512
Learning rate of the feature extractor (lr-F)        0.001, 0.005, 0.01, 0.05, 0.1
Learning rate of the dual regressor (lr-R)           0.001, 0.005, 0.01, 0.05, 0.1
Dimension of the hidden layer of the Bi-LSTM (hd)    16, 32, 64, 128, 256
Number of layers of the Bi-LSTM (nl)                 3, 4, 5, 6, 7
Scaling factor (λ)                                   0.1, 0.3, 0.5, 0.7, 0.9
Table 3. Hyperparameter configuration under the various cross-domain prediction tasks.

Task           Batch Size   lr-F    lr-R   Bi-LSTM Layers   Bi-LSTM Hidden Dim   λ
FD001–FD002    256          0.005   0.01   5                32                   0.3
FD001–FD003    256          0.001   0.01   1                32                   0.7
FD001–FD004    256          0.01    0.01   5                64                   0.3
FD002–FD001    256          0.01    0.01   5                64                   0.1
FD002–FD003    256          0.01    0.01   5                64                   0.1
FD002–FD004    256          0.005   0.01   3                32                   0.5
FD003–FD001    256          0.01    0.01   1                32                   0.9
FD003–FD002    256          0.01    0.01   5                128                  0.5
FD003–FD004    256          0.01    0.01   5                128                  0.3
FD004–FD001    256          0.01    0.01   5                64                   0.5
FD004–FD002    256          0.005   0.01   3                32                   0.7
FD004–FD003    256          0.01    0.01   5                64                   0.3
Table 4. Comparison of the RMSE and SF for the three models under the various cross-domain prediction tasks.

Task           Model 1              Model 2              Model 3
               RMSE     SF          RMSE    SF           RMSE    SF
FD001–FD002    41.11    35,853.06   28.17   4438.99      20.83   3563.29
FD001–FD003    31.98    5572.99     73.32   10,117.37    21.28   2189.71
FD001–FD004    38.38    10,130.34   35.14   7713.32      28.57   4003.32
FD002–FD001    70.73    6636.28     16.11   582.78       16.14   474.32
FD002–FD003    79.51    12,905.80   39.90   4793.32      22.65   2525.53
FD002–FD004    43.39    81,918.88   39.30   7003.99      31.88   8131.71
FD003–FD001    32.61    3875.72     54.98   3698.78      18.86   2725.45
FD003–FD002    45.71    48,222.12   49.48   16,427.39    21.67   5923.09
FD003–FD004    42.21    9802.89     36.73   8448.22      19.24   4102.11
FD004–FD001    98.75    72,967.48   38.53   7330.92      19.48   3535.12
FD004–FD002    32.62    8970.82     34.84   10,423.48    25.10   4070.49
FD004–FD003    101.79   18,233.97   20.61   609.38       13.74   549.18
Table 5. Comparison of the RMSE for the proposed method against state-of-the-art approaches.

Task           TCA-DNN [19]   CORAL-DNN [19]   LSTM-DANN [19]   CADA [20]   BDnet
FD001–FD002    90.0           77.5             48.6             19.52       20.83
FD001–FD003    116.1          69.6             45.9             39.58       21.28
FD001–FD004    113.8          84.6             43.8             31.23       28.57
FD002–FD001    85.6           80.9             28.1             13.88       16.14
FD002–FD003    111.5          79.8             37.5             33.53       22.65
FD002–FD004    94.4           43.6             31.9             33.71       31.88
FD003–FD001    90.5           26.5             31.7             19.54       18.86
FD003–FD002    80.8           75.6             44.6             19.33       21.67
FD003–FD004    102.6          77.2             47.9             20.61       19.24
FD004–FD001    85.6           94.0             31.6             20.10       19.48
FD004–FD002    80.8           30.9             24.9             18.50       25.10
FD004–FD003    102.9          68.6             27.8             14.49       13.74
Table 6. Comparison of the SF for the proposed method against state-of-the-art approaches.

Task           TCA-DNN   CORAL-DNN   LSTM-DANN   CADA     BDnet
FD001–FD002    55,763    44,963      2798        2122     3563.29
FD001–FD003    16,991    14,165      99,646      8415     2189.71
FD001–FD004    52,053    38,697      25,812      11,577   4003.32
FD002–FD001    3590      3393        638         351      474.32
FD002–FD003    23,071    16,511      8976        5213     2525.53
FD002–FD004    62,852    29,029      16,248      15,106   8131.71
FD003–FD001    4581      1341        3780        1451     2725.45
FD003–FD002    73,026    68,326      11,472      5257     5923.09
FD003–FD004    11,407    8583        9345        3219     4102.11
FD004–FD001    154,842   170,037     43,264      1840     3535.12
FD004–FD002    38,095    14,427      7723        4460     4070.49
FD004–FD003    6919      4613        1167        682      549.18
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Liu, N.; Zhang, X.; Guo, J.; Chen, S. Aero-Engine Remaining Useful Life Prediction Based on Bi-Discrepancy Network. Sensors 2023, 23, 9494. https://doi.org/10.3390/s23239494